BulgarianNC: Bulgarian National Corpus
The Bulgarian National Corpus (BulgarianNC) is a corpus of Bulgarian texts collected from various sources, including scanned books, transcribed data, and internet texts. The texts are classified by genre, domain, and source type. The corpus contains 419 million words in total, counting both the web and non-web parts.
In Sketch Engine, BulgarianNC is organised hierarchically as follows:
- BulgarianNC_web: the web part of the Bulgarian National Corpus
- BulgarianNC_nonweb: all texts except the web part
- BulgarianNC_all: BulgarianNC_web + BulgarianNC_nonweb. This is a test case of our new feature, the Virtual Corpus, or Super Corpus.
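The relationship between the three corpora can be pictured with a small sketch. It is only an illustration of the idea of a corpus that holds no text of its own and refers to its parts; it is not how Sketch Engine implements the feature. The class names `Corpus` and `VirtualCorpus` and the toy token lists are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Corpus:
    """A concrete corpus that stores its own tokens (toy stand-in)."""
    name: str
    tokens: List[str]

    def __len__(self) -> int:
        return len(self.tokens)

    def __iter__(self) -> Iterator[str]:
        return iter(self.tokens)


@dataclass
class VirtualCorpus:
    """A corpus that stores no text itself and only references its parts."""
    name: str
    parts: List[Corpus] = field(default_factory=list)

    def __len__(self) -> int:
        # The size is the sum of the sizes of the member corpora.
        return sum(len(part) for part in self.parts)

    def __iter__(self) -> Iterator[str]:
        # Tokens are read from each member corpus in turn.
        for part in self.parts:
            yield from part


# Toy data only; the real corpora contain millions of words.
web = Corpus("BulgarianNC_web", ["това", "е", "уеб", "текст"])
nonweb = Corpus("BulgarianNC_nonweb", ["това", "е", "книга"])
all_ = VirtualCorpus("BulgarianNC_all", [web, nonweb])
```

In this sketch, searching `all_` means searching `web` and `nonweb` one after the other, so no text is duplicated.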
The Bulgarian National Corpus is part-of-speech (PoS) tagged using the following Bulgarian tagset.