The Bulgarian TenTen corpus was crawled by SpiderLing in November 2012. It was encoded in UTF-8, cleaned, and deduplicated, including the removal of all data from BulgarianNC2. The corpus is not tagged yet.
The current number of tokens is almost 850 million.
initial version, obtained from the web in 2012
Adam Kilgarriff Prize
Adam Kilgarriff (1960-2015) was a British corpus linguist and founder of Lexical Computing, the company behind Sketch Engine. Adam devoted his whole life to research at the intersection of corpus linguistics, computational linguistics and lexicography.
To honour our brilliant and much-loved colleague, we established the Adam Kilgarriff Prize for outstanding work in the fields to which Adam contributed so much: corpus linguistics, computational linguistics, and lexicography.