bgTenTen: Corpus of the Bulgarian Web
The Bulgarian Web Corpus (bgTenTen) is a language corpus made up of texts collected from the Internet. The corpus belongs to the TenTen corpus family, a set of web corpora built using the same method, each with a target size of more than 10 billion words. Sketch Engine currently provides access to TenTen corpora in more than 30 languages.