The Ukrainian TenTen corpus was crawled with SpiderLing in 2014. The text was encoded in UTF-8, cleaned, and deduplicated. The corpus has not been tagged yet.

The current version of the corpus contains approximately 2.7 billion tokens.

v. 1.0

  • initial version, obtained from the web in 2014