The Estonian web corpus was crawled by SpiderLing in 2013. It was encoded in UTF-8, cleaned and deduplicated. You can learn more about the corpus and its tagset in the documentation (available here).

The current version of the corpus has 330 million tokens.

v. 1.0

  • initial version, obtained from the web in 2013