The Estonian web corpus was crawled by SpiderLing in 2013. It was encoded in UTF-8, cleaned, and deduplicated. You can learn more about the corpus and its tagset in the documentation (available here).

The current version of the corpus has 330 million tokens.

v. 1.0

  • the initial version obtained from the web in 2013