The Polish Web as Corpus contains 103 million words and is encoded in UTF-8. The corpus is tagged.

Changelog

v2.0 (25 May 2011)

  • Fixed document metadata. Previously, the same metadata was displayed for the whole corpus.