The Polish Web as Corpus contains 103 million words and is encoded in UTF-8. It was built from the results of Google queries, made in these years, that used the most frequent Polish words. The corpus is tagged by Morfeusz and TaKIPI.

Changelog

v2.0 (25 May 2011)

  • Fixed document metadata. Previously, the same metadata was displayed for every document in the corpus.