The corpus was crawled by SpiderLing in 2011, encoded in UTF-8, cleaned and deduplicated. It was tagged with RFTagger and TreeTagger. Obscene language has been removed from the corpus using a list of words that are prohibited in domain names in the “.рф” domain space. The unzipped corpus is approximately 52 GB in size. It consists of 983,255,513 tokens and 10,394,826 unique lemmas.
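
The token and lemma counts above could be re-checked with a streaming pass over the corpus. The sketch below is only an illustration: it assumes a tab-separated "vertical" layout with one token per line and the lemma in the third column, which the description does not confirm. The function name `count_tokens_and_lemmas` and the `lemma_col` parameter are hypothetical.

```python
from typing import Iterable, Tuple


def count_tokens_and_lemmas(lines: Iterable[str], lemma_col: int = 2) -> Tuple[int, int]:
    """Count tokens and distinct lemmas in tab-separated, one-token-per-line text.

    ASSUMPTION: each token line looks like "word<TAB>tag<TAB>lemma".
    Lines with too few columns (e.g. structural markup, blank lines) are skipped.
    """
    tokens = 0
    lemmas = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= lemma_col:
            continue  # not a token line under the assumed layout
        tokens += 1
        lemmas.add(fields[lemma_col])
    return tokens, len(lemmas)


# Tiny made-up sample, not taken from the corpus.
sample = [
    "<s>",
    "кошки\tNoun\tкошка",
    "спят\tVerb\tспать",
    "кошка\tNoun\tкошка",
    "</s>",
]
print(count_tokens_and_lemmas(sample))
```

Because the function reads lines lazily, an open file object (for example `open(path, encoding="utf-8")`, with `path` pointing at the unzipped corpus) can be passed directly without loading the 52 GB file into memory.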