Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Indonesian WaC

The corpus was prepared using the Corpus Factory method described here.…

Croatian Web Corpus

(version 1.1) Tagset: MULTEXT-East Morphosyntactic Specifications,…

TatarWaC corpus

The Tatar sample corpus contains ca. 200 thousand words crawled from the…

Russian Web Corpus

This corpus was gathered by Serge Sharoff at the University of…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian Web as Corpus was gathered by Monica Macoveiciuc,…

Polish Web Corpus (PolishWaC)

The Polish Web as Corpus has 103 million words and the encoding is…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

A medical web corpus

A medical web corpus has been collected using WebBootCat with…