Pages

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Croatian Web Corpus

(version 1.1) Tagset MULTEXT-East Morphosyntactic Specifications,…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian web-as-corpus collection was gathered by Monica Macoveiciuc,…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

ChineseTaiwanWaC corpus

The Chinese Taiwan web-as-corpus collection has almost 260 million words encoded…