Pages

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Lithuanian WaC

(version 2) This corpus was created using the Corpus Factory method…

Indonesian WaC

The corpus was prepared using the Corpus Factory method described here.…

Croatian Web Corpus

(version 1.1) Tagset MULTEXT-East Morphosyntactic Specifications,…

Kannada WaC

Kannada WaC (web as corpus). The corpus is prepared by Corpus…

Yoruba WaC corpus

Yoruba web as corpus. It was compiled in June 2015 with encoding…

Urdu

A web corpus containing 53 million words, built with Corpus…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian web as corpus was gathered by Monica Macoveiciuc,…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Domain Web Corpus

The corpora available here have been collected using the WebBootCat…

ChineseTaiwanWaC corpus

Chinese Taiwan web as corpus has almost 260 million words encoded…

HindiWaC corpus

This corpus contains almost 60 million words crawled from the…

IgboWaC corpus

The corpus was prepared using the Corpus Factory method and was crawled…