A page relevant to corpora.

Pages

SDeWaC corpus

SDeWaC is a subset of DeWaC. The creation of SDeWaC is described…

WelshWaC corpus

The corpus was prepared by Anil using the Corpus Factory method in October…

ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

UKWaCsst corpus

UKWaC tagged with the SuperSenseTagger (sst-light), described in…

Gujarati web corpus (guWaC)

guWaC (Gujarati web as corpus) is a corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100-million-word collection of POS-tagged texts…

Georgian Web 2013 (kaWaC) corpus

Original file owner: bharat.

FinnishWaC corpus

Finnish web as corpus.

danishWaC corpus

The corpus was prepared using the Corpus Factory method. It has 288 million…