A page relevant to corpora.

Pages

ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

TurkishWaC corpus

The TurkishWaC corpus is a 32 million word collection of samples…

UKWaCsst corpus

UKWaC tagged with SuperSenseTagger (sst-light) described in…

GujarathiWaC corpus

Gujarati web as corpus is a corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100 million word collection of POS-tagged texts…

GkWaC corpus

Greek web as corpus is a 100 million word collection of POS-tagged…

GeorgianWaC corpus

Original file owner: bharat.

FrWaC corpus

French web as corpus, crawled from the .fr domain and tagged with…

FinnishWaC corpus

Finnish web as corpus.

FrisianWaC corpus

Frisian web as corpus was crawled in August 2013. It is a corpus…

DanishWaC corpus

The corpus was prepared using the Corpus Factory method. It has 288 million…

ScienceBlog corpus

The ScienceBlog corpus is a selection of posts and comments from…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…

Corpus TECU – Geodetics web corpus

(information in Czech) Creation of specialized data…

Environment corpus

English environment-related web corpus. Crawled by SpiderLing…