Patakis corpus

Patakis is a 100-million-word collection of POS-tagged texts…

GeorgianWaC corpus

Original file owner: bharat.

FinnishWaC corpus

Finnish web as corpus.

FrisianWaC corpus

The Frisian web as corpus was crawled in August 2013. It is a corpus…

DanishWaC corpus

The corpus was prepared using the Corpus Factory method. It has 288 million…

Domain Specific Corpora

These corpora are prepared from specific domains, e.g. science,…

ScienceBlog corpus

The ScienceBlogs corpus is a selection of posts and comments…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…

Environment corpus

English environment-related web corpus. Crawled by SpiderLing…

Filipino web corpus (FilipinoWaC)

The corpus was created by Anil in October 2013. It has almost…

Arabic web corpus (WaC)

The Arabic web corpus was created by Serge Sharoff and was tagged…

Nineteenthcentury corpus

The 19th century corpus is only available to Osnabrück…