Gujarati web corpus (guWaC)

guWaC (Gujarati Web as Corpus) is a corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100 million word collection of POS-tagged texts…

Georgian Web 2013 (kaWaC) corpus

Original file owner: bharat.

FinnishWaC corpus

Finnish web as corpus.

danishWaC corpus

The corpus was prepared using the Corpus Factory method. It has 288 million…

Domain Specific Corpora

These corpora are prepared from specific domains, e.g. science,…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…

Nineteenthcentury corpus

The 19th century corpus is only available to Osnabrück…