The Oxford English Corpus

The Oxford English Corpus (OEC) consisted mainly of websites…

Islam – UK

A special English newspaper corpus by Costas Gabrielatos at…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Project Gutenberg Corpus

Downloaded with wget: getting Gutenberg cleaned with…
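The cleaning tool used for this corpus is not named here. As a purely illustrative sketch, a common first cleaning step for Project Gutenberg plain-text files is cutting away the licence header and footer. The sketch assumes the book body sits between `*** START OF ... PROJECT GUTENBERG EBOOK ...` and `*** END OF ... PROJECT GUTENBERG EBOOK ...` marker lines. The exact marker wording varies between files, and `strip_gutenberg_boilerplate` is a hypothetical helper, not the procedure used to build this corpus.

```python
import re

# Assumed marker lines; real Gutenberg files vary in wording ("THIS"/"THE", case).
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Hypothetical helper: keep only the text between the start and end markers.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(text)
    return text[begin:finish].strip()


if __name__ == "__main__":
    sample = (
        "Licence header\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "Body line.\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "Licence footer"
    )
    print(strip_gutenberg_boilerplate(sample))
```

Falling back to the whole text when markers are absent keeps unusual files in the corpus rather than silently dropping them, at the cost of some leftover boilerplate.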

Fryske Akademy Parallel Corpus

Aligned Frisian and Dutch sentences, not POS-tagged. Dutch…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Fida PLUS corpus

The corpus is a reference corpus for Slovene, as described here.…

Feed Corpus

The FeedCorpus is a corpus with about 300 million words, which…

Europarl: European Parliament Proceedings Parallel Corpus

The corpus was prepared by Philipp Koehn. The process is described…

Estonian Reference Corpus

Morphologically annotated corpus by Filosoft containing written…

English Wikipedia corpus

This corpus has been built using an English Wikipedia dump (from…

Domain Web Corpus

The corpora available here have been collected using the WebBootCat…

DGT-Translation Memory

This translation memory consists of 24 collections of texts in…

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) is a collection…

ChineseTaiwanWaC corpus

The Chinese Taiwan Web as Corpus has almost 260 million words encoded…