A page relevant to corpora.

Pages

Islam – UK

A special English newspaper corpus by Costas Gabrielatos at…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Fryske Akademy Parallel Corpus

Frisian and Dutch aligned sentences, not POS-tagged. Dutch…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Fida PLUS corpus

The corpus is a reference corpus for Slovene, as described here.…

Feed Corpus

The FeedCorpus is a corpus with about 300 million words, which…

Europarl: European Parliament Proceedings Parallel Corpus

The corpus was prepared by Philipp Koehn. The process is described…

Estonian Reference Corpus

Morphologically annotated corpus by Filosoft containing written…

English Wikipedia corpus

This corpus has been built using an English Wikipedia dump (from…

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) is a collection…

ChineseWiki corpus

The Chinese Wiki corpus is first segmented with Stanford Word…

ChineseTaiwanWaC corpus

Chinese Taiwan web as corpus has almost 260 million words encoded…

Chinese Gigaword corpus

The Chinese Gigaword corpus from the Linguistic Data Consortium…

CHILDES English corpus

Childes-En is a subcorpus of the full CHILDES corpus which has…

CAJA corpus

The CAJA corpus is a corpus of academic journal articles, created…