A page relevant to corpora.

Pages

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Fryske Akademy Parallel Corpus

Frisian and Dutch aligned sentences, not POS tagged. Dutch…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Fida PLUS corpus

The corpus is a reference corpus for Slovene, as described here.…

Estonian Reference Corpus

Estonian Reference Corpus is a morphologically annotated corpus…

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) is a collection…

ChineseWiki corpus

The Chinese Wiki corpus is first segmented with Stanford Word…

ChineseTaiwanWaC corpus

Chinese Taiwan web as corpus has almost 260 million words encoded…

Chinese Gigaword corpus

The Chinese Gigaword corpus from the Linguistic Data Consortium…

CHILDES English corpus

Childes-En is a subcorpus of the full CHILDES corpus which has…

HindiWaC corpus

This corpus contains almost 60 million words crawled from the…

IgboWaC corpus

The corpus was prepared using the Corpus Factory method and was crawled…