A page relevant to corpora.

Pages

Nepali National Corpus

It is a 13-million-word corpus of Nepali. The corpus consists…

Islam – UK

A special English newspaper corpus by Costas Gabrielatos at…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Fryske Akademy Parallel Corpus

Aligned Frisian and Dutch sentences, not POS-tagged. Dutch…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Fida PLUS corpus

The corpus is a reference corpus for Slovene, as described here.…

Feed Corpus

The FeedCorpus is a corpus with about 300 million words, which…

Europarl: European Parliament Proceedings Parallel Corpus

The corpus was prepared by Philipp Koehn. The process is described…

Estonian Reference Corpus

Morphologically annotated corpus by Filosoft. The character…

English Wikipedia corpus

This corpus has been built from an English Wikipedia dump (from…

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) is a collection…

ChineseWiki corpus

The Chinese Wiki corpus is first segmented with Stanford Word…

ChineseTaiwanWaC corpus

Chinese Taiwan web as corpus has almost 260 million words encoded…

Chinese Gigaword traditional

The corpus is part of the Chinese Gigaword corpus from the…

Chinese Gigaword simplified corpus

The corpus is part of the Chinese Gigaword corpus from the…