A page relevant to corpora.

Pages

Urdu

A web corpus of 53 million words, built with Corpus…

TED_en corpus

A corpus of transcripts of TED talks. Prepared by Akshay Min…

jpTenTen11 LUW corpus

Japanese TenTen corpus gathered from the web in December 2011.…

SiBol/Port corpus

The SiBol/Port (Siena-Bologna, Portsmouth) corpus is a corpus…

SemCor (Sense-tagged corpus)

A semantic corpus derived from the Brown corpus, built by Siva. Multi-word expressions (MWEs) are marked.

Scottish Gaelic Wiki corpus

Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…

Russian Web Corpus

This corpus was gathered by Serge Sharoff at the University of…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian web as corpus was gathered by Monica Macoveiciuc,…

Quran Annotated Corpus

The version of the Quran was prepared by Zainab Alqassem (Alqassem…

Portuguese corpus

The CetemPúblico/CetenFolha Portuguese corpus installed here…

Polish Web Corpus (PolishWaC)

The Polish web as corpus has 103 million words and the encoding is…

PICAE: Pearson International Corpus of Academic English

This corpus was created by Kirsten Ackermann and David Tugwell,…

The Oxford English Corpus

The Oxford English Corpus (OEC) consisted mainly of websites…

Nepali National Corpus

It is a 13-million-word corpus of Nepali. The corpus consists…