A page relevant to corpora.

Pages

TED_en corpus

A corpus of transcripts of TED talks. Prepared by Akshay Min…

jpTenTen11 LUW corpus

Japanese TenTen corpus gathered from the web in December 2011.…

SiBol/Port corpus

The SiBol/Port (Siena-Bologna, Portsmouth) corpus is a corpus…

Scottish Gaelic Wiki corpus

Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…

Russian Web Corpus

This corpus was gathered by Serge Sharoff at the University of…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian web as corpus was gathered by Monica Macoveiciuc,…

Polish Web Corpus (PolishWaC)

Polish web as corpus has 103 million words and the encoding is…

Islam – UK

A special English newspaper corpus by Costas Gabrielatos at…