SiBol/Port corpus

The SiBol/Port (Siena-Bologna, Portsmouth) corpus is a corpus…

Scottish Gaelic Wiki corpus

Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…

Russian Web Corpus

This corpus was gathered by Serge Sharoff at the University of…

pukWaC

The same as ukWaC, but with a further layer of annotation added,…

Romanian WaC (RoWaC) corpus

This Romanian web as corpus was gathered by Monica Macoveiciuc,…

Polish Web Corpus (PolishWaC)

The Polish web as corpus has 103 million words and the encoding is…

Parallel Corpora Registry Info

General Attribute Set ATTRIBUTE word STRUCTURE s{ ATTRIBUTE…

PICAE: Pearson International Corpus of Academic English

This corpus was created by Kirsten Ackermann and David Tugwell,…

Islam – UK

A special English newspaper corpus by Costas Gabrielatos at…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Project Gutenberg Corpus

Downloaded with wget: getting Gutenberg cleaned with…