A page relevant to corpora.

Pages

Filipino web corpus (FilipinoWaC)

The corpus was created by Anil in October 2013. It has almost…

German Web Corpus (DeWaC)

The corpus was prepared by Marco Baroni in a web crawl as described…

Arabic web corpus (WaC)

The Arabic web corpus was created by Serge Sharoff and was tagged…

Nineteenth-century corpus

Currently, the 19th-century corpus is only available to Osnabrück…

Bosnian/Croatian/Serbian WaC

Bosnian, Croatian, and Serbian corpora obtained from the web by Nikola…

BengaliWaC corpus

The Bengali web corpus was created with the Corpus Factory method. The…

Cantonese web corpus (WaC)

This corpus was collected using Cantonese-only seed words and…

Penn Historical Corpora

PennHistEn is a collection of historical English texts ranging…

GerManC. A Historical Corpus of German Newspapers 1650–1800

GerManC is a historical corpus of written German texts. (This…

A Corpus of English Dialogues 1560–1760

‘Released in Spring 2006, A Corpus of English Dialogues 1560–1760…

COMPAS corpus

COMPAS is a corpus of about 100 million words, which was…

Corpus of Academic Journal Articles (CAJA)

This balanced corpus (abbreviated CAJA) of academic language…

BulgarianNC corpus

Bulgarian National Corpus (see the website of the Institute for Bulgarian…

BROWN Corpus

A Standard Corpus of Present-Day Edited American English, for…

Corpus Brasileiro

The Corpus Brasileiro (CB) is the result of a project funded…