A page relevant to corpora.

Pages

Bosnian/Croatian/Serbian WaC

Bosnian, Croatian, Serbian corpora obtained from the web by Nikola…

Penn Historical Corpora

Penn Historical Corpora is a collection of historical English…

A Corpus of English Dialogues 1560–1760

‘Released in Spring 2006, A Corpus of English Dialogues 1560–1760…

COMPAS corpus

COMPAS is a corpus of about 100 million words which was…

BulgarianNC corpus

Bulgarian National Corpus (see the website of the Institute for Bulgarian…

Basque Web Corpus (WaC)

The Basque "Web as Corpus" corpus was created by Mr. Igor Leturia…

Persian Web Corpus (WaC)

Persian (also known as Farsi) is the main language of Iran. This…

Argamon corpus

The current Argamon corpus contains blog posts to various Farsi…

ACL Anthology Reference Corpus (ARC)

The corpus was prepared by Steven Bird. The process is described…