A page relevant to corpora.

Pages

Basque Web Corpus (WaC)

The Basque "Web as Corpus" corpus was created by Mr. Igor Leturia…

Persian Web Corpus (WaC)

Persian (also known as Farsi) is the main language of Iran. This…

Argamon corpus

The current Argamon corpus contains blog posts to various Farsi…

ACL Anthology Reference Corpus (ARC)

The corpus is prepared by Steven Bird. The process is described…

Algemeen Nederlands Woordenboek (ANW) corpus

The Algemeen Nederlands Woordenboek (ANW) corpus is a balanced…

New Model Corpus

The New Model Corpus is a ~100-million-word domain corpus built…

UKWaC corpus

The corpus was prepared by Adriano Ferraresi. The process is…

London English corpora

The corpus consists of transcripts of informal conversation-like…

zhTenTen corpus

The Simplified Chinese TenTen corpus was created from the Internet…

yoTenTen corpus

Yoruba TenTen web corpus. The corpus is cleaned by jusText,…

uaTenTen corpus

The Ukrainian TenTen corpus was crawled by SpiderLing in 2014.…

trTenTen corpus

Turkish TenTen corpus. Crawled by SpiderLing in December 2011…

svTenTen corpus

Swedish TenTen web corpus. The corpus is cleaned by jusText,…

skTenTen corpus

Slovak TenTen corpus. The corpus has been tagged by the Ľ.…

ruTenTen corpus

Russian TenTen corpus: a Russian web corpus crawled by SpiderLing…