A page relevant to corpora.

Pages

SoNaR corpus

The SoNaR corpus is a 500-million-word reference corpus of contemporary…

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Lithuanian WaC

(version 2) This corpus was created using the Corpus Factory method…

Indonesian WaC

The corpus is prepared by the Corpus Factory method described here.…

Croatian Web Corpus

(version 1.1) Tagset MULTEXT-East Morphosyntactic Specifications,…

Kannada WaC

Kannada WaC (web as corpus). The corpus is prepared by Corpus…

Yoruba WaC corpus

Yoruba web as corpus. It was compiled in June 2015 with encoding…

Hebrew web corpora

Hebrew General corpus This corpus was crawled from the Internet…

The New Corpus for Ireland | Nua-Chorpas na hÉireann

The New Corpus for Ireland – user’s guide Welcome…

BBC Oxford Children's Stories

(Restricted corpus.) Bibliography Banerji, N., Gupta,…

Oxford Children's Corpus

Journal article Kate Wild, Adam Kilgarriff, and David Tugwell.…

Icelandic sample corpus

This is a small corpus of Icelandic texts prepared for the Sketch…

Hebrew Translational Corpus

Also referred to as "Hebrew Comparable Corpus", uploaded in 2010. The…

CZES corpus

CZES is a Czech corpus consisting of newspaper articles and magazine…

Varieties of Learner English (VOLE) corpus

VOLE (Varieties of Learner English) is a corpus gathered in 2010…