SoNaR corpus

The SoNaR corpus is a 500-million-word reference corpus of contemporary…

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Lithuanian WaC

(version 2) This corpus was created using the Corpus Factory method…

Indonesian WaC

The corpus was prepared by the Corpus Factory method described here.…

Croatian Web Corpus

(version 1.1) Tagset: MULTEXT-East Morphosyntactic Specifications,…

Kannada WaC

Kannada WaC (web as corpus). The corpus is prepared by Corpus…

Yoruba WaC corpus

Yoruba web as corpus. It was compiled in June 2015 with encoding…

Shallow tagging

Shallow tagging is used for languages which we cannot tag with…

Chinese Tagset

A preview of a Chinese tagset. 普通名词 n common…