SoNaR corpus

The SoNaR corpus is a 500-million-word reference corpus of contemporary…

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

Lithuanian WaC

(version 2) This corpus was created using the Corpus Factory method…

Indonesian WaC

The corpus was prepared using the Corpus Factory method described here…

Hebrew Translational Corpus tagset

TAGGER OUTPUT | ABRV | VALUES PER TAG
token token transliteration…

Croatian Web Corpus

(version 1.1) Tagset MULTEXT-East Morphosyntactic Specifications,…

Kannada WaC

Kannada WaC (web as corpus). The corpus was prepared by Corpus…

Yoruba WaC corpus

Yoruba web as corpus. It was compiled in June 2015 with encoding…

Shallow tagging

Shallow tagging is used for languages which we cannot tag with…

Chinese Tagset

A preview of a Chinese tagset. 普通名词 (common noun) n common…

Penn Treebank Tagset (wrong)

POS Tag | Description | Example
CC | coordinating conjunction | and
CD | cardinal…
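A tagset page like this one works as a lookup table from tag to description. The sketch below is a hypothetical illustration: it covers only the two Penn Treebank tags shown in the excerpt (CC and CD), and it assumes a "word/TAG" token notation, which is a common convention and not necessarily the format of any particular corpus.

```python
# Partial, hypothetical lookup: only the tags visible in the excerpt above.
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
}


def describe(token: str) -> str:
    """Split a hypothetical 'word/TAG' token and describe its tag."""
    word, _, tag = token.rpartition("/")
    description = PENN_TAGS.get(tag, "unknown tag")
    return f"{word}: {tag} ({description})"


print(describe("and/CC"))  # and: CC (coordinating conjunction)
```

Tags missing from the partial table fall back to "unknown tag" rather than raising, so the helper can be run over arbitrary input while the table is being filled in.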

Vietnamese Tagset

(This list was copied from the official readme file at https://github.com/hakz/vntagger-gate-plugin.vntagger/blob/master/README.txt) The…

Romanian Tagset | Clasa de etichete pentru Limba Română

For each tag, the first character specifies the major word…
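Because the first character of each tag carries the major word class, a tag can be split positionally. The sketch below assumes a hypothetical three-entry category map (N, V, A) and a hypothetical example tag "Ncms". The real Romanian tagset defines its own codes and attribute positions, so consult the full list before relying on this.

```python
from typing import NamedTuple

# Hypothetical subset of major categories, for illustration only.
MAJOR_CATEGORIES = {"N": "noun", "V": "verb", "A": "adjective"}


class ParsedTag(NamedTuple):
    major: str       # first character: the major word class code
    category: str    # human-readable name, or "unknown" if not in the map
    attributes: str  # remaining characters, left uninterpreted here


def parse_tag(tag: str) -> ParsedTag:
    """Split a positional tag into its first character and the rest."""
    if not tag:
        raise ValueError("empty tag")
    major, attributes = tag[0], tag[1:]
    return ParsedTag(major, MAJOR_CATEGORIES.get(major, "unknown"), attributes)


print(parse_tag("Ncms"))  # ParsedTag(major='N', category='noun', attributes='cms')
```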

Tagset for Japanese SkE: English Translations of ChaSen POS Tags

The Japanese web corpus, JpWaC, is annotated using the English…