A morphologically annotated corpus by Filosoft containing written texts. The character encoding is UTF-8. See the Estonian Reference corpus documentation.

The EstonianNC version consists of written texts from the Estonian Reference corpus and web texts from Estonian Web 2013.

Part of speech abbreviations:

A = Adjective (positive)
C = Adjective (comparative)
D = Adverb
G = Genitive attribute, i.e. indeclinable adjective
H = Proper noun
I = Interjection
J = Conjunction
K = Adposition (pre- or postposition)
N = Numeral (cardinal)
O = Numeral (ordinal)
P = Pronoun
S = Common noun
U = Adjective (superlative)
V = Verb
X = Verb particle
Y = Abbreviation or acronym
Z = Punctuation
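
The abbreviations above can be kept as a simple lookup table when processing the annotations. The following is a minimal sketch in Python; the names POS_TAGS and describe_pos are hypothetical and not part of the corpus or of any Filosoft tool, and the sketch assumes each tag appears as a single uppercase letter exactly as listed above.

```python
# Hypothetical lookup table built from the abbreviation list above.
POS_TAGS = {
    "A": "Adjective (positive)",
    "C": "Adjective (comparative)",
    "D": "Adverb",
    "G": "Genitive attribute, i.e. indeclinable adjective",
    "H": "Proper noun",
    "I": "Interjection",
    "J": "Conjunction",
    "K": "Adposition (pre- or postposition)",
    "N": "Numeral (cardinal)",
    "O": "Numeral (ordinal)",
    "P": "Pronoun",
    "S": "Common noun",
    "U": "Adjective (superlative)",
    "V": "Verb",
    "X": "Verb particle",
    "Y": "Abbreviation or acronym",
    "Z": "Punctuation",
}


def describe_pos(tag: str) -> str:
    """Return the description for a one-letter tag, or 'Unknown' if unlisted."""
    return POS_TAGS.get(tag, "Unknown")


if __name__ == "__main__":
    for tag in ("S", "V", "Q"):
        print(tag, "=", describe_pos(tag))
```

Returning "Unknown" for unlisted tags, rather than raising an error, is a design choice of this sketch so that unexpected annotations do not interrupt processing.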