The Estonian Reference Corpus is a corpus morphologically annotated with the Filosoft tagging tool. The corpus consists of written texts.

There are two types of POS tags:

  • the abbreviated tag contains only basic information about the part of speech
  • the longtag contains detailed information, including additional categories for specific parts of speech

For more information and a summary of the longtag, see the Estonian Reference Corpus document.

The EstonianNC version consists of written texts from the Estonian Reference Corpus and web texts from Estonian Web 2013.

Abbreviated part-of-speech tags:

A  Adjective (positive)
C  Adjective (comparative)
D  Adverb
G  Genitive attribute, i.e. indeclinable adjective
H  Proper noun
I  Interjection
J  Conjunction
K  Adposition (pre- or postposition)
N  Numeral (cardinal)
O  Numeral (ordinal)
P  Pronoun
S  Common noun
U  Adjective (superlative)
V  Verb
X  Verb particle
Y  Abbreviation or acronym
Z  Punctuation
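
When processing tagged output, it can help to have the abbreviated tags as a lookup table. The following is a minimal sketch in Python, assuming the abbreviated tag appears as a single uppercase letter in the data. The names `POS_TAGS`, `describe_tag` and `adjective_degree` are hypothetical helpers for illustration and are not part of any corpus tooling.

```python
from typing import Optional

# Abbreviated POS tags, copied from the table above.
POS_TAGS = {
    "A": "Adjective (positive)",
    "C": "Adjective (comparative)",
    "D": "Adverb",
    "G": "Genitive attribute, i.e. indeclinable adjective",
    "H": "Proper noun",
    "I": "Interjection",
    "J": "Conjunction",
    "K": "Adposition (pre- or postposition)",
    "N": "Numeral (cardinal)",
    "O": "Numeral (ordinal)",
    "P": "Pronoun",
    "S": "Common noun",
    "U": "Adjective (superlative)",
    "V": "Verb",
    "X": "Verb particle",
    "Y": "Abbreviation or acronym",
    "Z": "Punctuation",
}

# The three tags that encode the degree of comparison of an adjective.
ADJECTIVE_DEGREES = {"A": "positive", "C": "comparative", "U": "superlative"}


def describe_tag(tag: str) -> str:
    """Return the description of an abbreviated tag, or 'Unknown tag'."""
    return POS_TAGS.get(tag, "Unknown tag")


def adjective_degree(tag: str) -> Optional[str]:
    """Return the degree of comparison for A, C or U; None for other tags."""
    return ADJECTIVE_DEGREES.get(tag)


if __name__ == "__main__":
    for tag in ("S", "U", "Q"):
        print(tag, "->", describe_tag(tag))
```

Because the degree of comparison is encoded in the tag itself (A, C, U), a query for all adjectives regardless of degree has to match all three tags rather than A alone.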