A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

Lithuanian part-of-speech tagset

The Lithuanian part-of-speech tagset is used in Lithuanian corpora annotated with a POS tagger.

Lithuanian text corpora in Sketch Engine

Sketch Engine offers dozens of Lithuanian-language corpora.


The following table shows the Lithuanian part-of-speech tagset.

An example of a tag in the CQL concordance search box: [tag="N.*"] finds all nouns, e.g. Lietuvos, metų. (Note: make sure that you use straight double quotation marks.)
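
As a rough illustration of what such a query selects, the sketch below filters a few tokens in Python. It assumes that the pattern has to match the whole tag value, which is why the trailing `.*` is needed. The helper name `matches_tag` and the sample tags are hypothetical: only the first letter of each tag comes from the tagset, and the remaining characters are placeholders.

```python
import re


def matches_tag(pattern: str, tag: str) -> bool:
    """Return True if the pattern matches the entire tag (assumed CQL behaviour)."""
    return re.fullmatch(pattern, tag) is not None


# Hypothetical (word, tag) pairs. Only the first letter of each tag is taken
# from the tagset; the trailing underscores are placeholders, not real codes.
tokens = [("Lietuvos", "N__"), ("metų", "N__"), ("ir", "C__")]

# Equivalent in spirit to the CQL query [tag="N.*"]
nouns = [word for word, tag in tokens if matches_tag("N.*", tag)]
print(nouns)  # ['Lietuvos', 'metų']
```

Under the same assumption, the bare pattern `N` would not select these tokens, because it does not cover the characters that follow the part-of-speech letter.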

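| No. | Feature group | Category | Tag code |
|---|---|---|---|
| 1 | Part of speech | Noun | N |
| | | Adjective | A |
| | | Numeral | M |
| | | Pronoun | P |
| | | Verb | V |
| | | Adverb | R |
| | | Interjection | I |
| | | Onomatopoeia | O |
| | | Particle | Q |
| | | Preposition | S |
| | | Conjunction | C |
| | | Acronym | Z |
| | | Abbreviation | Y |
| | | Roman numeral | U |
| | | Residual | X |
| | | Stable phrases | H |
| | | Punctuation mark, symbols | T |
| | | HTML tag | t |
| 2 | Noun types | proper | p |
| | | common | c |
| 3 | Verb | main | m |
| | | infinitive | n |
| | | participle | p |
| | | adverbial participle | a |
| | | half participle | h |
| | | adverbial participle 2 | b |
| | | indicative mood | i |
| | | imperative mood | m |
| | | subjunctive mood | s |
| 4 | Numerals | cardinal | c |
| | | ordinal | o |
| | | multiple | m |
| | | collective | l |
| 5 | Definiteness | pronominal | p |
| | | non-pronominal | n |
| 6 | Reflexiveness | reflexive | r |
| | | non-reflexive | n |
| 7 | Type | active | a |
| | | passive | p |
| | | necessity | n |
| 8 | Tense | present tense | p |
| | | past tense | a |
| | | past frequentative tense | q |
| | | future tense | f |
| | | simple past | s |
| 9 | Degree | positive | p |
| | | comparative | c |
| | | superlative | s |
| 10 | Gender | feminine | f |
| | | masculine | m |
| | | neuter | n |
| | | common | c |
| 11 | Number | singular | s |
| | | plural | p |
| | | dual | d |
| 12 | Case | nominative | n |
| | | genitive | g |
| | | dative | d |
| | | accusative | a |
| | | instrumental | i |
| | | locative | l |
| | | vocative | v |
| | | illative | x |
| 13 | Person | 1st | 1 |
| | | 2nd | 2 |
| | | 3rd | 3 |
| 14 | Positiveness | positive | p |
| | | negative | n |
| 15 | Phrases | stable phrases with undefined POS | H |
| 16 | Unknown | foreign | f |
| | | typos | t |
| | | segmentation error | p |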
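
The part-of-speech codes in group 1 of the table can be combined with the query form shown earlier to build a search for any part of speech. The following is a minimal sketch, not an official Sketch Engine utility: the names `POS_CODES`, `cql_for_pos` and `pos_of_tag` are hypothetical, and the sketch assumes only that a tag begins with its part-of-speech letter, as the [tag="N.*"] example suggests. The order of any further codes within a tag is not covered here.

```python
# Part-of-speech codes copied from group 1 of the table above.
# The lookup is case-sensitive: "T" and "t" are different codes.
POS_CODES = {
    "N": "Noun", "A": "Adjective", "M": "Numeral", "P": "Pronoun",
    "V": "Verb", "R": "Adverb", "I": "Interjection", "O": "Onomatopoeia",
    "Q": "Particle", "S": "Preposition", "C": "Conjunction", "Z": "Acronym",
    "Y": "Abbreviation", "U": "Roman numeral", "X": "Residual",
    "H": "Stable phrases", "T": "Punctuation mark, symbols", "t": "HTML tag",
}


def cql_for_pos(code: str) -> str:
    """Build a CQL query that finds every token of one part of speech."""
    if code not in POS_CODES:
        raise ValueError(f"unknown part-of-speech code: {code!r}")
    # Straight double quotes, as the search box requires.
    return f'[tag="{code}.*"]'


def pos_of_tag(tag: str) -> str:
    """Name the part of speech from the first character of a tag."""
    return POS_CODES.get(tag[:1], "unknown")


print(cql_for_pos("V"))  # [tag="V.*"]
print(pos_of_tag("N"))   # Noun
```

Keeping the codes in one dictionary means the query builder and the decoder cannot drift apart if the tagset is revised.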