A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Czech part-of-speech tagset is used in Czech corpora annotated with the Majka or Ajka morphological tagging tools. The tagset was revisited in 2011.

An example of a tag in the CQL concordance search box: [tag="k1.*nP.*"] finds all nouns in the plural, e.g. lidé, roky. (Note: make sure that you use straight double quotation marks.)

The whole tag consists of attribute-value pairs: each attribute is represented by a single lowercase letter (e.g. n for number) and its value by a single character, usually a capital letter or a digit (e.g. P for plural). Each tag starts with the two characters representing the part of speech, e.g. k1 means noun, k2 means adjective, etc.
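The pair structure described above can be sketched in a few lines of Python. This is only an illustration, not part of Sketch Engine: the function name parse_tag and the sample tag k1gMnPc1 are hypothetical, and the code assumes that every attribute is one lowercase letter followed by exactly one value character.

```python
def parse_tag(tag):
    """Split a tag such as 'k1gMnPc1' into attribute-value pairs.

    Assumption: each attribute is a single lowercase letter and each
    value is a single character, as described in the text above.
    """
    if len(tag) % 2 != 0:
        raise ValueError(f"tag has an odd number of characters: {tag!r}")
    pairs = {}
    for i in range(0, len(tag), 2):
        attribute, value = tag[i], tag[i + 1]
        if not ("a" <= attribute <= "z"):
            raise ValueError(f"expected a lowercase attribute, got {attribute!r}")
        pairs[attribute] = value
    return pairs


# Hypothetical sample tag: noun, animate masculine, plural, first case.
print(parse_tag("k1gMnPc1"))
# {'k': '1', 'g': 'M', 'n': 'P', 'c': '1'}
```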

See the whole POS tagset summary in PDF.


Czech text corpora

Sketch Engine offers dozens of Czech corpora.


1st position

k Part of speech
1 noun
2 adjective
3 pronoun
4 number
5 verb
6 adverb
7 preposition
8 conjunction
9 particle
0 interjection
A abbreviation
I punctuation

Example to find all verbs: [tag="k5.*"]
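When tags are processed outside Sketch Engine, for example in an exported word list, the table above can be turned into a simple lookup. The following is a minimal sketch; the names POS_NAMES and part_of_speech, and the sample tags, are illustrative assumptions rather than anything provided by the tools.

```python
# Values of the k attribute, copied from the table above.
POS_NAMES = {
    "1": "noun",
    "2": "adjective",
    "3": "pronoun",
    "4": "number",
    "5": "verb",
    "6": "adverb",
    "7": "preposition",
    "8": "conjunction",
    "9": "particle",
    "0": "interjection",
    "A": "abbreviation",
    "I": "punctuation",
}


def part_of_speech(tag):
    """Return the part-of-speech name encoded in the first two characters."""
    if len(tag) < 2 or tag[0] != "k":
        raise ValueError(f"tag does not start with a k attribute: {tag!r}")
    return POS_NAMES.get(tag[1], "unknown")


# Hypothetical sample tags.
print(part_of_speech("k5eAaImIp3nS"))  # verb
print(part_of_speech("kIx."))          # punctuation
```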

2nd position

g Gender (k1–k4) Example
M Animate masculine
I Inanimate masculine
N Neuter
F Feminine
R Family (surname)* Havlovi

Example to find all neuter nouns: [tag="k1gN.*"] or all masculine nouns: [tag="k1g(M|I).*"]
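Queries of this shape can also be assembled programmatically. The sketch below is a hypothetical helper (build_tag_query is not a Sketch Engine function). It joins the constraints with .* and wraps several alternative values in parentheses, and it assumes the caller lists the attributes in the same order in which they occur in the tag.

```python
def build_tag_query(pos, constraints):
    """Build a CQL tag query from a part-of-speech code and constraints.

    pos         -- value of the k attribute, e.g. "1" for noun
    constraints -- list of (attribute, [allowed values]) pairs, assumed
                   to be given in the order they appear in the tag
    """
    parts = ["k" + pos]
    for attribute, values in constraints:
        if len(values) == 1:
            value_pattern = values[0]
        else:
            # Several allowed values become a regex alternation.
            value_pattern = "(" + "|".join(values) + ")"
        parts.append(attribute + value_pattern)
    return '[tag="' + ".*".join(parts) + '.*"]'


print(build_tag_query("1", [("n", ["P"])]))       # [tag="k1.*nP.*"]
print(build_tag_query("1", [("g", ["M", "I"])]))  # [tag="k1.*g(M|I).*"]
```

The generated patterns always place .* between constraints, so they are slightly more permissive than a hand-written pattern such as k1gN.* but select the same tags when the attribute follows directly.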

3rd position

c Case (k1–k4, k7)
1–7 First–Seventh

Example to find all adjectives in the instrumental (seventh) case: [tag="k2.*c7.*"]

4th position

n Number (k1–k4)
S Singular
P Plural

Example to find all plural numerals (k4): [tag="k4.*nP.*"]

5th position

e Negation (k2, k5, k6)
A Affirmation
N Negation

Example to find all feminine verb forms in the negative: [tag="k5.*eN.*gF.*"]

6th position

d Degree (k2, k6)
1 Positive
2 Comparative
3 Superlative

Example to find all comparative adjectives: [tag="k2.*d2.*"]
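Before running a query on a large corpus, a tag pattern can be checked locally against a few sample tags. The sketch below uses Python's re.fullmatch on the assumption that the pattern has to match the whole tag value. The sample tags are hypothetical, and Python's regular expression dialect may differ from the one used by CQL in details that these simple patterns do not touch.

```python
import re


def tag_matches(pattern, tag):
    """Return True if the regular expression matches the entire tag."""
    return re.fullmatch(pattern, tag) is not None


pattern = "k2.*d2.*"  # comparative adjectives

# Hypothetical sample tags for illustration only.
print(tag_matches(pattern, "k2eAgMnSc1d2"))  # True: adjective, comparative
print(tag_matches(pattern, "k2eAgMnSc1d1"))  # False: adjective, positive
print(tag_matches(pattern, "k6eAd2"))        # False: adverb, not adjective
```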

7th position

p Person (k3, k5)
1 First
2 Second
3 Third

Example to find all third-person pronouns: [tag="k3.*p3.*"]

8th position

w Stylistic flag (k1–k9)
A Archaism
B Poeticism
C Only in corpora
E Expressive
H Conversational
K Bookish
O Regional
R Rare
Z Obsolete

noun (k1) subclassification

For example: [tag="k1xP.*"]

1st position Description Example
x special paradigm
P půl, čtvrt

pronoun (k3) subclassification

x Type
P personal
O possessive
D demonstrative
T delimitative

y Type
F reflexive
Q interrogative
R relative
N negative
I indeterminate

number (k4) subclassification

x Type
C cardinal
O ordinal
R reproductive

y Type
N Negative
I Indeterminate

verb (k5) subclassification

m Type
F Infinitive
I Present Indicative
R Imperative
A Active part. (past)
N Passive part.
S Adv. part. (present)
D Adv. part. (past)
B Future indicative

a Aspect
P Perfective
I Imperfective

adverb (k6) subclassification

x Type
D Demonstrative
T Delimitative

y Type
Q Interrogative
R Relative
N Negation
I Indeterminate

conjunction (k8) subclassification

*t type
S Status
D Modal
T Expresses time
A Expresses respect
C Expresses reason
L Expresses place
M Expresses manner
Q Expresses extent

x Type
C Coordinate
S Subordinate

punctuation (kI) subclassification

x punctuation list
. .?!
, ,:;
“„“‚ ‘
( ({[<
) )}]>
~ ~$%^&-_+=\|/# etc.


JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Pavel ŠMERK. Czech Morphological Tagset Revisited. In Horák, Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing 2011. Brno: Tribun EU, 2011, pp. 29-42. ISBN 978-80-263-0077-9.