A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The MULTEXT-East Morphosyntactic Specification for Slovenian, version 4, is the tagset used in the Slovenian corpora. The MULTEXT-East resources are a multilingual dataset for language engineering research and development.

Compared with version 3 of the tagset, version 4 introduces a number of changes, e.g. to certain attributes and their values, to the allowed attribute-value combinations, and to the lexical assignment of MSDs (morphosyntactic descriptions, i.e. the tags) to particular words or word groups. In addition, some attributes were re-ordered to allow a more compact encoding of MSDs.

Version 4 of the MULTEXT-East Slovenian part-of-speech tagset.

Example of a tag in the CQL concordance search box: [tag="S.m.*"] finds all masculine nouns (S = noun, . = any type, m = masculine, .* = any remaining characters), e.g. človek, Maribor. (Note: please make sure that you use straight double quotation marks.)
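The tag value in such a query is a regular expression that, in CQL, has to match the entire tag, so each `.` stands for exactly one position and `.*` for whatever follows. The Python sketch below imitates that rule with `re.fullmatch`; it is only an approximation of the concordancer's own matching, `cql_tag_matches` is a hypothetical helper, and the sample tags were assembled from the tables on this page rather than taken from a corpus.

```python
import re


def cql_tag_matches(pattern: str, tag: str) -> bool:
    """Approximate [tag="..."]: the regular expression must cover the whole tag."""
    return re.fullmatch(pattern, tag) is not None


# Sample Slovene tags assembled from the tables on this page (illustrative only).
samples = {
    "Sometd": "noun, common, masculine, singular, accusative, animate",
    "Slmei": "noun, proper, masculine, singular, nominative",
    "Sozei": "noun, common, feminine, singular, nominative",
    "Ppnmei": "adjective, general, positive, masculine, singular, nominative",
}

for tag, gloss in samples.items():
    print(tag, cql_tag_matches("S.m.*", tag), gloss)
```

Only the first two tags match: position 2 of Sozei is z (feminine), and Ppnmei starts with P, i.e. it is an adjective.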

In each tag, the first character specifies the part of speech (major word class), as listed in Table 1 below, and each subsequent character is interpreted according to the table for that part of speech below.

For example, the Slovene tag

Sometd

(English version: Ncmsay) is to be interpreted, character by character, as follows:

===============================
S      Category  =   noun (samostalnik)
o      Type      =   common (občno_ime)
m      Gender    =   masculine (moški)
e      Number    =   singular (ednina)
t      Case      =   accusative (tožilnik)
d      Animate   =   yes (da)
===============================
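The character-by-character reading can be mechanised for nouns. This sketch hard-codes the Noun (Samostalnik) table from this page; `decode_noun` is a hypothetical helper, and it raises an error for a tag with more positions than the table lists or for a code the table does not contain.

```python
# Noun (S) attributes copied from the Noun table on this page.
# Entry i describes tag position i + 1 (position 0 is the category letter).
NOUN_POSITIONS = [
    ("vrsta / Type", {"o": "občno_ime / common", "l": "lastno_ime / proper"}),
    ("spol / Gender", {"m": "moški / masculine", "z": "ženski / feminine", "s": "srednji / neuter"}),
    ("število / Number", {"e": "ednina / singular", "d": "dvojina / dual", "m": "množina / plural"}),
    ("sklon / Case", {"i": "imenovalnik / nominative", "r": "rodilnik / genitive",
                      "d": "dajalnik / dative", "t": "tožilnik / accusative",
                      "m": "mestnik / locative", "o": "orodnik / instrumental"}),
    ("živost / Animate", {"n": "ne / no", "d": "da / yes"}),
]


def decode_noun(tag: str) -> list[tuple[str, str]]:
    """Return (attribute, value) pairs for a Slovene noun tag, one per character."""
    if not tag.startswith("S"):
        raise ValueError(f"not a Slovene noun tag: {tag!r}")
    if len(tag) - 1 > len(NOUN_POSITIONS):
        raise ValueError(f"too many positions for a noun tag: {tag!r}")
    rows = [("besedna_vrsta / CATEGORY", "samostalnik / Noun")]
    for (attribute, values), code in zip(NOUN_POSITIONS, tag[1:]):
        rows.append((attribute, values[code]))  # KeyError if the code is not in the table
    return rows


for attribute, value in decode_noun("Sometd"):
    print(f"{attribute:26} {value}")
```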

The tagset (category names, attribute names, values and codes) exists in both a Slovene and an English version. The corpora are tagged with the Slovene version.

Tagset

1. Part-of-speech categories

PoS (en)      Code (en)  PoS (sl)      Code (sl)
Noun          N.*        Samostalnik   S.*
Verb          V.*        Glagol        G.*
Adjective     A.*        Pridevnik     P.*
Adverb        R.*        Prislov       R.*
Pronoun       P.*        Zaimek        Z.*
Numeral       M.*        Števnik       K.*
Adposition    S.*        Predlog       D.*
Conjunction   C.*        Veznik        V.*
Particle      Q.*        Členek        L.*
Interjection  I.*        Medmet        M.*
Abbreviation  Y.*        Okrajšava     O.*
Residual      X.*        Neuvrščeno    N.*
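Several letters occur in both versions with different meanings (S is a noun in the Slovene tagset but an adposition in the English one, P an adjective versus a pronoun), so the first character can only be read once you know which version a corpus uses. A small sketch of the category table as a Python mapping; `category_of` is a hypothetical helper:

```python
# Category codes from Table 1: Slovene code -> (English code, English name).
SL_TO_EN_CATEGORY = {
    "S": ("N", "Noun"), "G": ("V", "Verb"), "P": ("A", "Adjective"),
    "R": ("R", "Adverb"), "Z": ("P", "Pronoun"), "K": ("M", "Numeral"),
    "D": ("S", "Adposition"), "V": ("C", "Conjunction"), "L": ("Q", "Particle"),
    "M": ("I", "Interjection"), "O": ("Y", "Abbreviation"), "N": ("X", "Residual"),
}
# Reverse lookup: English code -> Slovene code.
EN_TO_SL_CATEGORY = {en: sl for sl, (en, _name) in SL_TO_EN_CATEGORY.items()}


def category_of(tag: str, lang: str = "sl") -> str:
    """Return the English part-of-speech name encoded by the first character of a tag."""
    first = tag[0]
    if lang == "en":
        first = EN_TO_SL_CATEGORY[first]
    return SL_TO_EN_CATEGORY[first][1]


# Letters used by both versions, and what each one means in each version.
shared = sorted(set(SL_TO_EN_CATEGORY) & set(EN_TO_SL_CATEGORY))
for letter in shared:
    print(letter, "sl:", category_of(letter, "sl"), "| en:", category_of(letter, "en"))
```

The loop lists the letters M, N, P, R, S and V; only R (adverb) means the same thing in both versions.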

2. Noun (N) // Samostalnik (S)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     samostalnik  S          CATEGORY        Noun           N
1    vrsta             občno_ime    o          Type            common         c
                       lastno_ime   l                          proper         p
2    spol              moški        m          Gender          masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
3    število           ednina       e          Number          singular       s
                       dvojina      d                          dual           d
                       množina      m                          plural         p
4    sklon             imenovalnik  i          Case            nominative     n
                       rodilnik     r                          genitive       g
                       dajalnik     d                          dative         d
                       tožilnik     t                          accusative     a
                       mestnik      m                          locative       l
                       orodnik      o                          instrumental   i
5    živost            ne           n          Animate         no             n
                       da           d                          yes            y
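Since the Slovene and English noun tables use the same positions, a Slovene noun tag can be converted to its English counterpart one character at a time. A minimal sketch built only from the Noun table above; `noun_tag_sl_to_en` is a hypothetical name, and passing `-` through unchanged assumes the MULTEXT-East convention of marking an unspecified attribute with a hyphen.

```python
# Per-position code translation for nouns (Slovene code -> English code), from the Noun table.
NOUN_SL_TO_EN = [
    {"S": "N"},                                                    # 0 CATEGORY
    {"o": "c", "l": "p"},                                          # 1 Type
    {"m": "m", "z": "f", "s": "n"},                                # 2 Gender
    {"e": "s", "d": "d", "m": "p"},                                # 3 Number
    {"i": "n", "r": "g", "d": "d", "t": "a", "m": "l", "o": "i"},  # 4 Case
    {"n": "n", "d": "y"},                                          # 5 Animate
]


def noun_tag_sl_to_en(tag: str) -> str:
    """Translate a Slovene noun tag into the English version of the tagset."""
    if len(tag) > len(NOUN_SL_TO_EN):
        raise ValueError(f"noun tags have at most {len(NOUN_SL_TO_EN)} positions: {tag!r}")
    # '-' (assumed: attribute unspecified) is copied as is; an unknown code raises KeyError.
    return "".join(code if code == "-" else table[code]
                   for table, code in zip(NOUN_SL_TO_EN, tag))


print(noun_tag_sl_to_en("Sometd"))  # Ncmsay
```

The same letter can need different translations at different positions (Slovene d is dual at position 3 but "yes" at position 5), which is why the sketch keeps one lookup table per position instead of a single letter map.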

3. Verb (V) // Glagol (G)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     glagol       G          CATEGORY        Verb           V
1    vrsta             glavni       g          Type            main           m
                       pomožni      p                          auxiliary      a
2    vid               dovršni      d          Aspect          perfective     e
                       nedovršni    n                          progressive    p
                       dvovidski    v                          biaspectual    b
3    oblika            nedoločnik   n          VForm           infinitive     n
                       namenilnik   m                          supine         u
                       deležnik     d                          participle     p
                       sedanjik     s                          present        r
                       prihodnjik   p                          future         f
                       pogojnik     g                          conditional    c
                       velelnik     v                          imperative     m
4    oseba             prva         p          Person          first          1
                       druga        d                          second         2
                       tretja       t                          third          3
5    število           ednina       e          Number          singular       s
                       množina      m                          plural         p
                       dvojina      d                          dual           d
6    spol              moški        m          Gender          masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
7    nikalnost         nezanikani   n          Negative        no             n
                       zanikani     d                          yes            y
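To query a single verb attribute, every position before it has to be filled with `.` so that the code lands in the right column. The hypothetical helper below builds such a pattern from the position numbers in the table (Pos column); it is a sketch for composing CQL `tag` values, not part of any tool.

```python
def tag_pattern(category: str, constraints: dict[int, str]) -> str:
    """Build a regular expression for a CQL [tag="..."] query.

    `constraints` maps a position number (Pos column) to the required code.
    Unconstrained positions up to the last constrained one become '.', and a
    trailing '.*' allows any further positions.
    """
    last = max(constraints, default=0)
    chars = [category] + ["."] * last
    for pos, code in constraints.items():
        if pos < 1:
            raise ValueError("position 0 is the category; pass it as `category`")
        chars[pos] = code
    return "".join(chars) + ".*"


print(tag_pattern("G", {3: "n"}))                  # G..n.*   infinitives
print(tag_pattern("G", {3: "s", 4: "p", 5: "m"}))  # G..spm.*  present tense, 1st person plural
```

The second pattern, for example, would be entered as [tag="G..spm.*"]. The trailing `.*` keeps the pattern valid for tags that continue beyond the last constrained position.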

4. Adjective (A) // Pridevnik (P)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     pridevnik    P          CATEGORY        Adjective      A
1    vrsta             splošni      p          Type            general        g
                       svojilni     s                          possessive     s
                       deležniški   d                          participle     p
2    stopnja           nedoločeno   n          Degree          positive       p
                       primernik    p                          comparative    c
                       presežnik    s                          superlative    s
3    spol              moški        m          Gender          masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
4    število           ednina       e          Number          singular       s
                       dvojina      d                          dual           d
                       množina      m                          plural         p
5    sklon             imenovalnik  i          Case            nominative     n
                       rodilnik     r                          genitive       g
                       dajalnik     d                          dative         d
                       tožilnik     t                          accusative     a
                       mestnik      m                          locative       l
                       orodnik      o                          instrumental   i
6    določnost         ne           n          Definiteness    no             n
                       da           d                          yes            y

5. Adverb (R) // Prislov (R)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     prislov      R          CATEGORY        Adverb         R
1    vrsta             splošni      s          Type            general        g
                       deležje      d                          participle     r
2    stopnja           nedoločeno   n          Degree          positive       p
                       primernik    r                          comparative    c
                       presežnik    s                          superlative    s

6. Pronoun (P) // Zaimek (Z)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     zaimek       Z          CATEGORY        Pronoun        P
1    vrsta             osebni       o          Type            personal       p
                       svojilni     s                          possessive     s
                       kazalni      k                          demonstrative  d
                       oziralni     z                          relative       r
                       povratni     p                          reflexive      x
                       celostni     c                          general        g
                       vprašalni    v                          interrogative  q
                       nedoločni    n                          indefinite     i
                       nikalni      l                          negative       z
2    oseba             prva         p          Person          first          1
                       druga        d                          second         2
                       tretja       t                          third          3
3    spol              moški        m          Gender          masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
4    število           ednina       e          Number          singular       s
                       dvojina      d                          dual           d
                       množina      m                          plural         p
5    sklon             imenovalnik  i          Case            nominative     n
                       rodilnik     r                          genitive       g
                       dajalnik     d                          dative         d
                       tožilnik     t                          accusative     a
                       mestnik      m                          locative       l
                       orodnik      o                          instrumental   i
6    število_svojine   ednina       e          Owner_Number    singular       s
                       dvojina      d                          dual           d
                       množina      m                          plural         p
7    spol_svojine      moški        m          Owner_Gender    masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
8    naslonskost       klitična     k          Clitic          yes            y
                       navezna      z                          bound          b
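Pronoun tags have up to eight attribute positions, and most pronouns leave some of them empty. This page does not spell out how empty positions are written; the sketch assumes the usual MULTEXT-East convention that an unspecified attribute inside a tag is written as `-` and trailing unspecified attributes are simply omitted. The example tag is assembled from the table (personal, first person, singular, accusative, clitic), and `decode_pronoun` is a hypothetical helper.

```python
# Pronoun (Z) attributes by position, with the Slovene names and codes from the Pronoun table.
PRONOUN_POSITIONS = {
    1: ("vrsta", {"o": "osebni", "s": "svojilni", "k": "kazalni", "z": "oziralni",
                  "p": "povratni", "c": "celostni", "v": "vprašalni",
                  "n": "nedoločni", "l": "nikalni"}),
    2: ("oseba", {"p": "prva", "d": "druga", "t": "tretja"}),
    3: ("spol", {"m": "moški", "z": "ženski", "s": "srednji"}),
    4: ("število", {"e": "ednina", "d": "dvojina", "m": "množina"}),
    5: ("sklon", {"i": "imenovalnik", "r": "rodilnik", "d": "dajalnik",
                  "t": "tožilnik", "m": "mestnik", "o": "orodnik"}),
    6: ("število_svojine", {"e": "ednina", "d": "dvojina", "m": "množina"}),
    7: ("spol_svojine", {"m": "moški", "z": "ženski", "s": "srednji"}),
    8: ("naslonskost", {"k": "klitična", "z": "navezna"}),
}


def decode_pronoun(tag: str) -> dict[str, str]:
    """Return {attribute: value} for the positions that are specified in a Z... tag."""
    if not tag.startswith("Z") or len(tag) > 1 + len(PRONOUN_POSITIONS):
        raise ValueError(f"not a Slovene pronoun tag: {tag!r}")
    result = {}
    for pos, code in enumerate(tag[1:], start=1):
        if code == "-":  # assumed convention: '-' marks an unspecified attribute
            continue
        attribute, values = PRONOUN_POSITIONS[pos]
        result[attribute] = values[code]  # KeyError if the code is not in the table
    return result


print(decode_pronoun("Zop-et--k"))
```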

7. Numeral (M) // Števnik (K)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     števnik      K          CATEGORY        Numeral        M
1    zapis             arabski      a          Form            digit          d
                       rimski       r                          roman          r
                       besedni      b                          letter         l
2    vrsta             glavni       g          Type            cardinal       c
                       vrstilni     v                          ordinal        o
                       zaimkovni    z                          pronominal     p
                       drugi        d                          special        s
3    spol              moški        m          Gender          masculine      m
                       ženski       z                          feminine       f
                       srednji      s                          neuter         n
4    število           ednina       e          Number          singular       s
                       dvojina      d                          dual           d
                       množina      m                          plural         p
5    sklon             imenovalnik  i          Case            nominative     n
                       rodilnik     r                          genitive       g
                       dajalnik     d                          dative         d
                       tožilnik     t                          accusative     a
                       mestnik      m                          locative       l
                       orodnik      o                          instrumental   i
6    določnost         ne           n          Definiteness    no             n
                       da           d                          yes            y

8. Adposition (S) // Predlog (D)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     predlog      D          CATEGORY        Adposition     S
1    sklon             imenovalnik  i          Case            nominative     n
                       rodilnik     r                          genitive       g
                       dajalnik     d                          dative         d
                       tožilnik     t                          accusative     a
                       mestnik      m                          locative       l
                       orodnik      o                          instrumental   i

9. Conjunction (C) // Veznik (V)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     veznik       V          CATEGORY        Conjunction    C
1    vrsta             priredni     p          Type            coordinating   c
                       podredni     d                          subordinating  s

10. Particle (Q) // Členek (L)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     členek       L          CATEGORY        Particle       Q

11. Interjection (I) // Medmet (M)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     medmet       M          CATEGORY        Interjection   I

12. Abbreviation (Y) // Okrajšava (O)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     okrajšava    O          CATEGORY        Abbreviation   Y

13. Residual (X) // Neuvrščeno (N)

Pos  Attribute (sl)    Value (sl)   Code (sl)  Attribute (en)  Value (en)     Code (en)
0    besedna_vrsta     neuvrščeno   N          CATEGORY        Residual       X
1    vrsta             tujejezično  j          Type            foreign        f
                       tipkarska    t                          typo           t
                       program      p                          program        p
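The remaining categories have at most one attribute, so a tag from them can be checked against a short list of allowed codes. A sketch for Slovene tags only, with the allowed codes copied from the Adposition, Conjunction, Particle, Interjection, Abbreviation and Residual tables; whether the single attribute is obligatory is not stated on this page, so `is_valid_short_tag` (a hypothetical helper) also accepts a bare category letter.

```python
# Allowed codes per attribute position for the short Slovene categories on this page.
SHORT_CATEGORIES: dict[str, list[set[str]]] = {
    "D": [{"i", "r", "d", "t", "m", "o"}],  # predlog: sklon
    "V": [{"p", "d"}],                      # veznik: vrsta
    "L": [],                                # členek: no attributes
    "M": [],                                # medmet: no attributes
    "O": [],                                # okrajšava: no attributes
    "N": [{"j", "t", "p"}],                 # neuvrščeno: vrsta
}


def is_valid_short_tag(tag: str) -> bool:
    """Check a Slovene tag from one of the short categories against the tables."""
    if not tag or tag[0] not in SHORT_CATEGORIES:
        return False
    allowed = SHORT_CATEGORIES[tag[0]]
    rest = tag[1:]
    # Each remaining character must be a listed code for its position.
    return len(rest) <= len(allowed) and all(c in ok for c, ok in zip(rest, allowed))


for tag in ["Dt", "Dx", "Vd", "L", "Lx", "Nj"]:
    print(tag, is_valid_short_tag(tag))
```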

Source: http://nl.ijs.si/ME/V4/msd/html/msd-sl.html