A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Russian tags from the multilingual MULTEXT-East specifications, version 4, are used in Russian corpora.

These specifications follow the (draft) Version 4 of the multilingual MULTEXT-East specifications, which can be found at http://nl.ijs.si/ME.

The basic idea is that for each major category (Noun, Verb, Adjective, etc.) the specifications define a fixed set of attributes (Case, Number, Gender, Animacy, etc.), each with its own set of values (e.g. masculine, feminine, neuter). Each category-dependent attribute is assigned a position, and each of its values a one-letter code, so that the complete morphosyntactic description of a word can be encoded as a single string, a MorphoSyntactic Description (MSD). For instance, the attribute-value specification Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no corresponds to the MSD Ncmsan. If an attribute is not appropriate for a given combination of features or for a particular lexical item, its position is filled with a hyphen, e.g. Afpns-s (Adjective, qualificative, positive, neuter, singular), where the case is undefined because the adjective is in the short form.
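The positional encoding described above can be illustrated with a minimal Python sketch. The function name decode_msd and the two lookup tables are hypothetical and deliberately partial: they contain only the noun attributes and values quoted in the example above, not the full specification.

```python
# Partial, illustrative tables: only the noun attributes and value codes
# quoted in the text above (assumption: positions follow that order).
NOUN_ATTRIBUTES = ["Type", "Gender", "Number", "Case", "Animate"]
NOUN_VALUES = {
    "Type": {"c": "common"},
    "Gender": {"m": "masculine"},
    "Number": {"s": "singular"},
    "Case": {"a": "accusative"},
    "Animate": {"n": "no"},
}


def decode_msd(msd, attributes, values):
    """Map each position after the category letter to its attribute."""
    decoded = {"Category": msd[0]}
    for name, code in zip(attributes, msd[1:]):
        if code == "-":
            # A hyphen means the attribute does not apply.
            decoded[name] = None
        else:
            # Fall back to the raw letter when the table has no entry.
            decoded[name] = values.get(name, {}).get(code, code)
    return decoded


print(decode_msd("Ncmsan", NOUN_ATTRIBUTES, NOUN_VALUES))
```

Decoding another category would need that category's own attribute list and value codes, which should be taken from the specifications rather than guessed.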

An example of a tag in the CQL concordance search box: [tag="N.*"] finds all nouns, e.g. книги (note: please make sure that you use straight double quotation marks).
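Queries copied from formatted text often carry typographic quotation marks. As a small sketch of how they could be cleaned up before pasting into the search box, the hypothetical helper straighten_quotes below replaces them with straight ones; it is not part of Sketch Engine.

```python
def straighten_quotes(query: str) -> str:
    """Replace typographic double quotes with straight ones."""
    for curly in ("\u201c", "\u201d", "\u201e"):
        query = query.replace(curly, '"')
    return query


# The escapes stand for curly quotes around N.* in a copied query.
print(straighten_quotes("[tag=\u201dN.*\u201d]"))  # prints [tag="N.*"]
```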


1. Noun
2. Verb
3. Adjective
4. Pronoun
5. Adverb
6. Adposition
7. Conjunction
8. Numeral
9. Particle
10. Interjection
11. Abbreviation
12. Residual

Appendix A Index of Categories
Appendix B Index of Attributes
Appendix C Index of Values
Appendix D Lexical MSDs

(This page was taken from the Wayback Machine dump of http://nl.ijs.si/ME/)

Basic overview of Russian tagset

noun N.*
verb V.*
adjective A.*
pronoun P.*
adverb R.*
adposition S.*
conjunction C.*
numeral M.*
particle Q.*
interjection I.*
abbreviation Y.*
residual X.*

Source: http://nl.ijs.si/ME/
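The overview above can also be expressed as a lookup table. The following Python sketch assumes that each pattern must match the whole tag; the names TAG_PATTERNS, category_of and cql_for are hypothetical helpers for illustration.

```python
import re

# Category name -> tag pattern, copied from the overview above.
TAG_PATTERNS = {
    "noun": "N.*",
    "verb": "V.*",
    "adjective": "A.*",
    "pronoun": "P.*",
    "adverb": "R.*",
    "adposition": "S.*",
    "conjunction": "C.*",
    "numeral": "M.*",
    "particle": "Q.*",
    "interjection": "I.*",
    "abbreviation": "Y.*",
    "residual": "X.*",
}


def category_of(tag):
    """Return the category whose pattern matches the whole tag, or None."""
    for name, pattern in TAG_PATTERNS.items():
        if re.fullmatch(pattern, tag):
            return name
    return None


def cql_for(category):
    """Build a CQL tag query with straight double quotation marks."""
    return f'[tag="{TAG_PATTERNS[category]}"]'


print(category_of("Ncmsan"))  # prints noun
print(cql_for("verb"))        # prints [tag="V.*"]
```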

Russian text corpora in Sketch Engine

Sketch Engine offers dozens of Russian-language corpora.