A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Tatar part-of-speech tagset is used in the Tatar News corpus.

An example of a tag in the CQL concordance search box: [pos="<np?>"] finds all nouns, both common (<n>) and proper (<np>), e.g. Татарстан, кеше. (Note: make sure that you use straight double quotation marks.)
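The value inside the quotation marks is a regular expression, so the optional p in <np?> is what lets a single query cover both <n> and <np>. The following minimal Python sketch illustrates this. It assumes that the pos attribute holds exactly one first-level tag and that CQL matches the pattern against the whole attribute value (emulated here with re.fullmatch); the names PATTERN, TAGS and matching_tags are hypothetical and used for illustration only.

```python
import re

# Pattern taken from the CQL example [pos="<np?>"].
PATTERN = re.compile(r"<np?>")

# A small subset of the first-level tags (assumption: one tag per pos value).
TAGS = ["<n>", "<np>", "<num>", "<adj>", "<v>"]


def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full, as a CQL value would."""
    return [tag for tag in tags if pattern.fullmatch(tag)]


print(matching_tags(PATTERN, TAGS))  # ['<n>', '<np>']
```

Note that <num> is not matched, because the pattern allows only an optional p between n and the closing bracket.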


Part-of-speech categories (first-level tags)

Tag  Part of speech  Russian term or example
<abbr>  Abbreviation  Аббревиатура
<adj>  Adjective  Прилагательное
<adv>  Adverb  Наречие
<cm>  Comma  ,
<cnjadv>  Adverbial conjunction  Наречие-союз
<cnjcoo>  Coordinating conjunction  Сочинительный союз
<cnjsub>  Subordinating conjunction  Подчинительный союз
<cop>  Copula  Копула
<det>  Determiner  Детерминатив
<ideo>  Ideophone  Звукоподражательное слово
<ij>  Interjection  Междометие
<n>  Noun  Существительное
<np>  Proper noun  Имя собственное
<num>  Numeral  Числительное
<post>  Postposition  Послелог
<postadv>  Postadverb  Посленаречие
<prn>  Pronoun  Местоимение
<sent>  Sentence marker  . ? !
<v>  Verb  Глагол
<vaux>  Auxiliary verb  Вспомогательный глагол
<apos>  Apostrophe
<guio>  Hyphen
<lpar>  Left parenthetical marker  (
<lquot>  Left quote marker  “, «
<mod_ass>  Assertive modal particle  бит
<mod_ind>  Indefinite modal particle (expresses doubt)  дыр
<qst>  Modal question particle  микән
<rpar>  Right parenthetical marker  )
<rquot>  Right quote marker  ”, »
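To search for several first-level tags at once, the tags can be joined with the regular-expression alternation operator inside one pos value. The sketch below builds such a query string in Python. It assumes that the pos attribute stores tags with their angle brackets, as in the [pos="<np?>"] example; the helper name cql_for_tags and the FIRST_LEVEL_TAGS set are hypothetical and not part of any corpus tool.

```python
# Tag names copied from the table above, without angle brackets.
FIRST_LEVEL_TAGS = {
    "abbr", "adj", "adv", "cm", "cnjadv", "cnjcoo", "cnjsub", "cop", "det",
    "ideo", "ij", "n", "np", "num", "post", "postadv", "prn", "sent", "v",
    "vaux", "apos", "guio", "lpar", "lquot", "mod_ass", "mod_ind", "qst",
    "rpar", "rquot",
}


def cql_for_tags(*names):
    """Build a CQL token query that matches any of the given first-level tags."""
    if not names:
        raise ValueError("at least one tag name is required")
    unknown = [name for name in names if name not in FIRST_LEVEL_TAGS]
    if unknown:
        raise ValueError(f"unknown tag name(s): {unknown}")
    # Join the bracketed tags with '|' and use straight double quotation marks.
    alternatives = "|".join(f"<{name}>" for name in names)
    return f'[pos="{alternatives}"]'


print(cql_for_tags("n", "np"))                      # [pos="<n>|<np>"]
print(cql_for_tags("cnjcoo", "cnjsub", "cnjadv"))   # [pos="<cnjcoo>|<cnjsub>|<cnjadv>"]
```

Checking the names against the table before building the query catches typos such as "cnjco" early, instead of silently returning an empty concordance.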

Proper noun types (<np>)

Tag  Description
<top>  Toponym
<ant>  Anthroponym
<cog>  Cognomen
<pat>  Patronym
<org>  Organization
<al>  Other

Source: http://corpus.tatar/index_en.php?openinframe=manual/tags_uniq.pdf