A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

This Indian part-of-speech tagset was created within the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.

An example of a tag in the CQL concordance search box: [tag="N.*"] finds all nouns, e.g. ಮೇಲೆ, ಬಗ್ಗೆ (note: please make sure that you use straight double quotation marks).
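The pattern inside the quotation marks is a regular expression that has to match the whole tag, which is why the example uses N.* rather than a bare N. The following is a minimal Python sketch of that behaviour, not Sketch Engine code: Python's re module stands in for the CQL regex engine, and SAMPLE_TAGS and matching_tags are hypothetical names introduced only for illustration.

```python
import re

# Illustrative base tags taken from the tagset on this page; tags in a real
# corpus may carry further material after these prefixes.
SAMPLE_TAGS = ["NC", "NP", "NV", "NST", "VM", "VA", "JJ", "PP", "PU"]


def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full (whole-value match)."""
    compiled = re.compile(pattern)
    return [tag for tag in tags if compiled.fullmatch(tag)]


nouns = matching_tags("N.*", SAMPLE_TAGS)  # every tag that starts with N
bare_n = matching_tags("N", SAMPLE_TAGS)   # only a tag that is exactly "N"

print(nouns)
print(bare_n)
```

In this sketch the bare pattern N selects nothing, because no tag in the sample consists of the single letter N.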

Tagset

Category          Subcategory          Part-of-speech tag
NOUN              Common               NC.*
                  Proper               NP.*
                  Verbal               NV.*
                  Spatio-temporal      NST
VERB              Main                 VM.*
                  Auxiliary            VA.*
PRONOUN           Pronominal           PPR.*
                  Reflexive            PRF.*
                  Reciprocal           PRC.*
                  Relative             PRL.*
                  Wh-pronoun           PWH.*
NOMINAL MODIFIER  Adjective            JJ.*
                  Quantifier           JQ.*
DEMONSTRATIVE     Absolute             DAB.*
                  Relative             DRL.*
                  Wh                   DWH.*
ADVERB            Manner               AMN.*
                  Location             ALC.*
PARTICIPLE        Verbal (Adverbial)   LV.*
                  Conditional          LC.*
PARTICLE          Coordinating         CCD.*
                  Subordinating        CSB.*
                  Classifier           CCL.*
                  Interjection         CIN.*
                  Others               CX.*
Postposition                           PP
Punctuation                            PU
RESIDUAL          Foreign word         RDF
                  Symbol               RDS
                  Others               RDX
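The tag column can be read as a set of whole-tag patterns, so a tag can be mapped back to its category by testing it against each row. The sketch below illustrates this in Python under stated assumptions: TAGSET, classify and base_tags_matching are hypothetical names, Python's re module stands in for the CQL regex engine, and Postposition and Punctuation (listed without a subcategory) are treated as categories of their own.

```python
import re

# (category, subcategory, tag pattern) rows transcribed from the table.
# Assumption: Postposition and Punctuation, which the table lists without a
# subcategory, are treated here as categories of their own.
TAGSET = [
    ("NOUN", "Common", "NC.*"),
    ("NOUN", "Proper", "NP.*"),
    ("NOUN", "Verbal", "NV.*"),
    ("NOUN", "Spatio-temporal", "NST"),
    ("VERB", "Main", "VM.*"),
    ("VERB", "Auxiliary", "VA.*"),
    ("PRONOUN", "Pronominal", "PPR.*"),
    ("PRONOUN", "Reflexive", "PRF.*"),
    ("PRONOUN", "Reciprocal", "PRC.*"),
    ("PRONOUN", "Relative", "PRL.*"),
    ("PRONOUN", "Wh-pronoun", "PWH.*"),
    ("NOMINAL MODIFIER", "Adjective", "JJ.*"),
    ("NOMINAL MODIFIER", "Quantifier", "JQ.*"),
    ("DEMONSTRATIVE", "Absolute", "DAB.*"),
    ("DEMONSTRATIVE", "Relative", "DRL.*"),
    ("DEMONSTRATIVE", "Wh", "DWH.*"),
    ("ADVERB", "Manner", "AMN.*"),
    ("ADVERB", "Location", "ALC.*"),
    ("PARTICIPLE", "Verbal (Adverbial)", "LV.*"),
    ("PARTICIPLE", "Conditional", "LC.*"),
    ("PARTICLE", "Coordinating", "CCD.*"),
    ("PARTICLE", "Subordinating", "CSB.*"),
    ("PARTICLE", "Classifier", "CCL.*"),
    ("PARTICLE", "Interjection", "CIN.*"),
    ("PARTICLE", "Others", "CX.*"),
    ("Postposition", None, "PP"),
    ("Punctuation", None, "PU"),
    ("RESIDUAL", "Foreign word", "RDF"),
    ("RESIDUAL", "Symbol", "RDS"),
    ("RESIDUAL", "Others", "RDX"),
]


def classify(tag):
    """Return (category, subcategory) for a tag, or None if no row matches."""
    for category, subcategory, pattern in TAGSET:
        if re.fullmatch(pattern, tag):
            return category, subcategory
    return None


def base_tags_matching(query):
    """List the table's base tags (pattern minus a trailing '.*') hit by a query."""
    bases = [p[:-2] if p.endswith(".*") else p for _, _, p in TAGSET]
    return [tag for tag in bases if re.fullmatch(query, tag)]


print(classify("NST"))
print(classify("PP"))
print(base_tags_matching("P.*"))
```

Under this reading of the table, a broad query such as P.* would select the postposition and punctuation tags as well as the pronoun tags, so a narrower pattern such as PPR.* may be the safer choice when only one subcategory is wanted.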

Attributes and their tags

ATTRIBUTE       SYMBOL   Values (symbol)
NUMBER          NUM      Singular (sg), Plural (pl)
PERSON          PER      First (1), Second (2), Third (3)
TENSE           TNS      Present (prs), Past (pst), Future (fut)
CASE MARKER     CSM      Accusative (acc), Genitive (gen), Locative (loc)
ASPECT          ASP      Simple (smp), Progressive (prg), Perfect (pft)
MOOD            MOOD     Declarative (dcl), Imperative (imp), Habitual (hab)
FINITENESS      FIN      Finite (fin), Non-finite (nfn), Infinitive (ifn)
DISTRIBUTIVE    DSTB     Yes (y), No (n)
DEFINITENESS             Yes (y), No (n)
EMPHATIC        EMPH     Yes (y), No (n)
NEGATIVE        NEG      Yes (y), No (n)
HONORIFICITY    HON      Yes (y), No (n)
NUMERAL         NML      Ordinal (ord), Cardinal (crd), Non-numeral (nnm)
REALIS                   Realis (rls), Irrealis (ils)
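For scripted work with annotated data, the attribute symbols and value symbols can be kept in a small lookup table. The Python sketch below covers only a subset of the attributes, and ATTRIBUTES and describe are hypothetical names. This page does not specify how attribute values are attached to a tag in the corpus, so the sketch only translates a symbol pair into readable labels.

```python
# Subset of the attribute table: symbol -> (attribute name, {value symbol: label}).
# Hypothetical structure for illustration; not a Sketch Engine data format.
ATTRIBUTES = {
    "NUM": ("Number", {"sg": "Singular", "pl": "Plural"}),
    "PER": ("Person", {"1": "First", "2": "Second", "3": "Third"}),
    "TNS": ("Tense", {"prs": "Present", "pst": "Past", "fut": "Future"}),
    "ASP": ("Aspect", {"smp": "Simple", "prg": "Progressive", "pft": "Perfect"}),
    "EMPH": ("Emphatic", {"y": "Yes", "n": "No"}),
    "NEG": ("Negative", {"y": "Yes", "n": "No"}),
    "HON": ("Honorificity", {"y": "Yes", "n": "No"}),
}


def describe(symbol, value):
    """Translate an (attribute symbol, value symbol) pair into readable text."""
    if symbol not in ATTRIBUTES:
        raise KeyError(f"unknown attribute symbol: {symbol}")
    name, values = ATTRIBUTES[symbol]
    if value not in values:
        raise ValueError(f"{value!r} is not a listed value of {symbol}")
    return f"{name}: {values[value]}"


print(describe("TNS", "pst"))
print(describe("NEG", "y"))
```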

Common values for all the attributes:

  • Not-applicable (0)
    – Use when no value applies to the category or the relevant morpho-syntactic feature is not available.
    – When the attribute is binary, i.e. its values are ‘yes’ and ‘no’, as with Emphatic, Negative, Definiteness etc., annotate the value as ‘yes’ only when the morphological attribute is present; otherwise annotate it as ‘no’.
  • Undecided or doubtful (x)
    – When the annotator is not sure about a possible attribute, tag it as ‘x’ rather than guessing. Inherently ambiguous cases should first be resolved from context; if they still cannot be disambiguated, annotate the attribute as ‘x’.
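The rules above for a binary attribute can be summarised as a small decision function. This is a hedged Python sketch of one reading of the guideline, not an official annotation tool: binary_attribute_value and its two parameters are hypothetical, and a value of None for present is used here to stand for a case the annotator cannot decide.

```python
from typing import Optional

NOT_APPLICABLE = "0"  # attribute does not apply to the category
UNDECIDED = "x"       # annotator cannot resolve the case, even from context


def binary_attribute_value(applicable: bool, present: Optional[bool]) -> str:
    """Pick the value symbol for a yes/no attribute such as Emphatic or Negative."""
    if not applicable:
        return NOT_APPLICABLE
    if present is None:
        return UNDECIDED
    # 'y' only when the morphological attribute is actually present.
    return "y" if present else "n"


print(binary_attribute_value(applicable=True, present=True))
print(binary_attribute_value(applicable=True, present=None))
print(binary_attribute_value(applicable=False, present=None))
```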

Source: https://catalog.ldc.upenn.edu/docs/LDC2010T16/Annotation_Guidelines_for_Bangla.pdf
