A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

This Indian part-of-speech tagset was created within the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.

An example of a tag query in the CQL concordance search box: [tag="NN.*|NST"] finds all nouns, e.g. ಮೇಲೆ, ಬಗ್ಗೆ (note: please make sure that you use straight double quotation marks).
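
As a rough illustration of what the pattern in that query selects, the sketch below applies the same regular expression to the tag names of this tagset in Python. It assumes that CQL matches the pattern against the whole tag value, which `re.fullmatch` imitates; the `TAGS` list and the `matching_tags` helper are illustrative names, not part of any corpus tool.

```python
import re

# Tag names taken from the tagset table (the compound pattern *C is left out).
TAGS = [
    "NN", "NST", "NNP", "PRP", "DEM", "VM", "VAUX", "JJ", "RB", "PSP",
    "RP", "CC", "WQ", "QF", "QC", "QO", "CL", "INTF", "INJ", "NEG",
    "UT", "SYM", "RDP", "ECH", "UNK",
]

def matching_tags(pattern, tags=TAGS):
    """Return the tags that the pattern matches in full (assumed CQL behaviour)."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]

print(matching_tags("NN.*|NST"))  # ['NN', 'NST', 'NNP']
```

Under this assumption, NN.* covers both NN and NNP, while NST has to be listed separately because it does not start with NN.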


Sl No.  Category        Tag name  Example
1.1     Noun            NN
1.2     NLoc            NST
2       Proper Noun     NNP
3.1     Pronoun         PRP
3.2     Demonstrative   DEM
4       Verb-finite     VM
5       Verb Aux        VAUX
6       Adjective       JJ
7       Adverb          RB        *Only manner adverbs
8       Post position   PSP
9       Particles       RP        bhI, to, hI, jI, hA.N, na
10      Conjuncts       CC        bole (Bangla)
11      Question Words  WQ
12.1    Quantifiers     QF        bahut, tho.DA, kam (Hindi)
12.2    Cardinal        QC
12.3    Ordinal         QO
12.4    Classifier      CL
13      Intensifier     INTF
14      Interjection    INJ
15      Negation        NEG
16      Quotative       UT        ani (Telugu), endru (Tamil), bole/mAne (Bangla), mhaNaje (Marathi), mAne (Hindi)
17      Sym             SYM
18      Compounds       *C
19      Reduplicative   RDP
20      Echo            ECH
21      Unknown         UNK
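
The table can also be kept as a small lookup structure, for example to label tags in search results or to assemble a CQL query for a group of categories. The following Python sketch is only one possible arrangement: `TAGSET`, `describe` and `cql_for` are hypothetical names, and the generated query simply follows the [tag="..."] form of the example query near the top of this page.

```python
# Tag -> category, copied from the table (the *C compound pattern is omitted).
TAGSET = {
    "NN": "Noun", "NST": "NLoc", "NNP": "Proper Noun", "PRP": "Pronoun",
    "DEM": "Demonstrative", "VM": "Verb-finite", "VAUX": "Verb Aux",
    "JJ": "Adjective", "RB": "Adverb", "PSP": "Post position",
    "RP": "Particles", "CC": "Conjuncts", "WQ": "Question Words",
    "QF": "Quantifiers", "QC": "Cardinal", "QO": "Ordinal",
    "CL": "Classifier", "INTF": "Intensifier", "INJ": "Interjection",
    "NEG": "Negation", "UT": "Quotative", "SYM": "Sym",
    "RDP": "Reduplicative", "ECH": "Echo", "UNK": "Unknown",
}

def describe(tag):
    """Return the category for a tag, or a fallback for tags not in the table."""
    return TAGSET.get(tag, "not in tagset")

def cql_for(*tags):
    """Build a CQL token query that matches any of the given tags."""
    unknown = [t for t in tags if t not in TAGSET]
    if unknown:
        raise ValueError(f"unknown tags: {unknown}")
    return '[tag="' + "|".join(tags) + '"]'

print(describe("QC"))             # Cardinal
print(cql_for("QF", "QC", "QO"))  # [tag="QF|QC|QO"]
```

Checking the tags against the table before building the query catches typing mistakes that would otherwise just produce an empty concordance.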

Source: crawled from Wayback Machine at http://ltrc.iiit.ac.in/tr031/posguidelines.pdf