A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The modified Finnish TreeTagger part-of-speech tagset is used in Finnish corpora annotated by TreeTagger, a tool developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The tagset includes modifications developed by Sketch Engine (currently pipeline version 2).

An example of a tag in the CQL concordance search box: [tag="N_.*Pl"] finds all plural nouns, e.g. joissa, ihmiset. (Note: make sure that you use straight double quotation marks.)
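
The value in the query above is a regular expression that CQL matches against the whole tag, not just a part of it. The following minimal Python sketch imitates that behaviour with re.fullmatch. The function name is_plural_noun and the sample tag strings are hypothetical and chosen only for illustration; they are not taken from the actual tagset.

```python
import re

# Pattern copied from the CQL example [tag="N_.*Pl"].
PLURAL_NOUN = re.compile(r"N_.*Pl")


def is_plural_noun(tag: str) -> bool:
    """Return True if the whole tag matches the plural-noun pattern."""
    # CQL matches the expression against the entire attribute value,
    # so fullmatch (rather than search) is the closest Python equivalent.
    return PLURAL_NOUN.fullmatch(tag) is not None


# Hypothetical tag strings, for illustration only.
sample_tags = ["N_Nom_Pl", "N_Nom_Sg", "V_Act_Ind_Prs_Pl3", "Pron_Ine_Pl"]

plural_nouns = [tag for tag in sample_tags if is_plural_noun(tag)]
print(plural_nouns)
```

Because the match covers the whole tag, a tag that merely contains "N_" and "Pl" somewhere in the middle is not selected; the tag has to start with "N_" and end with "Pl".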


A|A_.* adjective
Adv|Ag.* adverb
Abbr abbreviation
Adp_.* adposition
CC coordinating conjunction
CS preposition and subordinating conjunction
Interj interjection
N_.* noun
Num.* numeral
NON-TWOL non-word or foreign word
PrfPrc.* perfect participle
Pron.* pronoun
PrsPrc.* present participle
Punct punctuation except for sentence-ending punctuation
SENT sentence-ending punctuation (. or ! or ? or their combination)
V_.* verb (except for present participle and perfect participle)
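
The patterns in the list above can also be used outside Sketch Engine, for example to map a full tag from an exported corpus to its coarse category. The sketch below is one possible way to do this in Python; the patterns and descriptions are copied from the list, while the names COARSE_TAGS and describe_tag and the sample tags in the usage lines are hypothetical.

```python
import re
from typing import Optional

# (pattern, description) pairs copied from the list above, in the same order.
COARSE_TAGS = [
    (r"A|A_.*", "adjective"),
    (r"Adv|Ag.*", "adverb"),
    (r"Abbr", "abbreviation"),
    (r"Adp_.*", "adposition"),
    (r"CC", "coordinating conjunction"),
    (r"CS", "preposition and subordinating conjunction"),
    (r"Interj", "interjection"),
    (r"N_.*", "noun"),
    (r"Num.*", "numeral"),
    (r"NON-TWOL", "non-word or foreign word"),
    (r"PrfPrc.*", "perfect participle"),
    (r"Pron.*", "pronoun"),
    (r"PrsPrc.*", "present participle"),
    (r"Punct", "punctuation except for sentence-ending punctuation"),
    (r"SENT", "sentence-ending punctuation"),
    (r"V_.*", "verb (except for present participle and perfect participle)"),
]


def describe_tag(tag: str) -> Optional[str]:
    """Return the description of the first pattern matching the whole tag."""
    for pattern, description in COARSE_TAGS:
        # Whole-value matching, as in a CQL tag query.
        if re.fullmatch(pattern, tag):
            return description
    return None  # the tag is not covered by the list


# Hypothetical sample tags, for illustration only.
print(describe_tag("N_Nom_Pl"))
print(describe_tag("SENT"))
print(describe_tag("XYZ"))
```

Matching the whole tag keeps similar prefixes apart: "Adv" is not captured by the adjective pattern, and "Num" is not captured by the noun pattern, because "A_.*" and "N_.*" both require an underscore after the first letter.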

See the list of all POS tags in the Finnish TreeTagger tagset at http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/finnish-tags.txt