A tagset is a list of part-of-speech tags, i.e. labels used to indicate the part of speech and often also other grammatical categories (case, tense etc.) of each token in a text corpus.

Penn Treebank tagset

The English Penn Treebank tagset is used with English corpora annotated by the TreeTagger tool, developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. This version of the tagset contains modifications developed by Sketch Engine (earlier version).

See a more recent version of this tagset.

The table shows the English Penn Treebank tagset with Sketch Engine modifications (earlier version).

Example: when used in a CQL concordance search, [tag="NNS"] finds all plural nouns, e.g. people, years. Always use straight double quotation marks in CQL.
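Queries of this kind can also be assembled programmatically. The following is a minimal Python sketch, not part of Sketch Engine: the helper name cql_token is hypothetical. It assumes only that a CQL token is written as attribute="value" conditions inside square brackets, joined with &, and that values are treated as regular expressions. It rejects curly quotation marks, since CQL expects straight ones.

```python
CURLY_QUOTES = "“”‘’"


def cql_token(**attrs: str) -> str:
    """Build one CQL token expression such as [tag="NNS"] (hypothetical helper)."""
    conditions = []
    for name, value in attrs.items():
        # Quotation marks inside a value would break the query string.
        if '"' in value or any(q in value for q in CURLY_QUOTES):
            raise ValueError(f"quotation mark inside value for {name!r}")
        conditions.append(f'{name}="{value}"')
    return "[" + " & ".join(conditions) + "]"


print(cql_token(tag="NNS"))                 # all plural nouns
print(cql_token(lemma="take", tag="VV.*"))  # any verb form of "take"
```

The second call combines two conditions on the same token; VV.* is a regular expression covering VV, VVD, VVG, VVN, VVP and VVZ from the table below.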

POS Tag   Description                               Example
CC        coordinating conjunction                  and
CD        cardinal number                           1, third
DT        determiner                                the
EX        existential there                         there is
FW        foreign word                              les
IN        preposition, subordinating conjunction    in, of, like
IN/that   that as subordinator                      that
JJ        adjective                                 green
JJR       adjective, comparative                    greener
JJS       adjective, superlative                    greenest
LS        list marker                               1)
MD        modal                                     could, will
NN        noun, singular or mass                    table
NNS       noun, plural                              tables
NP        proper noun, singular                     John
NPS       proper noun, plural                       Vikings
PDT       predeterminer                             both the boys
POS       possessive ending                         friend’s
PP        personal pronoun                          I, he, it
PPZ       possessive pronoun                        my, his
RB        adverb                                    however, usually, naturally, here, good
RBR       adverb, comparative                       better
RBS       adverb, superlative                       best
RP        particle                                  give up
SENT      Sentence-break punctuation                . ! ?
SYM       Symbol                                    / [ = *
TO        infinitive ‘to’                           to go
UH        interjection                              uhhuhhuhh
VB        verb be, base form                        be
VBD       verb be, past tense                       was, were
VBG       verb be, gerund/present participle        being
VBN       verb be, past participle                  been
VBP       verb be, sing. present, non-3rd person    am, are
VBZ       verb be, 3rd person sing. present         is
VH        verb have, base form                      have
VHD       verb have, past tense                     had
VHG       verb have, gerund/present participle      having
VHN       verb have, past participle                had
VHP       verb have, sing. present, non-3rd person  have
VHZ       verb have, 3rd person sing. present       has
VV        verb, base form                           take
VVD       verb, past tense                          took
VVG       verb, gerund/present participle           taking
VVN       verb, past participle                     taken
VVP       verb, sing. present, non-3rd person       take
VVZ       verb, 3rd person sing. present            takes
WDT       wh-determiner                             which
WP        wh-pronoun                                who, what
WP$       possessive wh-pronoun                     whose
WRB       wh-adverb                                 where, when
#         #                                         #
$         $                                         $
''        Closing quotation marks                   ’ ”
``        Opening quotation marks                   ‘ “
(         Opening brackets                          ( {
)         Closing brackets                          ) }
,         Comma                                     ,
:         Punctuation                               – ; : — …

Main differences from the default Penn tagset

In TreeTagger

  • Distinguishes be (VB) and have (VH) from other (non-modal) verbs (VV)
  • For proper nouns, NNP and NNPS have become NP and NPS
  • SENT for end-of-sentence punctuation (other punctuation tags may also differ)
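The differences listed above can be expressed as a small mapping back to default Penn tags. This is a rough Python sketch under stated assumptions, not an official conversion: the function name to_default_penn is hypothetical, it covers only the three differences in the list, and it assumes that the default Penn tag for sentence-final punctuation is the full stop. Other differences (such as other punctuation tags or pronoun tags) are passed through unchanged.

```python
import re

# Matches the TreeTagger verb tags VB*, VH*, VV* and captures the form suffix.
VERB_TAG = re.compile(r"V[BHV]([DGNPZ]?)")


def to_default_penn(tag: str) -> str:
    """Map a TreeTagger-style tag to its default Penn counterpart (sketch)."""
    if tag == "NP":
        return "NNP"   # proper noun, singular
    if tag == "NPS":
        return "NNPS"  # proper noun, plural
    if tag == "SENT":
        return "."     # assumption: default Penn tag for sentence-final punctuation
    match = VERB_TAG.fullmatch(tag)
    if match:
        # be, have and other verbs all collapse into the single VB* series.
        return "VB" + match.group(1)
    return tag         # everything else is left as it is


for tag in ["VHZ", "VVD", "VB", "NP", "NPS", "SENT", "NN"]:
    print(tag, "->", to_default_penn(tag))
```

The mapping is lossy in one direction only: converting to default Penn tags discards the be/have/other distinction, so the reverse conversion would need the lemma of each verb.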

In the TreeTagger tool with Sketch Engine modifications

  • The word ‘to’ is tagged IN when used as a preposition and TO when used as an infinitive marker
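As an illustration of this distinction, the sketch below counts the two uses of ‘to’ in a sequence of (word, tag) pairs. It is a minimal Python example under assumptions: the function name count_to_uses is hypothetical, and the sample sentence was tagged by hand for illustration rather than taken from tagger output.

```python
from collections import Counter


def count_to_uses(tagged_tokens):
    """Count prepositional (IN) and infinitival (TO) uses of 'to' (sketch)."""
    counts = Counter()
    for word, tag in tagged_tokens:
        if word.lower() != "to":
            continue
        if tag == "IN":
            counts["preposition"] += 1
        elif tag == "TO":
            counts["infinitive marker"] += 1
    return counts


# Hand-tagged sample (an assumption, not real tagger output).
sample = [
    ("She", "PP"), ("went", "VVD"), ("to", "IN"), ("Paris", "NP"),
    ("to", "TO"), ("study", "VV"), (".", "SENT"),
]
print(count_to_uses(sample))
```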

Bibliography

M. Marcus, B. Santorini and M.A. Marcinkiewicz (1993). Building a large annotated corpus of English: The Penn Treebank. In Computational Linguistics, volume 19, number 2, pp. 313–330.

English text corpora

Sketch Engine offers dozens of English corpora with the Penn Treebank tagset.
