English CLAWS part-of-speech tagset, version 5

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

English CLAWS part-of-speech tagset version 5 is used in English corpora annotated with CLAWS (the Constituent Likelihood Automatic Word-tagging System), a tagger developed by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University.

CLAWS (the Constituent Likelihood Automatic Word-tagging System) was developed by UCREL at Lancaster University. The tags listed here belong to version 5 of its tagset.

An example of a tag in the CQL concordance search box: [tag="VBD"] finds all past forms of the verb “be”: was, were. (Note: make sure that you use straight double quotation marks in the query.)
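When queries are assembled in code rather than typed into the search box, the straight-quote requirement can be enforced automatically. The following is a minimal sketch only: the helper names `straighten_quotes` and `tag_query` are hypothetical and are not part of any Sketch Engine API.

```python
# Hypothetical helpers (not a Sketch Engine API): build a CQL token
# query for one POS tag and make sure only straight quotes are used.
CURLY_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(text: str) -> str:
    """Replace typographic quotes with their straight equivalents."""
    return text.translate(str.maketrans(CURLY_QUOTES))


def tag_query(tag: str) -> str:
    """Return a CQL expression matching tokens that carry the given tag."""
    clean = straighten_quotes(tag).strip().strip('"')
    return f'[tag="{clean}"]'


print(tag_query("VBD"))  # [tag="VBD"]
```

Pasting a query copied from a word processor through `straighten_quotes` first is one way to avoid a query that fails only because of curly quotes.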

TAGSET

POS Tag	Description
AJ0	adjective (unmarked) (e.g. GOOD, OLD)
AJC	comparative adjective (e.g. BETTER, OLDER)
AJS	superlative adjective (e.g. BEST, OLDEST)
AT0	article (e.g. THE, A, AN)
AV0	adverb (unmarked) (e.g. OFTEN, WELL, LONGER, FURTHEST)
AVP	adverb particle (e.g. UP, OFF, OUT)
AVQ	wh-adverb (e.g. WHEN, HOW, WHY)
CJC	coordinating conjunction (e.g. AND, OR)
CJS	subordinating conjunction (e.g. ALTHOUGH, WHEN)
CJT	the conjunction THAT
CRD	cardinal numeral (e.g. 3, FIFTY-FIVE, 6609) (excluding ONE)
DPS	possessive determiner form (e.g. YOUR, THEIR)
DT0	general determiner (e.g. THESE, SOME)
DTQ	wh-determiner (e.g. WHOSE, WHICH)
EX0	existential THERE
ITJ	interjection or other isolate (e.g. OH, YES, MHM)
NN0	noun (neutral for number) (e.g. AIRCRAFT, DATA)
NN1	singular noun (e.g. PENCIL, GOOSE)
NN2	plural noun (e.g. PENCILS, GEESE)
NP0	proper noun (e.g. LONDON, MICHAEL, MARS)
NULL	the null tag (for items not to be tagged)
ORD	ordinal (e.g. SIXTH, 77TH, LAST)
PNI	indefinite pronoun (e.g. NONE, EVERYTHING)
PNP	personal pronoun (e.g. YOU, THEM, OURS)
PNQ	wh-pronoun (e.g. WHO, WHOEVER)
PNX	reflexive pronoun (e.g. ITSELF, OURSELVES)
POS	the possessive (or genitive morpheme) ’S or ’
PRF	the preposition OF
PRP	preposition (except for OF) (e.g. FOR, ABOVE, TO)
PUL	punctuation – left bracket (i.e. ( or [ )
PUN	punctuation – general mark (i.e. . ! , : ; – ? … )
PUQ	punctuation – quotation mark (i.e. ` ' " )
PUR	punctuation – right bracket (i.e. ) or ] )
TO0	infinitive marker TO
UNC	“unclassified” items which are not words of the English lexicon
VBB	the “base forms” of the verb “BE” (except the infinitive), i.e. AM, ARE
VBD	past form of the verb “BE”, i.e. WAS, WERE
VBG	-ing form of the verb “BE”, i.e. BEING
VBI	infinitive of the verb “BE”
VBN	past participle of the verb “BE”, i.e. BEEN
VBZ	-s form of the verb “BE”, i.e. IS, ’S
VDB	base form of the verb “DO” (except the infinitive), i.e. DO
VDD	past form of the verb “DO”, i.e. DID
VDG	-ing form of the verb “DO”, i.e. DOING
VDI	infinitive of the verb “DO”
VDN	past participle of the verb “DO”, i.e. DONE
VDZ	-s form of the verb “DO”, i.e. DOES
VHB	base form of the verb “HAVE” (except the infinitive), i.e. HAVE
VHD	past tense form of the verb “HAVE”, i.e. HAD, ’D
VHG	-ing form of the verb “HAVE”, i.e. HAVING
VHI	infinitive of the verb “HAVE”
VHN	past participle of the verb “HAVE”, i.e. HAD
VHZ	-s form of the verb “HAVE”, i.e. HAS, ’S
VM0	modal auxiliary verb (e.g. CAN, COULD, WILL, ’LL)
VVB	base form of lexical verb (except the infinitive) (e.g. TAKE, LIVE)
VVD	past tense form of lexical verb (e.g. TOOK, LIVED)
VVG	-ing form of lexical verb (e.g. TAKING, LIVING)
VVI	infinitive of lexical verb
VVN	past participle form of lexical verb (e.g. TAKEN, LIVED)
VVZ	-s form of lexical verb (e.g. TAKES, LIVES)
XX0	the negative NOT or N’T
ZZ0	alphabetical symbol (e.g. A, B, c, d)
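The tag names in the table follow a visible pattern: the first two characters identify the word class or verb (VB for BE, VD for DO, VH for HAVE, VV for lexical verbs) and the third character identifies the form. The sketch below uses Python regular expressions to select groups of tags by that pattern. The `TAGS` dictionary is only an excerpt of the table, and `select` is a hypothetical helper. It is an assumption here that a similar pattern, such as `[tag="VB."]`, can be used in a CQL query; check the CQL documentation before relying on it.

```python
import re

# Excerpt of the tagset table above (not the complete list).
TAGS = {
    "NN1": "singular noun",
    "NN2": "plural noun",
    "VBB": "base forms of BE",
    "VBD": "past form of BE",
    "VBG": "-ing form of BE",
    "VBI": "infinitive of BE",
    "VBN": "past participle of BE",
    "VBZ": "-s form of BE",
    "VDD": "past form of DO",
    "VHD": "past tense form of HAVE",
    "VM0": "modal auxiliary verb",
    "VVD": "past tense form of lexical verb",
}


def select(pattern: str) -> list:
    """Return the tags (sorted) whose whole name matches the pattern."""
    return sorted(tag for tag in TAGS if re.fullmatch(pattern, tag))


print(select("VB."))  # all forms of BE in the excerpt
print(select("V.D"))  # past forms of BE, DO, HAVE and lexical verbs
```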

NOTE: “DITTO TAGS”

Any of the tags listed above may, in theory, be modified by the addition of a pair of digits, e.g. DD21, DD22. This signifies that the tag occurs as part of a sequence of similar tags, representing a sequence of words which for grammatical purposes are treated as a single unit. For example, the expression “in terms of” is treated as a single preposition, receiving the tags:

		 in_II31 terms_II32 of_II33

The first of the two digits indicates the number of words/tags in the sequence, and the second digit the position of each word within that sequence.
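The two-digit convention can be decoded mechanically. The following sketch splits a ditto tag into its base tag, sequence length and position, and merges a `word_TAG` sequence into single units. It assumes that any tag ending in two non-zero digits is a ditto tag and that words and tags are joined by an underscore, as in the example above. The names `DittoTag`, `parse_ditto` and `merge_units` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class DittoTag(NamedTuple):
    base: str      # the underlying tag, e.g. "II"
    length: int    # number of words in the sequence
    position: int  # position of this word within the sequence


# Assumption: a ditto tag is a base tag followed by two non-zero digits.
DITTO_RE = re.compile(r"(\w+?)([1-9])([1-9])")


def parse_ditto(tag: str) -> Optional[DittoTag]:
    """Return the parts of a ditto tag, or None for an ordinary tag."""
    match = DITTO_RE.fullmatch(tag)
    if match is None:
        return None
    length, position = int(match.group(2)), int(match.group(3))
    if position > length:
        return None
    return DittoTag(match.group(1), length, position)


def merge_units(tagged: str) -> List[Tuple[str, str]]:
    """Merge words carrying ditto tags into one (expression, tag) unit."""
    units: List[Tuple[str, str]] = []
    pending: List[str] = []
    for token in tagged.split():
        word, _, tag = token.rpartition("_")
        ditto = parse_ditto(tag)
        if ditto is None:
            units.append((word, tag))
            continue
        pending.append(word)
        if ditto.position == ditto.length:  # last word of the sequence
            units.append((" ".join(pending), ditto.base))
            pending = []
    return units


print(parse_ditto("II31"))
print(merge_units("in_II31 terms_II32 of_II33"))  # [('in terms of', 'II')]
```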

Such ditto tags are not included in the lexicon, but are assigned automatically by a program called IDIOMTAG which looks for a range of multi-word sequences included in the idiomlist. The following sample entries from the idiomlist show that syntactic ambiguity is taken into account, and also that, depending on the context, ditto tags may or may not be required for a particular word sequence:

		at_RR21 length_RR22
		a_DD21/RR21 lot_DD22/RR22
		in_CS21/II that_CS22/DD1
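The slash notation in these entries can be unpacked into separate readings. The sketch below assumes that the alternatives are aligned by position, so that the first tag of every word belongs to the first reading, the second tag to the second reading, and so on. That alignment is an interpretation of the sample entries, not a documented rule, and the function name `readings` is hypothetical.

```python
from typing import List, Tuple


def readings(entry: str) -> List[List[Tuple[str, str]]]:
    """Split an idiomlist-style entry into its alternative taggings."""
    words = []
    for token in entry.split():
        word, _, tags = token.rpartition("_")
        words.append((word, tags.split("/")))
    count = max(len(alternatives) for _, alternatives in words)
    result = []
    for i in range(count):
        # A word with fewer alternatives keeps its last listed tag.
        result.append(
            [(word, alts[min(i, len(alts) - 1)]) for word, alts in words]
        )
    return result


for reading in readings("in_CS21/II that_CS22/DD1"):
    print(reading)
```

Under this reading, the last sample entry yields one analysis in which both words carry ditto tags (CS21, CS22) and one in which they are tagged as two separate words (II, DD1), which matches the statement that ditto tags may or may not be required for a given word sequence.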

Source: http://ucrel.lancs.ac.uk/claws5tags.html
