A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The POS tagset for Modern Standard Arabic is available in Arabic corpora. Its full description is given in the paper Towards an optimal POS tag set for Modern Standard Arabic processing (Mona T. Diab, Recent Advances in Natural Language Processing, 2007).

An example of a tag in the CQL concordance search box: [tag="VERB|VB.*"] finds all verbs, e.g. رب


part of speech  tag pattern
noun DT|NN.*
verb VERB|VB.*
adjective JJ
adverb W?RB
conjunction CC
preposition IN
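
The patterns listed above are regular expressions over tag names. The following is a minimal Python sketch of how they select tags, assuming that CQL matches a pattern against the whole tag value (approximated here with `re.fullmatch`). The names `POS_PATTERNS`, `matching_categories` and `cql_for` are hypothetical helpers introduced for illustration; they are not part of any corpus tool.

```python
import re

# Patterns copied from the list above (part of speech -> tag regex).
POS_PATTERNS = {
    "noun": r"DT|NN.*",
    "verb": r"VERB|VB.*",
    "adjective": r"JJ",
    "adverb": r"W?RB",
    "conjunction": r"CC",
    "preposition": r"IN",
}


def matching_categories(tag):
    """Return the categories whose pattern matches the entire tag."""
    return [
        pos
        for pos, pattern in POS_PATTERNS.items()
        # Assumption: the pattern must cover the whole tag, not a substring.
        if re.fullmatch(pattern, tag)
    ]


def cql_for(pos):
    """Build a CQL token query for one category, using straight quotes."""
    return f'[tag="{POS_PATTERNS[pos]}"]'


if __name__ == "__main__":
    for tag in ["VBD", "NNPS", "WRB", "PUNC"]:
        print(tag, matching_categories(tag))
    print(cql_for("verb"))
```

With whole-tag matching, `W?RB` covers both RB and WRB, while tags such as PRP or PUNC fall outside all six categories.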


tag  description
NN Noun, singular or mass
IN Preposition or subordinating conjunction
PUNC Punctuation
JJ Adjective
NNP Proper noun, singular
CC Coordinating conjunction
VBP Verb, non-3rd person singular present
VBD Verb, past tense
NNS Noun, plural
RP Particle
CD Cardinal number
WP Wh-pronoun
DT Determiner
NOFUNC Without function
PRP Personal pronoun
RB Adverb
VBN Verb, past participle
UH Interjection
WRB Wh-adverb
NNPS Proper noun, plural
VB Verb, base form
VERB Verb, base form
NUMCOMMA Remove all non-numeric characters and convert “,” to “.” and vice versa

Find more in DIAB, Mona. Towards an optimal POS tag set for Modern Standard Arabic processing. In: Proceedings of recent advances in natural language processing (RANLP), 2007, pp. 91–96.