MULTEXT-East Serbian part-of-speech tagset
A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The MULTEXT-East morphosyntactic specification for Serbian is the tagset used in Serbian corpora. The MULTEXT-East resources are a multilingual dataset for language engineering research and development. See more at http://anthology.aclweb.org/W/W03/W03-2904.pdf
An example of a tag query in the CQL concordance search box: [tag="N.*"] finds all nouns, e.g. srbija, sad. (Note: please make sure that you use straight double quotation marks; curly quotation marks will not work.)
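To illustrate how a pattern such as N.* selects tags, here is a minimal Python sketch. It assumes the pattern is a regular expression that must match the whole tag, and it uses a small hand-made list of token/tag pairs. The tags shown (Npfsn, Ncmsn, Vmip3s, Rgp) are illustrative assumptions in the MULTEXT-East positional style and have not been checked against the Serbian specification. The helper names straighten_quotes and match_tag are hypothetical and are not part of any corpus tool.

```python
import re

# Illustrative (token, tag) pairs. The tags are assumed examples in the
# MULTEXT-East positional style, not verified Serbian corpus annotations.
TAGGED = [
    ("srbija", "Npfsn"),
    ("sad", "Ncmsn"),
    ("je", "Vmip3s"),
    ("ovde", "Rgp"),
]

# Curly quotation marks that often sneak in when a query is copied
# from a word processor or a web page.
_CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})


def straighten_quotes(query: str) -> str:
    """Replace curly double quotation marks with straight ones."""
    return query.translate(_CURLY_TO_STRAIGHT)


def match_tag(pattern: str, tagged: list[tuple[str, str]]) -> list[str]:
    """Return the tokens whose whole tag matches the regular expression."""
    rx = re.compile(pattern)
    return [token for token, tag in tagged if rx.fullmatch(tag)]


# Every tag that starts with N (noun) matches; the other tags do not.
nouns = match_tag("N.*", TAGGED)
```

In this sketch, nouns contains srbija and sad, mirroring the query above, and straighten_quotes can be applied to a pasted query before it is entered in the search box.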
3.9.1. Serbian Introduction
The Serbian MULTEXT-East specifications were developed within the scope of a Slovene-Serbian bilateral project. They are based on the Serbian lexicon and feature set developed in the Intex system. The specifications and associated resources are documented in: KRSTEV, Cvetana, VITAS, Duško, ERJAVEC, Tomaž. MULTEXT-East Resources for Serbian. Proceedings of the 7th International Multi-Conference INFORMATION SOCIETY IS 2004, Volume B: Language Technologies. October 13th – 14th 2004, Jožef Stefan Institute, Ljubljana, Slovenia.