Shallow tagging is used for languages for which no existing tagger is available. It assigns the following tags, based on regular expressions and on the frequency properties of tokens:
- FREQ – frequent words (the 200 most frequent words in the language)
- CONTENT – other words
- CRD – numerals
- PUN – punctuation
- OTHER – other
Once a corpus is tagged with this simple tagset, it can be processed with the Universal Sketch Grammar by Siva Reddy, Adam Kilgarriff, and Pavel Rychlý.
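
The tagset can be illustrated with a minimal Python sketch. It is not the actual implementation: the regular expressions, the tokenization, the case-folding of words, and the function names (`tokenize`, `build_frequent_set`, `tag_token`, `shallow_tag`) are all assumptions made for illustration, and the frequency list is computed from the input text itself rather than from a reference corpus.

```python
import re
from collections import Counter

# Hypothetical patterns: the actual regular expressions are not specified here.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")   # runs of word chars, or runs of symbols
NUMERAL_RE = re.compile(r"\d+")          # digits only
PUNCT_RE = re.compile(r"[^\w\s]+")       # no word characters, no whitespace
WORD_RE = re.compile(r"[^\W\d_]+")       # letters only


def tokenize(text):
    """Split text into word-like and punctuation-like tokens."""
    return TOKEN_RE.findall(text)


def build_frequent_set(tokens, n=200):
    """Return the n most frequent words (lowercased; an assumption)."""
    counts = Counter(t.lower() for t in tokens if WORD_RE.fullmatch(t))
    return {word for word, _ in counts.most_common(n)}


def tag_token(token, frequent):
    """Assign one of FREQ, CONTENT, CRD, PUN, OTHER to a single token."""
    if NUMERAL_RE.fullmatch(token):
        return "CRD"
    if PUNCT_RE.fullmatch(token):
        return "PUN"
    if WORD_RE.fullmatch(token):
        return "FREQ" if token.lower() in frequent else "CONTENT"
    return "OTHER"  # e.g. mixed letter-digit tokens


def shallow_tag(text, n_frequent=200):
    """Tokenize text and return a list of (token, tag) pairs."""
    tokens = tokenize(text)
    frequent = build_frequent_set(tokens, n_frequent)
    return [(t, tag_token(t, frequent)) for t in tokens]


if __name__ == "__main__":
    for token, tag in shallow_tag("The cat saw the dog, and the dog saw 2 cats."):
        print(f"{token}\t{tag}")
```

In this sketch the numeral and punctuation checks run before the frequency lookup, so only alphabetic tokens can receive FREQ or CONTENT. Anything that matches none of the patterns falls through to OTHER.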