The corpus was prepared using the Corpus Factory method. Full details are given in "A corpus factory for many languages" (Kilgarriff et al., LREC 2010).

Changelog

v2.0 (17th Jan 2012)

The corpus is tagged using a new POS tagger (90.73% accuracy), lemmatizer, and morphological analyser downloaded from http://sivareddy.in/downloads

The tagset is described in the POS guidelines for Indian languages (crawled from the Web Archive at http://ltrc.iiit.ac.in/tr031/posguidelines.pdf)

We wrote a simple sketch grammar for Telugu and generated word sketches and a distributional thesaurus for the language. If you would like to contribute, please contact us.