The corpus was prepared using the Corpus Factory method. Full details are given in "A corpus factory for many languages" (Kilgarriff et al., LREC 2010).
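As we understand the Corpus Factory approach, the crawl is seeded by turning a word-frequency list (built from a seed text such as Wikipedia) into web-search queries. Mid-frequency words are combined into random tuples, and each tuple is sent as a search query. The sketch below shows only that query-generation step. The rank cut-offs, the tuple size of 3 and the function names are illustrative assumptions, not the parameters used for this corpus. Downloading, boilerplate removal and deduplication are not shown.

```python
import math
import random
from collections import Counter


def mid_frequency_seeds(tokens, start, end):
    """Return the words ranked [start, end) by frequency.

    The cut-offs are hypothetical; very frequent words (mostly function
    words) and rare words are both skipped.
    """
    ranked = [word for word, _ in Counter(tokens).most_common()]
    return ranked[start:end]


def make_queries(seeds, n_queries, tuple_size=3, rng=None):
    """Build n_queries distinct search queries from random seed-word tuples."""
    if n_queries > math.comb(len(seeds), tuple_size):
        raise ValueError("not enough distinct tuples for the requested queries")
    rng = rng or random.Random()
    queries = set()
    while len(queries) < n_queries:
        # Sorting the sampled words makes permutations of one tuple identical,
        # so the set keeps only genuinely distinct queries.
        queries.add(" ".join(sorted(rng.sample(seeds, tuple_size))))
    return sorted(queries)


# Toy usage on a tiny token list (stands in for a real frequency list).
tokens = "a b c d e f g h a b c a".split()
seeds = mid_frequency_seeds(tokens, 1, 7)
queries = make_queries(seeds, 5, rng=random.Random(0))
for q in queries:
    print(q)
```

Sampling tuples rather than single words is meant to push the search engine towards running text that contains several ordinary words of the language, rather than towards word lists or dictionary pages.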


v2.0 (17th Jan 2012)

The corpus is tagged with a new POS tagger (90.73% accuracy), lemmatizer and morphological analyser downloaded from

The tagset is described in the POS guidelines for Indian languages (crawled from the Web Archive at

We wrote a simple sketch grammar for Telugu and used it to generate word sketches and a distributional thesaurus. If you would like to contribute, please contact us.