The New Model Corpus, tagged with the SuperSenseTagger (sst-light) described in Ciaramita and Altun (2006).

Attributes include Penn Treebank tags, SuperSenseTagger labels (WordNet labels; see the list of supersenses), and named entity labels. The corpus was presented at Skew-2; see the presentation in PDF (along with details of the Dante Disambiguation Project).

The corpus is finished, but the Sketch Grammar is still undergoing research and development.


v1.0 (8th March 2010)

  • 115 million tokens

Diana McCarthy