The New Model Corpus, tagged with the SuperSenseTagger (sst-light) described in Ciaramita and Altun (2006).

Attributes include Penn Treebank tags, SuperSenseTagger labels (WordNet labels; see the list of super senses), and named entity labels. The corpus was presented at Skew-2; see the presentation in PDF (along with details of the Dante Disambiguation Project).

This corpus is finished, but the Sketch Grammar is still undergoing research and development.

Changelog

v1.0 (8th March 2010)

  • 115 million tokens

Diana McCarthy