German TenTen corpus.

The corpus is tagged twice: with RFTagger (attribute tag; see the tagset reference) and with TreeTagger (attribute tt_tag; see the tagset reference).
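
Because each token carries two part-of-speech attributes, a query can address either tagger by name. The following is a minimal sketch, assuming the corpus is searched with CQL-style token expressions of the form [attribute="regex"]; the helper name cql_token and the example tag patterns are illustrative assumptions, not part of the corpus documentation.

```python
def cql_token(attribute: str, pattern: str) -> str:
    """Build a one-token CQL expression such as [tag="N.*"].

    `attribute` is a positional attribute name (here: "tag" for RFTagger,
    "tt_tag" for TreeTagger); `pattern` is a regular expression over its values.
    """
    if '"' in pattern:
        # Keep the sketch simple: refuse patterns that would break the quoting.
        raise ValueError("pattern must not contain double quotes")
    return f'[{attribute}="{pattern}"]'


# Hypothetical patterns: tags starting with "N" in the RFTagger annotation,
# and the tag "NN" in the TreeTagger annotation.
rftagger_query = cql_token("tag", "N.*")
treetagger_query = cql_token("tt_tag", "NN")

print(rftagger_query)
print(treetagger_query)
```

Comparing the hits for both queries on the same text is one way to see where the two taggers disagree.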

Changelog

deTenTen13

  • Web texts in German obtained in 2013 – 16.5 billion tokens

v 2.0 (28 April 2011)

  • fixed part-of-speech tagging problems that had caused major data loss in the previous version
  • 2.8 billion tokens

v 1.0 (30 November 2010)

  • initial version – 1.2 billion tokens