The German TenTen corpus belongs to the TenTen family of corpora, which usually contain a billion or more words.
- Web texts in German obtained in 2013 – 16.5 billion tokens
v 2.0 (28 April 2011)
- fixed part-of-speech tagging problems that caused major data loss in the previous version
- 2.8 billion tokens
v 1.0 (30 November 2010)
- initial version – 1.2 billion tokens