Korean TenTen corpus, crawled by SpiderLing in August and September 2012. Encoded in UTF-8, cleaned, and deduplicated. POS-tagged by HanNanum with a simplified tagset.

Changelog

v1.0 (10 September 2012)

  • initial version – 461 million words