This Korean koTenTen corpus is part of the TenTen corpora family. The texts were crawled from the web, cleaned and deduplicated.
Sketch Engine offers a complete set of tools to work with this Korean koTenTen corpus.
The Korean TenTen corpus was crawled by SpiderLing in August and September 2012. It is encoded in UTF-8, cleaned, and deduplicated, and it is POS-tagged by HanNanum with a simplified tagset.
Generating collocations, frequency lists, examples in contexts, n-grams or extracting terms is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.