Search the Korean koTenTen corpus

The Korean koTenTen corpus is part of the TenTen corpora family. Its texts were crawled from the web, cleaned, and deduplicated.

A complete set of tools is available to work with this Korean koTenTen corpus:

  • word sketch – Korean collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Korean nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
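To make two of the tools above more concrete, here is a minimal Python sketch of what an n-gram frequency list and a concordance (keyword in context) compute. It is an illustration only and not how Sketch Engine implements these features: the function names `ngram_frequencies` and `concordance` are hypothetical, and the sample sentence is a hand-tokenized toy example, not text from koTenTen.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields each n-gram as a tuple.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each hit (hypothetical helper)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((" ".join(left), tok, " ".join(right)))
    return lines


# Toy, whitespace-tokenized sample (an assumption; not taken from koTenTen).
tokens = "나는 학교에 간다 너는 학교에 간다 우리는 집에 간다".split()

bigrams = ngram_frequencies(tokens, 2)
hits = concordance(tokens, "학교에", window=1)

# Print the bigram frequency list, most frequent first.
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)

# Print each concordance line with the keyword in brackets.
for left, kw, right in hits:
    print(f"{left} [{kw}] {right}")
```

Real Korean text would need proper morphological analysis rather than whitespace splitting; the sketch only shows the counting and context-window ideas.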

The Korean TenTen corpus was crawled by SpiderLing in August and September 2012. It is encoded in UTF-8, cleaned, and deduplicated, and it is POS-tagged by HanNanum with a simplified tagset.

Changelog

v1.0 (10 September 2012)

  • initial version – 461 million words

Learn to use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.