Corpus of Georgian Wikipedia

The Georgian Wikipedia corpus is a text corpus created in 2016 from Georgian Wikipedia, the Georgian-language edition of the internet encyclopedia. It was built from the Wikipedia dump of August 2016 and was prepared for the EURALEX workshop. The corpus texts are not yet part-of-speech tagged.

The complete set of Sketch Engine tools can be used with this Georgian Wikipedia corpus to generate:

  • word lists – lists of Georgian nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
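
To make the three outputs listed above concrete, here is a minimal Python sketch of what a word list, an n-gram list and a concordance contain. It is only an illustration and not how Sketch Engine computes them. The sample sentence and the function names (tokenize, word_list, ngrams, concordance) are hypothetical, and the tokenizer is a simple regular expression that ignores punctuation and sentence boundaries.

```python
import re
from collections import Counter

# Hypothetical two-sentence sample; a real run would read corpus text instead.
SAMPLE = "თბილისი არის საქართველოს დედაქალაქი. თბილისი არის დიდი ქალაქი."


def tokenize(text):
    """Split text into word tokens; \\w matches Unicode letters, including Georgian."""
    return re.findall(r"\w+", text)


def word_list(tokens):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common()


def ngrams(tokens, n=2):
    """Return (n-gram, count) pairs ordered by frequency.

    Assumption: n-grams are counted across sentence boundaries,
    because the tokenizer above discards punctuation.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common()


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


tokens = tokenize(SAMPLE)
print(word_list(tokens)[:3])       # most frequent words
print(ngrams(tokens, 2)[:3])       # most frequent bigrams
for left, kw, right in concordance(tokens, "არის"):
    print(f"{left:>25} [{kw}] {right}")
```

Because the corpus is not part-of-speech tagged, the sketch counts plain word forms only; it makes no attempt to separate nouns, verbs or adjectives.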

