Corpus of Georgian Wikipedia

The Georgian Wikipedia corpus is a text corpus created from the Georgian edition of the internet encyclopedia Wikipedia in 2016. The corpus was built from the Wikipedia dump of August 2016 and was prepared for the EURALEX workshop. The corpus texts are not yet part-of-speech tagged.

A complete set of Sketch Engine tools is available to work with this Georgian Wikipedia corpus to generate:

  • word lists – lists of Georgian nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
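As a rough illustration of what these three outputs contain, the following minimal Python sketch computes a frequency-ordered word list, n-gram counts and keyword-in-context lines from plain text. It is not Sketch Engine's implementation or API: the function names (tokenize, word_list, ngram_counts, concordance), the simple regex tokenizer and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): runs of word characters.
    # In Python 3, \w also matches Georgian letters.
    return re.findall(r"\w+", text)


def word_list(tokens):
    # Word list: (word, frequency) pairs, most frequent first.
    return Counter(tokens).most_common()


def ngram_counts(tokens, n=2):
    # N-grams: count every run of n consecutive tokens.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def concordance(tokens, keyword, width=3):
    # Concordance: each hit of the keyword with up to `width` tokens of
    # context on either side, as (left, keyword, right) triples.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the corpus.
    sample = "თბილისი არის საქართველოს დედაქალაქი"
    tokens = tokenize(sample)
    print(word_list(tokens))
    print(ngram_counts(tokens, 2).most_common())
    print(concordance(tokens, "არის", width=2))
```

A real corpus tool works over an indexed corpus with proper tokenization, so this sketch only conveys the shape of the results, not how they are produced at scale.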

Other Wikipedia corpora in Sketch Engine

The Sketch Engine team can build a made-to-order Wikipedia corpus for any language. Please email us if you are interested.

Search the Georgian Wikipedia corpus

Sketch Engine offers a range of tools to work with the Georgian Wikipedia corpus.


Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn how in minutes.