Corpus of Georgian Wikipedia

The Georgian Wikipedia corpus is a text corpus created in 2016 from the Georgian edition of the internet encyclopedia Wikipedia. It was built from the Wikipedia dump of August 2016 and prepared for the EURALEX workshop. The corpus texts are not yet part-of-speech tagged.

A complete set of Sketch Engine tools is available to work with this Georgian Wikipedia corpus to generate:

  • word lists – lists of Georgian nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
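
To make the three outputs listed above concrete, here is a minimal Python sketch of what a word list, an n-gram list and a concordance contain. It is not how Sketch Engine computes them. The sample sentence, the naive tokenizer and the function names (tokenize, word_list, ngrams, concordance) are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): runs of Unicode word characters,
    # which includes Georgian letters. Punctuation is dropped.
    return re.findall(r"\w+", text)


def word_list(tokens):
    # Word list: every distinct token with its frequency, most frequent first.
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    # N-gram list: every run of n consecutive tokens with its frequency.
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common()


def concordance(tokens, keyword, width=2):
    # Concordance: each occurrence of the keyword with up to `width`
    # tokens of context on either side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Illustrative sample text, not taken from the corpus.
sample = "თბილისი არის საქართველოს დედაქალაქი. თბილისი არის დიდი ქალაქი"
tokens = tokenize(sample)

for word, freq in word_list(tokens):
    print(freq, word)
for gram, freq in ngrams(tokens, 2):
    print(freq, " ".join(gram))
for left, kw, right in concordance(tokens, "არის"):
    print(f"{left} [{kw}] {right}")
```

Because the corpus is not part-of-speech tagged, the sketch works on raw word forms only; it does not group inflected forms or separate nouns from verbs.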

Search the Georgian Wikipedia corpus

Sketch Engine offers a range of tools to work with the Georgian Wikipedia corpus.

Your own Wikipedia corpora

We can build a Wikipedia corpus in any language for you. Please contact us.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn how in minutes.