Here is a brief summary of the features aimed at terminologists, such as term extraction, along with some additional functionality.

Features for terminology and terminography

  • Term extraction – find term candidates in your own documents or in subject-specific corpora that Sketch Engine will find and download for you from the web.
  • Bilingual terminology extraction – upload your translation memory and perform a bilingual term extraction to find terminology and its foreign-language counterparts.
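The idea behind term extraction can be illustrated with a minimal sketch. It assumes that a term candidate is a word that is noticeably more frequent in a subject-specific (focus) corpus than in a general (reference) corpus. The function names, the smoothing value and the single-word restriction are assumptions made for illustration only; this is not Sketch Engine's actual implementation, which also handles multi-word terms.

```python
from collections import Counter


def per_million(counts):
    """Convert raw counts to frequencies per million tokens."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def term_candidates(focus_tokens, reference_tokens, smoothing=1.0, top=5):
    """Rank words by how much more frequent they are in the focus corpus.

    The score is a smoothed ratio of normalized frequencies; the smoothing
    constant (an assumed default) avoids division by zero for words that
    are absent from the reference corpus.
    """
    focus = per_million(Counter(focus_tokens))
    reference = per_million(Counter(reference_tokens))
    scores = {
        word: (freq + smoothing) / (reference.get(word, 0.0) + smoothing)
        for word, freq in focus.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    focus = "the corpus query the corpus".split()
    reference = "the cat sat on the mat".split()
    for word, score in term_candidates(focus, reference):
        print(f"{word}\t{score:.1f}")
```

In this toy example, "corpus" and "query" rank above "the" because they do not occur in the reference text at all, whereas "the" is about equally frequent in both.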

Terminology tasks can also be aided by

  • usage checking with concordance searches, which find examples of a word or phrase in context, sourced from domain-specific corpora that Sketch Engine can automatically create for you.
  • word sketches, which highlight typical collocations and word combinations. Use general corpora for information about non-specialized language or subject-specific corpora for professional language.
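To show what a concordance search returns, here is a minimal keyword-in-context sketch over a list of tokens. The function name and the `width` parameter are hypothetical and chosen for illustration; Sketch Engine's concordance works on indexed corpora and offers far richer queries.

```python
def concordance(tokens, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword.

    Matching is case-insensitive; `width` is the number of tokens kept
    on each side of the match.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    text = "The corpus was built and the corpus was tagged".split()
    for left, match, right in concordance(text, "corpus", width=2):
        print(f"{left:>10} [{match}] {right}")
```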

Automated building of a subject-specific corpus

A subject-specific corpus is an invaluable source of information about specialized language. It is advisable to work with small corpora (e.g. about 100,000 words) made up of terminology-rich texts because they may give more precise results for domain-specific work. A unique Sketch Engine feature will automatically find and download relevant texts from the internet for you, so your specialized corpus can be ready within minutes. Typically, it takes about 10 minutes to create a 1,000,000-word corpus. All additional functionality will be available automatically with your corpus: word sketches, concordance, term extraction, n-grams, word lists etc.

List of domain corpora already available in Sketch Engine

  • e-flux corpus – English art news digests
  • SiBol/Port corpus – corpus of English broadsheet newspapers
  • GerManC – historical Corpus of German Newspapers 1650–1800
  • TECU corpora – geodetics web corpora
  • RapCor – small corpus of spoken French in rap songs
  • Childes corpora – set of corpora containing a rich variety of computerised transcripts from language learners
  • Europarl parallel corpus – extracted from the proceedings of the European Parliament in 21 languages
Adam Kilgarriff, Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý and Vít Suchomel (2014). Finding Terms in Corpora for Many Languages with the Sketch Engine. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, Sweden, April 2014, pp. 53–56.

Adam Kilgarriff (2013). Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine. In Proceedings ASLIB 35th Translating and the Computer Conference, London, May 2013.

Vít Baisa, Barbora Ulipová and Michal Cukr (2015). Bilingual Terminology Extraction in Sketch Engine. In Ninth Workshop on Recent Advances in Slavonic Natural Language Processing, Czech Republic, December 2015, pp. 61–67.

Sandra Young (2016). Using corpora in translation. Available on the blog “The Deep End”.

Adam Kilgarriff, Ondřej Herman, Jan Bušta, Pavel Rychlý and Miloš Jakubíček (2015). DIACRAN: a framework for diachronic analysis (presentation). In Corpus Linguistics (CL2015), United Kingdom, July 2015.

Ondřej Herman and Vojtěch Kovář (2013). Methods for Detection of Word Usage over Time. In Seventh Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2013. Brno: Tribun EU, pp. 79–85. ISBN 978-80-263-0520-0.

Ondřej Herman (2013). Automatic methods for detection of word usage in time. Bachelor thesis. Masaryk University, Faculty of Informatics.

For inspiration

In the paper below, Adam Kilgarriff offers an interesting, unusual and well-founded view of terminology.

Adam Kilgarriff (1997). I don’t believe in word senses. In Computers and the Humanities, 31(2), pp. 91–113.