Automatic term extraction

Sketch Engine extracts terms and keywords automatically, combining simple pattern matching and frequency counts with linguistic criteria. Identifying terminology automatically in subject-specific texts is vital for translators and terminologists, and Sketch Engine can generate a glossary of keywords within seconds.
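To make the frequency side of this concrete, here is a minimal Python sketch of one common way to score keywords: compare each word's relative frequency in the subject-specific (focus) corpus with its frequency in a general reference corpus. This page does not describe Sketch Engine's internal scoring, so the function names, the smoothing constant and the toy corpora below are illustrative assumptions, and the linguistic criteria mentioned above are not modelled.

```python
from collections import Counter


def per_million(count, total):
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / total


def keyness_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank words by how much more frequent they are in the focus corpus.

    Assumption: score = (focus_fpm + smoothing) / (reference_fpm + smoothing).
    The smoothing constant avoids division by zero for words that are
    absent from the reference corpus.
    """
    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(reference_tokens)
    focus_total = len(focus_tokens)
    ref_total = len(reference_tokens)

    scores = {}
    for word, count in focus_counts.items():
        focus_fpm = per_million(count, focus_total)
        ref_fpm = per_million(ref_counts.get(word, 0), ref_total)
        scores[word] = (focus_fpm + smoothing) / (ref_fpm + smoothing)

    # Highest score first: the most corpus-specific words lead the list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
focus = "the enzyme binds the substrate and the enzyme releases the product".split()
reference = "the cat sat on the mat and the dog sat on the rug".split()
ranked = keyness_scores(focus, reference)
```

In this toy example, words such as "enzyme" rank far above "the", because "the" is about equally frequent in both corpora.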

The procedure described here works with user corpora. If you do not have one, Sketch Engine can quickly create one for you.

After logging in (or after clicking home), go to (2) My own corpora and click the (3) wrench icon next to a user corpus you created.

Access your user corpora for term extraction

Scroll down to the Search corpus section of the left menu and click Keywords and Terms.

Extract one-word and multi-word terms from a corpus

The extraction starts automatically. After several seconds, a list of single-word keywords and multi-word terms is displayed, each with a clickable link to the corresponding Wikipedia article. Click the underlined numbers to display examples in context.

Extracted terms and keywords with links to Wikipedia.

Download the lists via the links at the top: TBX for import into a CAT tool, or CSV for opening in MS Excel.

Terminology extraction is also available for pre-loaded corpora via the Word List menu. Parallel corpora (e.g. created from the translation memory of a CAT tool) can be used with bilingual terminology extraction to generate bilingual glossaries automatically.

To learn more about terminology extraction, see the User Guide.

Word lists

Sketch Engine can also generate other types of word lists:

  • list of all words in the corpus
  • words beginning/ending/containing certain characters
  • list of nouns/verbs/adjectives etc.
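As a rough illustration of what such lists contain, the Python sketch below builds a frequency-ordered word list from a tokenised text and optionally filters it by a regular expression. It is not Sketch Engine's implementation; the function name `word_list` and the sample tokens are assumptions made for this example. Lists restricted to nouns, verbs or adjectives are not shown, since they need part-of-speech tags that plain tokens do not carry.

```python
import re
from collections import Counter


def word_list(tokens, pattern=None):
    """Return (word, frequency) pairs, most frequent first.

    If `pattern` is given, keep only words that match it in full,
    e.g. r"trans.*" (beginning with), r".*ology" (ending with),
    r".*erm.*" (containing).
    """
    counts = Counter(token.lower() for token in tokens)
    if pattern is not None:
        regex = re.compile(pattern)
        counts = Counter(
            {word: n for word, n in counts.items() if regex.fullmatch(word)}
        )
    return counts.most_common()


# Sample tokens, invented for illustration only.
tokens = "Translation translators translate terms and terminology terms".split()

all_words = word_list(tokens)                 # every word in the text
starting = word_list(tokens, r"trans.*")      # words beginning with "trans"
ending = word_list(tokens, r".*ology")        # words ending with "ology"
containing = word_list(tokens, r".*erm.*")    # words containing "erm"
```

Matching is done on lowercased forms, so "Translation" and "translation" count as the same word in this sketch.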

To learn more about Keywords & Terms and Word Lists, please refer to the User manual.