Revolutionize the dictionary-building process

Sketch Engine offers tools that significantly speed up dictionary building while making the result more accurate, complete and consistent. Its suite of lexicographic tools is designed for lexicographers working to modern, corpus-based standards. Sketch Engine keeps lexicographers focused on what is typical in a language while ensuring that any new usage is brought to their attention as soon as it starts entering common use.

Lexicography tools

Although every Sketch Engine feature has some use in lexicography, a few key tools are ones every lexicographer should be familiar with, and they all start from the same foundation.

The existence of a suitable corpus is a prerequisite for any serious lexicographic work. Sketch Engine already comes with hundreds of corpora but also offers all the tools needed to build a text corpus. The user can create a corpus from their own materials, have Sketch Engine download suitable texts from the web or combine both methods.

Lexicographic solutions

Sketch Engine strives to develop highly usable tools to respond to lexicographers’ needs.

Headword list development — word list

In the past, the most reliable way to develop a headword list was to copy it from an existing dictionary. As a result, neologisms took a long time to enter dictionaries, and words that had gone out of use stayed in them much longer than desirable, because gathering enough evidence to justify adding or removing a word was a slow process.

Sketch Engine’s word list feature can generate a list of headwords (lemmas) or word forms, including neologisms, each directly supported by evidence of how widely it is used.
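The idea behind a corpus-derived headword list can be sketched in a few lines of Python. This is a minimal illustration, not Sketch Engine’s implementation: it counts lowercase word forms in a toy token list (no lemmatization, which would require a tagger), keeps those above a hypothetical `min_freq` threshold, and flags frequent forms missing from an existing headword list as neologism candidates. The function names and the data are made up.

```python
from collections import Counter

def build_word_list(tokens, min_freq=2):
    """Return (word, frequency) pairs, most frequent first, ties alphabetical."""
    counts = Counter(t.lower() for t in tokens if t.isalpha())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(w, f) for w, f in ranked if f >= min_freq]

def neologism_candidates(word_list, existing_headwords):
    """Frequent corpus words that an existing headword list does not contain."""
    known = {h.lower() for h in existing_headwords}
    return [(w, f) for w, f in word_list if w not in known]

# Toy data, invented for illustration
tokens = "the vlogger posted a vlog and the vlog went viral the end".split()
wl = build_word_list(tokens, min_freq=2)
print(wl)                                            # [('the', 3), ('vlog', 2)]
print(neologism_candidates(wl, ["the", "a", "end"]))  # [('vlog', 2)]
```

On a real corpus the threshold would be set far higher, and candidates would still need a lexicographer’s judgement.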

Writing entries — concordance, frequency, n-grams and Word Sketches

Discovering word senses and other lexical units (fixed phrases, compounds, multiword expressions etc.) is easy with the advanced concordance search, aided by a vast number of search options including CQL (Corpus Query Language). The frequency function can reveal whether a word typically prefers a particular text type or subject area.
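A concordance is essentially a keyword-in-context (KWIC) listing, and counting its hits per text type is the kind of summary the frequency function gives. The sketch below assumes a toy corpus of `(text_type, text)` pairs and matches a regular expression against whitespace-separated tokens. It does not implement CQL, which in Sketch Engine queries annotated attributes (for example `[lemma="sell"]` finds every form of *sell*); the function name and data are hypothetical.

```python
import re
from collections import Counter

def kwic(docs, pattern, width=3):
    """Yield (text_type, left context, hit, right context) for each matching token."""
    rx = re.compile(pattern, re.IGNORECASE)
    for text_type, text in docs:
        tokens = text.split()
        for i, tok in enumerate(tokens):
            if rx.fullmatch(tok):
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                yield text_type, left, tok, right

docs = [
    ("news", "shares fell as investors sold their stock"),
    ("fiction", "she sold the farm and left"),
    ("news", "the firm sold assets last year"),
]
hits = list(kwic(docs, r"sold"))
for tt, left, hit, right in hits:
    print(f"{tt:8} {left:>25} [{hit}] {right}")
print(Counter(tt for tt, *_ in hits))  # Counter({'news': 2, 'fiction': 1})
```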

The most frequently used multiword expressions can be identified with n-grams.
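N-gram extraction amounts to counting contiguous sequences of n tokens. A minimal sketch, assuming plain whitespace tokenization and a hypothetical `min_freq` cut-off; a production tool would also handle punctuation and case more carefully.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator over every contiguous sequence of n tokens, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))

def top_ngrams(tokens, n, k=3, min_freq=2):
    """The k most frequent n-grams seen at least min_freq times (ties alphabetical)."""
    counts = Counter(ngrams([t.lower() for t in tokens], n))
    frequent = [(" ".join(gram), f) for gram, f in counts.items() if f >= min_freq]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))[:k]

tokens = "At the end of the day it is at the end of the line".split()
print(top_ngrams(tokens, 3))
# [('at the end', 2), ('end of the', 2), ('the end of', 2)]
```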

Word Sketches, Sketch Engine’s hallmark feature, shed light on a word’s syntactic and collocational behaviour by summarising information from thousands of concordance lines on a single easy-to-understand screen, with direct access to the underlying evidence. Close synonyms can be analysed further with Word Sketch Difference.
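Word Sketch collocates are ranked by an association score; Sketch Engine’s documentation names logDice, defined as 14 + log2(2·f_xy / (f_x + f_y)), whose maximum value is 14. The sketch below assumes the grammatical relations have already been extracted into `(headword, relation, collocate)` counts, which in Sketch Engine is the job of a sketch grammar. For simplicity, f_x and f_y here are plain corpus frequencies of the two words; the relation names and all counts are invented.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(head, pair_counts, word_freq):
    """Group collocates of `head` by relation, best logDice score first."""
    sketch = {}
    for (h, rel, coll), f_xy in pair_counts.items():
        if h != head:
            continue
        score = log_dice(f_xy, word_freq[h], word_freq[coll])
        sketch.setdefault(rel, []).append((coll, round(score, 2)))
    for rel in sketch:
        sketch[rel].sort(key=lambda item: -item[1])
    return sketch

# Hypothetical relation names and made-up counts
word_freq = {"coffee": 1000, "strong": 3000, "drink": 5000, "hot": 4000}
pair_counts = {
    ("coffee", "modifier", "strong"): 200,
    ("coffee", "modifier", "hot"): 100,
    ("coffee", "object_of", "drink"): 300,
}
print(rank_collocates("coffee", pair_counts, word_freq))
# {'modifier': [('strong', 10.68), ('hot', 9.36)], 'object_of': [('drink', 10.68)]}
```

logDice does not depend on corpus size, which is one reason it suits comparing sketches across corpora.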

TickBox Lexicography

TickBox Lexicography is a new approach to dictionary building that makes extensive use of text corpora and of corpus management and query software: the software processes and evaluates the corpus data and presents results that lexicographers can collect with a click of the mouse.

The TickBox Lexicography tools in Sketch Engine use Word Sketches enhanced with the GDEX technology to suggest good dictionary examples for collocations selected by the user.
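GDEX ranks candidate sentences using a set of heuristics. The toy scorer below is not GDEX; it only illustrates the general approach with three invented criteria (sentence length, whether the string looks like a complete sentence, and the share of common vocabulary) and invented weights, so that the highest-scoring sentences can be offered to the lexicographer. All names and data are hypothetical.

```python
def example_score(sentence, common_words, min_len=6, max_len=20):
    """Toy 'good example' score in [0, 1]; criteria and weights are invented."""
    tokens = sentence.rstrip(".!?").split()
    score = 0.0
    if min_len <= len(tokens) <= max_len:
        score += 0.4  # neither too short nor too long
    if sentence[:1].isupper() and sentence.endswith((".", "!", "?")):
        score += 0.3  # looks like a complete sentence
    known = sum(t.lower().strip(",;:") in common_words for t in tokens)
    score += 0.3 * known / max(len(tokens), 1)  # prefer everyday vocabulary
    return round(score, 2)

def suggest_examples(sentences, common_words, k=2):
    """Return the k highest-scoring candidate sentences."""
    return sorted(sentences, key=lambda s: example_score(s, common_words), reverse=True)[:k]

common = {"she", "drinks", "strong", "coffee", "every", "morning", "the", "and", "a"}
sentences = [
    "strong coffee",
    "Quixotic baristas concoct extraordinarily strong coffee.",
    "She drinks strong coffee every morning.",
]
print(suggest_examples(sentences, common))
# ['She drinks strong coffee every morning.', 'Quixotic baristas concoct extraordinarily strong coffee.']
```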

The process

  • the lexicographer generates a Word Sketch
  • then selects the required collocations
  • Sketch Engine suggests a number of good dictionary examples of use
  • the lexicographer selects the most appropriate example(s)
  • and exports the list of examples together with the collocation as an XML file ready for import into other software
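The export step can be illustrated with Python’s standard `xml.etree.ElementTree`. The element and attribute names (`entry`, `collocation`, `example`) are hypothetical and do not reproduce Sketch Engine’s actual export format; the point is only that each selected collocation travels together with the examples chosen for it.

```python
import xml.etree.ElementTree as ET

def export_selection(headword, selections):
    """Serialize {collocation: [example sentences]} to an XML string.

    Element and attribute names are hypothetical, not Sketch Engine's schema.
    """
    root = ET.Element("entry", headword=headword)
    for collocation, examples in selections.items():
        coll = ET.SubElement(root, "collocation", text=collocation)
        for sentence in examples:
            ET.SubElement(coll, "example").text = sentence
    return ET.tostring(root, encoding="unicode")

xml_out = export_selection(
    "coffee", {"strong coffee": ["She drinks strong coffee every morning."]}
)
print(xml_out)
# <entry headword="coffee"><collocation text="strong coffee"><example>She drinks strong coffee every morning.</example></collocation></entry>
```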

Translation candidates

Parallel corpora are an invaluable resource for looking up translation candidates, including less obvious ones and indirect translations, i.e. cases where a short expression is translated with a longer phrase or a whole sentence. Sketch Engine offers both the search feature and a selection of parallel corpora.
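One simple way to surface translation candidates from a sentence-aligned parallel corpus is to score target-language words by how consistently they co-occur with the source word in aligned sentences. This sketch uses the Dice coefficient for that score; it is a standard association measure chosen here for illustration, not necessarily what Sketch Engine computes. The aligned pairs are invented, and single-word scoring will not capture indirect, multi-word translations.

```python
from collections import Counter

def translation_candidates(pairs, source_word, k=3):
    """Rank target words by Dice coefficient over aligned (source, target) sentences."""
    src_hits = 0            # aligned pairs whose source side contains source_word
    co = Counter()          # target word -> pairs where it co-occurs with source_word
    tgt_freq = Counter()    # target word -> pairs containing it at all
    for src, tgt in pairs:
        src_tokens = set(src.lower().split())
        tgt_tokens = set(tgt.lower().split())
        tgt_freq.update(tgt_tokens)
        if source_word in src_tokens:
            src_hits += 1
            co.update(tgt_tokens)
    scores = {w: 2 * co[w] / (src_hits + tgt_freq[w]) for w in co}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

pairs = [
    ("the house is big", "la casa es grande"),
    ("my house", "mi casa"),
    ("the car is big", "el coche es grande"),
    ("a new house", "una casa nueva"),
]
print(translation_candidates(pairs, "house"))  # [('casa', 1.0), ('la', 0.5), ('mi', 0.5)]
```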

Building a thesaurus

The thesaurus feature suggests similar words based on distributional semantics, i.e. on identifying words that tend to appear surrounded by the same words as the search word. This yields surprisingly accurate results, especially with the large corpora provided with Sketch Engine.
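Distributional similarity can be illustrated by representing each word as a vector of the words found around it and comparing those vectors with cosine similarity. This is a simplified stand-in: Sketch Engine’s own thesaurus is computed from Word Sketch data, whereas the sketch below swaps in a plain co-occurrence window, raw counts and a made-up three-sentence corpus.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[ctx] for ctx, count in a.items() if ctx in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_words(vectors, word, k=3):
    """The k words whose context vectors are closest to that of `word`."""
    target = vectors.get(word, Counter())
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda item: (-item[1], item[0]))[:k]

sentences = [
    "i drink hot coffee daily",
    "i drink hot tea daily",
    "i drive a fast car daily",
]
vectors = context_vectors(sentences)
print(similar_words(vectors, "coffee"))  # 'tea' ranks first
```

With raw counts, a frequent function word such as *i* also scores highly here; weighting contexts by an association measure is the usual remedy.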

