Word Sketch – a summary of a word’s behaviour

A word sketch is a one-page summary of a word’s grammatical and collocational behaviour. It shows the word’s collocates grouped by grammatical relation, such as words that serve as the object or subject of the verb, words that modify the word, etc.
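To make the idea of collocates grouped by grammatical relation more concrete, here is a minimal Python sketch of the kind of data a word sketch summarises. It assumes the corpus has already been parsed into hypothetical (headword, relation, collocate) triples with made-up relation labels. It only illustrates the grouping and is not how Sketch Engine extracts or stores word sketches.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical input: one (headword, grammatical relation, collocate) triple
# per occurrence, as a parser might produce them.
Triple = Tuple[str, str, str]


def build_word_sketch(triples: Iterable[Triple], headword: str) -> Dict[str, Dict[str, int]]:
    """Group the collocates of `headword` by grammatical relation and count them."""
    sketch: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for head, gramrel, collocate in triples:
        if head == headword:  # ignore triples for other headwords
            sketch[gramrel][collocate] += 1
    # Convert the nested defaultdicts into plain dicts for display.
    return {gramrel: dict(counts) for gramrel, counts in sketch.items()}


# Toy data; the relation labels are invented for this example.
triples = [
    ("team", "object_of", "lead"),
    ("team", "object_of", "join"),
    ("team", "object_of", "lead"),
    ("team", "modifier", "football"),
    ("team", "modifies", "leader"),
    ("goal", "object_of", "score"),
]

for gramrel, counts in build_word_sketch(triples, "team").items():
    print(gramrel, counts)
# object_of {'lead': 2, 'join': 1}
# modifier {'football': 1}
# modifies {'leader': 1}
```

Each relation becomes one column of collocates; a real word sketch additionally ranks each column by an association score rather than showing raw counts only.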

Why use word sketches?

A word sketch removes the need to review hundreds or thousands of corpus examples one by one: everything is displayed in a compact, time-saving format. The following screenshot shows part of the word sketch for team (noun), based on over 22,000 examples of use. The phrases in grey show examples of exactly how the word combines with its collocates. Clicking the plus (+) sign displays a multi-word sketch, e.g. clicking lead+ in the third column displays a new word sketch showing how the phrase lead a team combines with other words.

A word sketch for the word team (noun)

The information on this screen is explained in detail further below.

How to create a word sketch?

Log in to Sketch Engine and select a corpus. Then, in the left menu, click Word Sketch to display the following screen.

Word sketch basic options

(1) type a lemma
You can try typing a phrase (e.g. hot water). Note that if the phrase is rather infrequent, you may receive poor results or no results at all.

(2) (optional) select a part of speech; when set to all, word sketches for all parts of speech are generated and the user can switch between them

(3) (optional) advanced options are explained further below

(4) generates the word sketch.

You should see a screen similar to the screenshot further above.

Word sketch in detail

The result screen (with part of speech set to Auto) looks like this:

Word sketch showing collocations generated from text corpus data

Result screen description

(1) the header shows:

line 1: the lemma, its part of speech in brackets, and alternative parts of speech with their frequencies
line 2: the name of the corpus and the frequency of the displayed lemma; clicking the frequency displays a concordance

(2) the grammatical relation heading shows:
the name of the relation (if it is not self-explanatory, click the frequency count to display examples in a concordance, which will make it clear)
the total frequency of collocations in this relation, including those not listed on the screen

(3) frequency of each collocation, clicking the frequency will display a concordance

(4) clicking the plus (+) displays a multi-word sketch; for example, clicking the plus next to leader displays a word sketch for the phrase team leader. The user can configure which collocations have this feature, see the advanced options below

(5) the phrase in grey is the longest-commonest match, i.e. the most common realisation of the collocational pair, which helps you understand the most typical use of the collocation

Result screen - left menu options

The left menu of the word sketch result screen gives these options:

saves the word sketch as a TXT or XML file

Change options
opens the advanced settings dialogue (described on this page) to change the options

clusters collocates by meaning: collocates similar in meaning are grouped together

Sort by freq/score
will toggle the way the collocates are sorted: by frequency or by the strength of the collocation (score)

Hide/Show gramrels
show gramrels: collocates are grouped by grammatical relation
hide gramrels: collocates are displayed as one long list showing the grammatical relation, frequency and score of each

More data
loads more collocates, so the columns contain more items

Less data
loads fewer collocates, so the columns contain fewer items
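The two sort orders (Sort by freq/score) and the two display modes (Hide/Show gramrels) can be mimicked on a toy list of collocations. The Python sketch below is only an illustration under stated assumptions: the Collocation record, the relation labels and all frequencies and scores are invented, and nothing here calls Sketch Engine.

```python
from typing import Dict, List, NamedTuple


class Collocation(NamedTuple):
    gramrel: str     # grammatical relation (hypothetical label)
    collocate: str
    freq: int        # raw frequency of the collocation
    score: float     # association score, e.g. logDice


def sort_collocations(colls: List[Collocation], by: str = "score") -> List[Collocation]:
    """Mimic Sort by freq/score: highest frequency or highest score first."""
    if by not in ("freq", "score"):
        raise ValueError("by must be 'freq' or 'score'")
    return sorted(colls, key=lambda c: c.freq if by == "freq" else c.score, reverse=True)


def group_by_gramrel(colls: List[Collocation]) -> Dict[str, List[Collocation]]:
    """Mimic Show gramrels: one column per relation, keeping the input order."""
    columns: Dict[str, List[Collocation]] = {}
    for c in colls:
        columns.setdefault(c.gramrel, []).append(c)
    return columns


# Invented numbers, for illustration only.
colls = [
    Collocation("object_of", "lead", 120, 9.1),
    Collocation("object_of", "join", 300, 8.2),
    Collocation("modifier", "football", 250, 9.8),
    Collocation("modifies", "leader", 90, 10.2),
]

# Hide gramrels: one flat list showing relation, frequency and score.
for c in sort_collocations(colls, by="freq"):
    print(c.gramrel, c.collocate, c.freq, c.score)

# Show gramrels: the same collocations sorted by score, then split into columns.
for gramrel, column in group_by_gramrel(sort_collocations(colls)).items():
    print(gramrel, [c.collocate for c in column])
```

Note how the order changes with the sort key: join is the most frequent collocate in this toy data but has the lowest score, so it moves to the bottom of its column when sorting by score.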

Advanced options

The basic options are sufficient for most uses. However, more parameters can be set via advanced options (3).

Word sketch basic options

The complete set of advanced options looks like this:

Word sketch advanced settings

(1) type a lemma or a phrase (multi-word expression)

(2) select a part of speech
all generates word sketches for all parts of speech and displays them together on one screen
auto generates a word sketch for the most frequent part of speech, with options to switch to the other parts of speech

(3) expands the advanced settings below

(4) lets the user select a subcorpus, (5) displays details about the subcorpus, (6) lets the user create a new subcorpus, (7) links to this user guide

(8) sets how frequent the lemma has to be in the corpus to be included in the word sketch; leave it set to auto to let Sketch Engine choose the best value

(9) normally this setting does not need to be changed; the score used is logDice, see Statistics used in Sketch Engine for a detailed explanation

(10) maximum number of collocates listed in one column

(11) the collocates can be sorted by frequency (Raw frequency) or by the strength of the collocation (Score)

(12) displays a percentage indicating what proportion of the occurrences of the node (the headword) appeared in a collocation. The remaining occurrences were in a non-collocational environment, for example in a one-word sentence consisting of the node only.

(13) will show the longest as well as the most frequent phrase containing the node and the collocate as an example of the collocation

(14) will cluster (=group) collocates similar in meaning

(15) if the cluster collocations option is selected, this setting controls how similar in meaning collocates must be to be placed in the same group: a larger number groups only collocates closer in meaning, a smaller number also groups less closely related collocates

(16) when unchecked, collocates will be placed in one long column rather than grouped by grammatical relation

(17) some word sketches may include statistics related not to word combinations but to word forms, e.g. statistics of nominal cases, or of verbs used in the passive or as reflexive verbs. These are called unary relations; this setting limits which of them are included

(18) sets how many times a collocation has to appear in the corpus for the plus (+) sign giving access to its multi-word sketch to be displayed; a higher number produces fewer pluses

(19) adjusts how many columns should appear next to each other on the result screen

(20) sets which grammatical relations will be displayed; to display all relations, leave them all ticked or all unticked

(21) switches between all ticked and all unticked

(22) used for generating a bilingual word sketch

(23) generates the word sketch

(24) saves these settings as the default for every subsequent word sketch generated from this corpus; this is a per-corpus setting, so different settings can be saved for each corpus
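Settings (10), (12) and (18) act on the collocate list in a fairly simple way, sketched below in Python. The function and parameter names (apply_display_settings, max_items, multiword_min_freq, collocation_percentage) are hypothetical labels chosen for this illustration, not Sketch Engine option or API names, and the numbers are toy data.

```python
from typing import List, Tuple


def apply_display_settings(
    column: List[Tuple[str, int]],  # (collocate, frequency), already sorted
    max_items: int,                 # like setting (10): items per column
    multiword_min_freq: int,        # like setting (18): threshold for the plus sign
) -> List[Tuple[str, int, bool]]:
    """Truncate a column and flag which collocations would get a plus (+) sign."""
    return [(coll, freq, freq >= multiword_min_freq) for coll, freq in column[:max_items]]


def collocation_percentage(collocational_hits: int, node_freq: int) -> float:
    """Share of node occurrences that appeared in a collocation, like setting (12)."""
    if node_freq <= 0:
        raise ValueError("node_freq must be positive")
    return 100 * collocational_hits / node_freq


column = [("join", 300), ("football", 250), ("lead", 120), ("leader", 90)]
print(apply_display_settings(column, max_items=3, multiword_min_freq=200))
# [('join', 300, True), ('football', 250, True), ('lead', 120, False)]
print(collocation_percentage(640, 1000))
# 64.0
```

Raising max_items corresponds to More data in the left menu, lowering it to Less data; raising multiword_min_freq leaves fewer pluses on the screen.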

Grammatical relations

The name of a grammatical relation may not always be self-explanatory, and at the same time it is not easy to provide a full glossary of relations because they can differ from one language to another. We strongly recommend clicking the frequency count on the result screen to display the actual examples, from which the meaning of the relation can easily be inferred.

Multi-word sketches (new in version 2.54–2.89)

Word sketches can also be displayed for multiword expressions. This is done by filtering the word sketch by particular collocates, e.g. you can show the word sketch for “water” filtered by the occurrences with “hot” as a modifier. If you make a regular word sketch and tick Show links to multiword sketches in the advanced options, you will see little arrows next to each collocation; these lead to the corresponding filtered word sketch (multiword sketch).

If you enter multiple lemmas into the lemma field, Sketch Engine will automatically try to guess the headword and show the filtered word sketch for it. Note that the automatic choice is not always optimal: if you do not see what you wanted, or you get a message saying there are no results, try starting from the headword and looking up the particular collocates.
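The filtering idea behind a multiword sketch can be illustrated with a few lines of Python. This is a conceptual sketch only: it assumes each occurrence of the headword has already been annotated with hypothetical (relation, collocate) pairs, and the labels and data are invented. It is not Sketch Engine's implementation.

```python
from collections import Counter
from typing import List, Tuple

# One occurrence of the headword = the (relation, collocate) pairs found around it.
Occurrence = List[Tuple[str, str]]


def multiword_sketch(occurrences: List[Occurrence], gramrel: str, collocate: str) -> Counter:
    """Count the headword's other collocations, keeping only occurrences that
    contain the filter pair (gramrel, collocate), e.g. ("modifier", "hot")."""
    target = (gramrel, collocate)
    counts: Counter = Counter()
    for occ in occurrences:
        if target in occ:
            # Count every pair in this occurrence except the filter pair itself.
            counts.update(pair for pair in occ if pair != target)
    return counts


# Toy occurrences of "water"; labels and data are hypothetical.
water = [
    [("modifier", "hot"), ("object_of", "pour")],
    [("modifier", "hot"), ("object_of", "drink")],
    [("modifier", "cold"), ("object_of", "drink")],
    [("modifier", "hot"), ("object_of", "pour")],
]

print(multiword_sketch(water, "modifier", "hot"))
# Counter({('object_of', 'pour'): 2, ('object_of', 'drink'): 1})
```

The occurrence with cold as a modifier is discarded, so the result describes hot water rather than water in general.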

Statistics behind word sketches

The statistics used in Sketch Engine to calculate word sketches are described in this document.
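logDice, the default score mentioned in setting (9), is commonly defined as logDice = 14 + log2(2·f_xy / (f_x + f_y)), where f_x and f_y are the frequencies of the headword and the collocate and f_xy is the frequency of the collocation. Its maximum is 14, reached only when the two words always occur together. The Python function below is a minimal sketch of that formula; exactly which counts Sketch Engine uses for each grammatical relation is described in the document referenced above and is not reproduced here.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y))."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


print(log_dice(100, 100, 100))  # 14.0: the two words always occur together
print(log_dice(50, 100, 100))   # 13.0: halving f_xy lowers the score by one point
```

Because the score depends only on these three counts and not on the size of the whole corpus, values remain comparable across corpora of different sizes, and each halving of f_xy (with f_x and f_y fixed) lowers the score by exactly one point.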

Referencing word sketches, Bibliography

Detailed Sketch Engine manual

Discovering English with Sketch Engine (DESkE), chapter 9: Word Sketches. James Edward Thomas (2015). pp. 161–176.

Work on word sketches

Semantic Word Sketches (presentation). Diana McCarthy, Adam Kilgarriff, Miloš Jakubíček and Siva Reddy (2015). In Corpus Linguistics (CL2015).

Finding Multiwords of More Than Two Words. Adam Kilgarriff, Pavel Rychlý, Vojtěch Kovář and Vít Baisa (2012). In Proceedings of the 15th EURALEX International Congress, Norway, pp. 693–700.

A Quantitative Evaluation of Word Sketches. Adam Kilgarriff, Vojtěch Kovář, Simon Krek, Irena Srdanovic and Carole Tiberius (2010). In Proceedings of the 14th EURALEX International Congress, The Netherlands, pp. 372–379.

Towards disambiguation of word sketches. Vít Baisa (2010). In Text, Speech and Dialogue. Berlin, Germany: Springer-Verlag, pp. 37–42.

Word sketch for individual languages

arTenTen: Arabic Corpus and Word Sketches. Tressy Arts, Yonatan Belinkov, Nizar Habash, Adam Kilgarriff and Vít Suchomel (2014). In Journal of King Saud University – Computer and Information Sciences, volume 26, issue 4, pp. 381–395.

Hindi Word Sketches. Anil Krishna Eragani, Varun Kuchibhotla, Dipti Sharma, Siva Reddy and Adam Kilgarriff (2014). In Proceedings of the Conference on Natural Language Processing (ICON-11), Goa, India.

Word Sketches for Turkish. Bharat Ram Ambati, Siva Reddy and Adam Kilgarriff (2012). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Turkey, pp. 2945–2950.

Vietnamese Word Sketches. Adam Kilgarriff and Phuong Le-Hong (2012). In Workshop on Vietnamese Language and Speech Processing (IEEE-RIVF 9), Vietnam, pp. 1–4.

Polish Word Sketches. Adam Radziszewski, Adam Kilgarriff and Robert Lew (2011). In Proceedings of the 5th Language & Technology Conference (LTC), Poland, pp. 237–242.

Japanese Word Sketches: Advances and Problems. Irena Srdanović, Naomi Ida, Chikako Shigemori Bučar, Adam Kilgarriff and Vojtěch Kovář (2011). In Acta Linguistica Asiatica, University of Ljubljana, Slovenia, pp. 63–82.

Studying Word Sketches for Russian. Maria Khokhlova and Victor Zakharov (2010). In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Malta, pp. 3491–3494.

Building Russian Word Sketches as Models of Phrases. Maria Khokhlova (2010). In Proceedings of the 14th EURALEX International Congress, The Netherlands, pp. 364–371.

The RoWaC Corpus and Romanian Word Sketches. Monica Macoveiciuc and Adam Kilgarriff (2010). In Multilinguality and Interoperability in Language Processing with Emphasis on Romanian, edited by Dan Tufis and Corina Forascu. Romanian Academy, pp. 151–168.

Slovene Word Sketches. Simon Krek and Adam Kilgarriff (2006). In Proceedings of the 5th Slovenian/First International Languages Technology Conference, Slovenia.

Chinese Word Sketches. Adam Kilgarriff, Chu-Ren Huang, Pavel Rychlý, Simon Smith and David Tugwell (2005). In Proc. Asialex, Singapore.

Manatee, Bonito and Word Sketches for Czech (abstract in Russian). Pavel Rychlý and Pavel Smrž (2004). In Proceedings of the Second International Conference on Corpus Linguistics. Saint-Petersburg: Saint-Petersburg State University Press, pp. 124–132.