Compare and contrast words visually

Word Sketch Difference compares and contrasts two words by analysing their collocations and displaying the collocates in categories based on grammatical relations.

Why use the Word Sketch Difference?

The Word Sketch Difference is a real timesaver: it removes the need to read thousands of concordance lines before drawing conclusions. All the information is available on a single screen.

In the screenshot below, Sketch Engine has assigned green to clever and red to intelligent. Green collocates are more strongly associated with clever, red collocates with intelligent. A stronger colour indicates a stronger collocation: for example, clever trick is more usual than intelligent trick, whereas intelligent robot is more natural than clever robot.

ws_sketch_diff
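The colour logic described above can be pictured with a minimal sketch. It is only an illustration and not Sketch Engine's actual formula: the function name colour_for, the max_diff scale and the example scores are all hypothetical.

```python
def colour_for(score_first: float, score_second: float, max_diff: float = 6.0):
    """Return (colour, intensity) for one collocate.

    score_first / score_second are the collocation scores of the collocate
    with the first and the second word. max_diff is a hypothetical score
    difference at which the colour reaches full strength.
    """
    diff = score_first - score_second
    if diff == 0:
        # equally typical of both words: shown without colour
        return ("white", 0.0)
    # the larger the difference, the stronger the colour (capped at 1.0)
    intensity = min(abs(diff) / max_diff, 1.0)
    colour = "green" if diff > 0 else "red"
    return (colour, intensity)


# Invented scores, for illustration only (first word: clever, second: intelligent)
print(colour_for(9.0, 3.0))  # a collocate far more typical of clever
print(colour_for(5.0, 8.0))  # a collocate somewhat more typical of intelligent
```

In this sketch, a collocate with equal scores for both words stays white, and the colour gets stronger as the two scores move apart.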

How to generate a Word Sketch Difference?

To generate a Word Sketch Difference, open its screen from the left menu (1):

Compare collocations of two words based on text corpus data

(2) type the first lemma

(3) select the part of speech

Sketch diff by

lemma (4) type the second lemma

subcorpus (5) use this option to compare the behaviour of the same word in two different subcorpora, e.g. written and spoken language, or academic and general language. Use the links to create new subcorpora. To use this option, type only one lemma in (2) and select the two subcorpora to compare.

word form (6) use this option to compare two word forms of the same lemma; first type the lemma (2) and then the two word forms

(7) click Show diff to display the results
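The form logic above can be summarised in a short sketch. It only models which fields each "Sketch diff by" option needs, as described in steps (2) to (6). The class SketchDiffForm and its field names are hypothetical and do not correspond to any Sketch Engine interface. The requirement that the two subcorpora or word forms differ is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SketchDiffForm:
    """Hypothetical model of the form; numbers refer to the steps above."""
    lemma: str                                    # (2) first lemma
    pos: str                                      # (3) part of speech
    diff_by: str = "lemma"                        # "lemma", "subcorpus" or "word form"
    second_lemma: Optional[str] = None            # (4)
    subcorpora: Optional[Tuple[str, str]] = None  # (5)
    word_forms: Optional[Tuple[str, str]] = None  # (6)

    def problems(self) -> List[str]:
        """Return what is still missing; an empty list means the form is complete."""
        found = []
        if not self.lemma:
            found.append("type the first lemma (2)")
        if not self.pos:
            found.append("select the part of speech (3)")
        if self.diff_by == "lemma":
            if not self.second_lemma:
                found.append("type the second lemma (4)")
        elif self.diff_by == "subcorpus":
            if self.second_lemma:
                found.append("type only one lemma when comparing subcorpora (5)")
            # assumption: the two subcorpora must differ
            if not self.subcorpora or len(set(self.subcorpora)) != 2:
                found.append("select two different subcorpora (5)")
        elif self.diff_by == "word form":
            # assumption: the two word forms must differ
            if not self.word_forms or len(set(self.word_forms)) != 2:
                found.append("type two different word forms (6)")
        else:
            found.append(f"unknown 'Sketch diff by' option: {self.diff_by}")
        return found


form = SketchDiffForm("clever", "adjective", second_lemma="intelligent")
print(form.problems())  # nothing missing for a lemma comparison
```

The three options are mutually exclusive: only the fields belonging to the selected option are checked.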

Word Sketch Difference in detail

Result screen description

The result screen looks like this:

Word sketch difference result screen

(1) the header shows the name of the corpus and the frequency of each lemma

(2) the colour bar shows how the scores translate into colour

(3) name of the grammatical category; if the name is not clear, click one of the frequency counts (4) or (5) to see examples

(4) frequency count for the combination with the first lemma, i.e. funny and/or clever

(5) frequency count for the combination with the second lemma, i.e. funny and/or intelligent

(6) [used for development purposes only]

(7) [used for development purposes only]

Result screen - left menu options

The left menu of the Word Sketch Difference result screen gives these options:

Change options

opens the advanced setting dialogue (described on this page) to change options

Advanced options

The advanced options are used to change the way the results are presented. The complete set of advanced options looks like this:

Compare collocations with advanced options

(1)
all in one block – the result screen will be displayed as in the screenshot above with red, white and green words for each grammatical relation in the same column

common/exclusive blocks – the result screen will first display the white words in categories, followed by the green words in categories followed by the red words in categories

(2) sets a minimum frequency for a collocation to be included in the Word Sketch Difference; leave it set to auto for Sketch Engine to decide automatically

(3) sets the number of items when collocates are displayed in one block (a bigger number means longer lists with more items)

(4) sets the number of items displayed when collocates are displayed in exclusive blocks (a bigger number means longer lists with more items)
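The effect of options (1) to (4) can be illustrated with a minimal sketch. It is not Sketch Engine's implementation: the function arrange, its parameter names, the default limits and the sample frequencies are hypothetical. It also assumes that a collocate passes the minimum-frequency filter when either of its two frequencies reaches the minimum, and that the limit in (4) applies to every block in the common/exclusive layout.

```python
from typing import Dict, List, Tuple

# (collocate, frequency with first lemma, frequency with second lemma, colour)
Row = Tuple[str, int, int, str]


def arrange(rows: List[Row], separate_blocks: bool = False, min_freq: int = 1,
            max_one_block: int = 10, max_exclusive: int = 5) -> Dict[str, List[str]]:
    """Arrange the collocates of one grammatical relation for display."""
    # (2) assumed rule: keep a collocate if either frequency reaches the minimum
    kept = [r for r in rows if max(r[1], r[2]) >= min_freq]
    if not separate_blocks:
        # (1) all in one block, cut to the length set in (3)
        return {"all": [r[0] for r in kept[:max_one_block]]}
    # (1) common/exclusive blocks: white, then green, then red, each cut to (4)
    blocks: Dict[str, List[str]] = {}
    for colour in ("white", "green", "red"):
        words = [r[0] for r in kept if r[3] == colour]
        blocks[colour] = words[:max_exclusive]
    return blocks


# Invented frequencies, for illustration only
rows = [
    ("trick", 40, 5, "green"),
    ("robot", 3, 30, "red"),
    ("funny", 12, 11, "white"),
    ("idea", 20, 2, "green"),
    ("rare", 1, 1, "white"),
]
print(arrange(rows, min_freq=2))
print(arrange(rows, separate_blocks=True, min_freq=2, max_exclusive=1))
```

Raising max_one_block or max_exclusive lengthens the lists, while raising min_freq removes rare collocates before the lists are cut.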

Statistics behind word sketches

The statistics used in Sketch Engine to calculate word sketches are described in this document.
