Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in contexts, n‑grams or extracting terms is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.

Sketch Engine is also suitable for comparing corpora. Users can compare preloaded corpora as well as corpora they have compiled themselves. The result of the comparison is displayed as a comparison chart.

How to compare corpora

Compare corpora in four simple steps.

(1) click Compare corpora in the Main menu

(2) select language and set attribute for the comparison

(3) tick two or more corpora to compare

(4) see the result in the comparison chart

Characteristics of the result

  • value “1” means identical corpora
  • the higher the score and the darker the color, the greater the difference between corpora (the scale is not linear: “4” does not mean twice as different as “2”)
  • the scores are clickable and link to the relevant word list page for the two selected corpora and the chosen attribute

How does it work?

Process of comparing corpora

– for every pair of corpora:
– take the top 5000 words by frequency (from each corpus separately),
– compute the keyword score for every word in the union of the two lists,
– keep only the top 500 words by score,
– the arithmetic mean of their scores is the similarity score of the pair of corpora
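
The steps above can be sketched in Python. This is only an illustration, not Sketch Engine's actual implementation: the function names, the tokenized-list input, and in particular the keyword score formula (a smoothed ratio of per-million frequencies with a hypothetical smoothing constant of 1) are assumptions, since the page does not specify how the keyword score is computed.

```python
from collections import Counter


def per_million(counts):
    """Convert raw counts to frequencies per million tokens."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus is empty")
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def compare_corpora(focus_tokens, ref_tokens,
                    top_freq=5000, top_score=500, smoothing=1.0):
    """Return a similarity score for two corpora (1.0 means identical).

    ASSUMPTION: the keyword score is a smoothed frequency ratio,
    (fpm_focus + smoothing) / (fpm_ref + smoothing). The source only
    says "keyword score" without giving a formula.
    """
    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(ref_tokens)
    focus_fpm = per_million(focus_counts)
    ref_fpm = per_million(ref_counts)

    # Top words by frequency from each corpus separately, then their union.
    vocab = ({w for w, _ in focus_counts.most_common(top_freq)}
             | {w for w, _ in ref_counts.most_common(top_freq)})

    # Keyword score for every word in the union, highest first.
    scores = sorted(
        ((focus_fpm.get(w, 0.0) + smoothing) / (ref_fpm.get(w, 0.0) + smoothing)
         for w in vocab),
        reverse=True,
    )

    # Arithmetic mean of the top-scoring words.
    top = scores[:top_score]
    return sum(top) / len(top)
```

Under this assumed formula, comparing a corpus with itself gives exactly 1, which matches the chart's convention, and the result depends on which corpus is treated as the compared one and which as the reference.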

Another possibility to compare two corpora

The second way of comparing corpora is via the Word list feature, which makes it possible to compare two corpora (or their subcorpora) and to set how much weight is given to rare or common words.

A comparison chart for English corpora

The picture shows a comparison of various English corpora. The scores in the table indicate corpus similarity: 1 means identical corpora, and the higher the score (and the darker the grey), the greater the difference between the two corpora. The rows list the compared corpora and the columns list the reference corpora.

Bibliography

Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 97-133.