search in huge text corpora

learn how words are used

compare and contrast words visually

upload and search your own corpora

build specialised corpora instantly from the Web

extract wordlists, keywords, terms and thesauri

explore a distributional thesaurus with word clouds

The Sketch Engine

The Sketch Engine is for anyone wanting to research how words behave. It is a Corpus Query System. It lets you see a concordance for any word, phrase or grammatical construction, in one of the corpora that we provide, or in a corpus of your own. Its unique feature is the word sketch: a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour.

See languages overview

Arabic, Bulgarian, Cantonese, Catalan, Chinese, Czech, Danish, Dutch, English (American), English (British), Estonian, Finnish, French, Galician, German, Greek, Gujarati, Hebrew, Hindi, Indonesian, Italian, Japanese, Kannada, Korean, Latin, Latvian, Lithuanian, Malay, Malayalam, Maltese, Marathi, Norwegian, Persian, Polish, Portuguese (Brazilian), Portuguese (European), Romanian, Russian, Serbian, Setswana, Slovak, Slovene, Spanish (Americas), Spanish (European), Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Vietnamese, Welsh

See all features

Concordance search
Look up words in context with complex queries
Word lists retrieval
Acquire large-scale lexicons
Collocation extraction
See most common word collocates
Word sketches
Learn how words are used
Word sketch difference
Compare and contrast words visually
Corpus Architect
Create and use your own corpora
Create specialised corpora from the Web instantly
Keyword and Term Extraction
Discover topics and build glossaries automatically
Distributional Thesaurus
Find the right expressions for your aims
Parallel Corpora
Learn from existing translations
Corpus Comparison
Evaluate and assess large text corpora

See all (200+) text corpora

British National Corpus (BNC)
ACL Anthology Reference Corpus (ARC) FREE ACCESS
British Academic Written English Corpus (BAWE) FREE ACCESS
British Academic Spoken English Corpus (BASE) FREE ACCESS
Susanne Corpus
TenTen Web Corpora
Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hungarian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Slovak, Spanish, Swedish
Web as Corpus (WaC) Web Corpora
Basque, Bengali, Bosnian, Croatian, Chinese, Danish, English, Filipino, Finnish, French, Frisian, Georgian, German, Greek, Hebrew, Hindi, Igbo, Indonesian, Italian, Japanese, Korean, Malay, Malayalam, Maori, Samoan, Serbian, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Vietnamese, Welsh
EUROPARL Parallel Corpora
OPUS Parallel Corpora

Users and Uses

The Sketch Engine is used to write dictionaries by