Entries by Michal Cukr

Bigger ACL Anthology Reference Corpus

The ACL Anthology Reference Corpus, made up of papers from the Digital Archive of Research Papers in Computational Linguistics, is now almost twice as large. The corpus is freely accessible even without a Sketch Engine account.

Automatic thesaurus

By definition, a thesaurus (plural thesauri, pronounced [-rai]) is a type of dictionary which lists synonyms or words from the same semantic category, e.g. animals, furniture etc.

Lemmatization & tagging for Greek

We continue improving tools for processing languages. Greek corpora now have lemmatization and part-of-speech tagging available and they are tokenized better.

Better Danish

We have improved tools for processing Danish. Danish corpora are lemmatized, part-of-speech tagged and tokenized better.

Bigger and up-to-date Timestamped JSI web corpora

The latest data up to September 2017 have been added to the Timestamped JSI web corpora. The corpora in all 18 languages are updated with new data monthly.

Discover trending words in English newspapers

Find out how the writing in the main English newspapers has changed over the past two decades. Use diachronic analysis in the SiBol: English Broadsheet Newspapers 1993–2013 corpus.

New academic English corpus

A new corpus of academic English is now available in Sketch Engine. The corpus was collected from the Directory of Open Access Journals (DOAJ), a database of open access journals, and comprises 2.6 billion words.

New corpus from the environment domain

The LexiCon Research Group at the University of Granada developed and provided their highly specialised English EcoLexicon corpus, built from environmental texts. The corpus is hosted as an open corpus and is freely accessible even without a Sketch Engine account.

Extended corpus of English broadsheets

We have doubled the size of the SiBol corpus, a 650-million-word collection of English Broadsheet Newspapers 1993–2013 documenting the language of English journalism.

Brexit Corpus

A new Brexit corpus has been added to Sketch Engine. It is a collection of texts about the UK referendum on the withdrawal of the United Kingdom from the European Union.

An update to Discovering English with Sketch Engine 2nd edition

Discovering English with Sketch Engine by James Thomas has been updated.

New multilingual resource in 24 languages is available

Discovering English with Sketch Engine, 2nd edition

Update of Service Level Agreement