Author Archive for: michal
About Michal Cukr
This author has yet to write their bio. Meanwhile, let's just say that we are proud Michal Cukr contributed a whopping 45 entries.
Entries by Michal Cukr
Are you ready for April Fools’ Day this year? How about some April Fools’ Day CQL? Download the April page from our calendar with an example of a punctuation search. The example really works, no joke. Sketch Engine is a serious tool after all.
After improving our tools for processing Danish, we are releasing a new version of the Danish Corpus from the web. The texts in this 2-billion-word Danish corpus were downloaded in December 2017.
We are pleased to inform you that the list of Sketch Engine corpora has been extended with a new Belarusian corpus, a 63-million-word corpus of texts collected from the web.
As we did last year, we are making the Sketch Engine calendar with useful CQL examples available online. Please download the page for March with handy examples of using an optional character and repetitions.
Find more and better collocations in French. We have improved our collocation search (the word sketch feature), which now automatically identifies collocations and patterns specific to French.
A new 25-million-word Amharic corpus has been added to Sketch Engine.
Term extraction or terminology extraction is an automatic method of analysing text in order to identify phrases which fulfil the criteria for terms. Terminology extraction has its use in translation and terminology management but also in text analytics where it is used for topic modelling, data mining and information retrieval from unstructured text.
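To make the idea concrete, here is a minimal sketch of one simple way to extract term candidates: collect word n-grams from a focus text and rank them by how much more frequent they are there than in a general reference text. The function names, the smoothing value and the relative-frequency ratio are assumptions chosen for illustration; they are not a description of how Sketch Engine itself scores terms.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n_max=2):
    """Yield every word n-gram of length 1..n_max as a space-joined string."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_terms(focus_text, reference_text, n_max=2, smoothing=1.0, top=5):
    """Rank n-grams by smoothed relative frequency in focus vs. reference text.

    Hypothetical scoring for illustration: frequencies are normalised per
    thousand tokens, and `smoothing` avoids division by zero for candidates
    that never occur in the reference text.
    """
    focus_tokens = tokenize(focus_text)
    ref_tokens = tokenize(reference_text)
    focus_counts = Counter(ngrams(focus_tokens, n_max))
    ref_counts = Counter(ngrams(ref_tokens, n_max))
    focus_size = max(len(focus_tokens), 1)
    ref_size = max(len(ref_tokens), 1)

    scores = {}
    for candidate, count in focus_counts.items():
        focus_rel = count / focus_size * 1000
        ref_rel = ref_counts[candidate] / ref_size * 1000
        scores[candidate] = (focus_rel + smoothing) / (ref_rel + smoothing)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top]


if __name__ == "__main__":
    focus = "the neural network trains the neural network on the corpus"
    reference = "the cat sat on the mat and the dog sat on the rug"
    print(extract_terms(focus, reference, top=4))
```

A real term extractor would typically also restrict candidates to grammatical patterns such as noun phrases, which requires a part-of-speech tagged corpus; this sketch skips that step and relies on frequency contrast alone.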
Sketch Engine can now find more and better collocations in Italian. The collocation search (the word sketch feature) automatically identifies collocations and patterns specific to Italian.
The ACL Anthology Reference Corpus, made up of papers from the Digital Archive of Research Papers in Computational Linguistics, is now almost twice as large. The corpus is freely accessible even without a Sketch Engine account.
By definition, a thesaurus (plural thesauri, pronounced [-rai]) is a type of dictionary which lists synonyms or words from the same semantic category, e.g. animals or furniture.
We continue improving tools for processing languages. Greek corpora now have lemmatization and part-of-speech tagging available and they are tokenized better.
We have improved tools for processing Danish. Danish corpora are lemmatized, part-of-speech tagged and tokenized better.
Data up to September 2017 have been added to the Timestamped JSI web corpora. The corpora in all 18 languages are updated with new data monthly.
Find out how the writing in the main English newspapers has changed over the past two decades. Use diachronic analysis in the SiBol: English Broadsheet Newspapers 1993–2013 corpus.