Sketch Engine is a corpus manager and analysis software developed by Lexical Computing since 2003. It consists of three main components for searching and building text corpora.

  • Bonito – a graphical user interface to corpora (see the changelog of Bonito)
  • Manatee – a corpus management tool that handles corpus building and indexing, fast querying, and basic statistical measures (see the changelog of Manatee)
  • FinLib – a fast indexing library

A brief overview of the main changes in FinLib is listed below.

Current stable version: 2.36.5

2.36.5

  • fix regex queries which contain multibyte character prefixes

2.36.4

  • fix issue with large memory-mapped files

2.36.3

  • fix invalid memory access in int_ranges::num_at_pos()

2.36.2

  • fix invalid memory access in part_range::find_end()

2.36.1

  • fix rare issue with large reverse indices for n-grams

2.36

  • added dumpbits for dumping delta and gamma encoded files
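
For readers unfamiliar with the encodings that dumpbits inspects, here is a minimal sketch. It assumes that "gamma" and "delta" refer to the standard Elias gamma and Elias delta codes, which this changelog does not state explicitly. The function names are hypothetical. Bits are represented as strings of '0' and '1' for readability, so this says nothing about the actual on-disk bit order or layout used by FinLib.

```python
def gamma_encode(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma coding is defined for n >= 1")
    binary = bin(n)[2:]  # binary digits of n, leading 1 included
    # (len - 1) zeros announce how many bits follow the leading 1
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits, pos=0):
    """Decode one gamma code starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def delta_encode(n):
    """Elias delta code: gamma-coded bit length, then n without its leading 1."""
    if n < 1:
        raise ValueError("delta coding is defined for n >= 1")
    binary = bin(n)[2:]
    return gamma_encode(len(binary)) + binary[1:]


def delta_decode(bits, pos=0):
    """Decode one delta code starting at pos; return (value, next_pos)."""
    length, pos = gamma_decode(bits, pos)
    end = pos + length - 1
    # restore the implicit leading 1 dropped by the encoder
    return int("1" + bits[pos:end], 2), end
```

Both codes are prefix-free, so a sequence of small integers (for example, gaps between sorted positions) can be concatenated and decoded one value at a time by passing the returned position back in.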

2.35.2

  • GCC 6 compatibility

2.35.1

  • update API version

2.35

2016/04/28

  • added write_lexicon::pop_added_load()
  • write_lexicon::{pop_cache_miss_ratio,avg_str_size} moved to .cc

2.34

2016/01/23

  • write_lexicon exports cache sizes, avg item size and cache miss ratio
  • TextConsumer exports its type via get_type()

2.33

2015/08/31

  • added fix lexovf for computing .lex.ovf file for a lexicon
  • support for lexicon file size over 4 GB (2^32 bytes)

2.32

2015/05/01

  • optimize “simple OR” CQL queries
  • added ArrayGenerator class
  • added factory method QOrVNode::create()

2.31

2015/04/01

  • write_lexicon allows overwriting datafiles

2.30.4

2015/03/02

  • regexp metachars checking handles escaping by backslash

2.30

2015/01/18

  • added mklex for creating lexicons

2.29

2014/09/21

  • added Fast2Gen: FastStream to Generator adapter

2.28.3

2014/06/24

  • finlib reserves file descriptors for joined set of revs

2.28.2

2014/03/18

  • various CQL evaluation fixes

2.27

2014/01/12

  • faster regexp evaluation for patterns matching large portions of lexicon

2.26

2013/12/27

  • faster evaluation of (.*)+ queries

2.25

2013/09/29

  • mkdtext support for storing structure attributes text
  • faster delta stream reading by using assembler builtins

2.24

2013/06/06

  • FIX: label propagation in some CQL queries

2.23

2013/05/26

  • API/ABI changes: rebuilding Manatee is required
  • faster reading of a number of index files (by ca. 5 %)
  • FIX: critical bugfixes in reading a number of index files

2.22.2

2013/04/02

  • API/ABI changes: rebuilding Manatee is required
  • FIX: critical bugfixes in reading a number of index files

2.22.1

2013/03/07

  • API/ABI changes: rebuilding Manatee is required
  • FIX: critical bugfixes in reading a number of index files

2.22

2013/02/24

  • API/ABI changes: rebuilding Manatee is required

2.21.1

2013/02/18

  • API/ABI changes: rebuilding Manatee is required
  • FIX: critical bugfixes in reading a number of index files

2.21

2013/01/29

  • FIX: query evaluation
  • backward incompatible API/ABI changes

2.20.3

2013/01/18

  • API/ABI changes: rebuilding Manatee is required
  • FIX: critical bugfixes in reading a number of index files

2.20.2

2013/01/08

  • faster joining of reverse indices (by ca. 50 %)
  • API/ABI changes: rebuilding Manatee is required
  • FIX: critical bugfixes in reading a number of index files

2.20.1

2012/11/29

  • FIX: critical bugfixes in reading a number of index files

2.20

2012/09/04

  • allow backreferences in CQL regular expressions (if compiled without PCRE, backreference numbering starts at 2, because the pattern is wrapped in (…)$)
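
The numbering shift in the 2.20 note can be illustrated with a small sketch. It uses Python's `re` module purely as a stand-in, not the regex engine FinLib is built with, and the helper names are hypothetical. The assumption is that the user pattern is wrapped in one extra pair of parentheses followed by `$`, as the note describes, so the first group written by the user becomes group 2.

```python
import re


def wrap_like_changelog(pattern):
    """Wrap a user pattern in an outer group plus '$' (assumed wrapping)."""
    return "(" + pattern + ")$"


def matches_whole(pattern, value):
    """Match value against the wrapped pattern, anchored at the start."""
    return re.match(wrap_like_changelog(pattern), value) is not None


# The user's first group is group 2 after wrapping, so the
# backreference to it must be written as \2 rather than \1.
doubled = r"(a|b)\2"
print(matches_whole(doubled, "aa"))  # the same letter repeated
print(matches_whole(doubled, "ab"))  # two different letters
```

Under this wrapping, `\1` would point at the outer group that is still being matched, which is why the user-visible numbering begins at 2.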
