Sketch Engine changelog - Manatee

Sketch Engine is corpus management and analysis software developed by Lexical Computing since 2003. It consists of three main components for building and searching text corpora.

Bonito – a graphical user interface to corpora, see the changelog of Bonito
Manatee – a corpus management tool providing corpus building and indexing, fast querying and basic statistical measures
FinLib – fast indexing library, see the changelog of FinLib

The main changes in Manatee are listed below.

Current stable version: 2.151.6

2.152.1

do not parallelize corpus operations by default

2.152

implement parallel corpus indexing
improve parallel word sketch handling

2.151.5

fix Concordance::delete_subparts()
virtual corpora fixes

2.151.4

update mklcm

2.151.3

ensure that corpus PATH is nonempty
decodevert: structure attribute values escaping
regexopt: fix support for bracket literals
compilecorp: use one processor by default

2.151.2

fix queries containing ‘containing’

2.151.1

cql: support {,N} and {N,} quantifiers
remove skip_dupctx parameter for KWICLines
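
As a rough illustration of what the open-ended `{,N}` and `{N,}` quantifiers mean, the sketch below uses Python's `re` module, which accepts the same two forms. This is only an analogy: CQL applies quantifiers to token positions, whereas the example matches characters, and nothing here calls Manatee. The helper name `matches` is made up for the example.

```python
import re

# {,2}: at most two repetitions (the lower bound defaults to 0)
at_most_two = re.compile(r"a{,2}b")
# {2,}: at least two repetitions (no upper bound)
at_least_two = re.compile(r"a{2,}b")

def matches(pattern, text):
    """Return True when the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

print(matches(at_most_two, "b"))       # zero repetitions are allowed
print(matches(at_most_two, "aaab"))    # three repetitions exceed the bound
print(matches(at_least_two, "ab"))     # one repetition is too few
print(matches(at_least_two, "aaaab"))  # any count from two upwards is accepted
```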

2.151

implement skip_dupctx parameter for KWICLines

2.150.4

remove C++11 features

2.150.3

fix a few memory leaks

2.150.2

quality improvements

2.150.1

do not virtualize sketches when some segments are not complete corpora

2.150

genngr: skip over default and empty attribute values
mksubc: urlencode names of subcorpora

2.149.3

quality improvements

2.149.2

quality improvements

2.149.1

cql: do not generate errors that are not valid utf-8

2.149

corpconf: remove support for escape sequences

2.148

corpconf: restrict support for escape sequences
cql: allow @ in attribute names

2.147

corpconf: only support escapes in double-quoted strings

2.146.6

corpconf: implement escapes in string literals
cql: fix sketch queries
regex optimization: fix the behavior of ‘+’

2.146.5

cql: enable NoSketchEngine support

2.146.4

fix FilteredWMap::poss() skipping duplicate positions

2.146.3

fix for large concordances and WMaps

2.146.2

cql: support large parameters to ws() and thes()

2.146.1

various regex optimization fixes

2.146

support zero-element word sketch files

2.145

cql: report error position
genws: support MULTIVALUE for collocations
fix ENCODING for structure attributes

2.144.1

cql: fix ONEPOS queries

2.144

update regex optimization rules
speed up corpquery -n

2.143.4

2016/12/12

cql: support for multilevel wmap seek

2.142

2016/11/23

cql: parse ‘seek’ in ‘ws|term(level, seek)’ as a number
add NEWS for manatee to shut up autotools
manatee: implemented query evaluation in yacc

2.141

2016/11/03

corpquery accepts subcorpus via -u
added default locale li_NL for Limburgish
FinLib 2.36.2

2.140.2

2016/10/21

extrms simple math N parameter can be float
finlib 2.36.1

2.140.1

2016/10/19

decodevert: print end structures in reverse order

2.140

2016/10/13

encodevert: check minimum bucket size for attribute memory
FinLib 2.36

2.139.3

2016/08/27

wm2thes: accept CORPNAME argument also without -m
compilecorp: use virtws for virtual corpora sketches

2.138.4

2016/08/13

compilecorp: use mklcm-go
biterms: made ca 4x faster

2.138.3

2016/08/11

biterms: use new WMap interface

2.137.3

2016/07/14

added multiword thesaurus computation
reformat wm2thes.cc
implemented virtual sketches, updated interface to WMap
added virtws for compilation of sketches on virtual corpora
added WMap::seppage() to export SEPARATEPAGE number
mkalign: print line number on aligndef file format error

2.137

2016/05/20

mktrends: allow the SUBCORP argument to be empty

compilecorp: ALIGNDEF supports pipes like VERTICAL does
faster mktrends
manatee: mklcm in go
compilecorp: support for WSOLDSCORES

2.136

2016/03/31

encodevert: call mknormattr according to MAPTO directive
added support for normalization attribute
ANTLR CQL grammar supports description definition

2.135.5

2016/02/28

tstquery: added queries on parallel corpora
tstquery: print executed queries
do not label aligned corpus query in WITHIN!/!WITHIN queries

2.135.4

2016/02/21

compilecorp: always move logfile into corpus path directory
compilecorp: improved error reporting to indicate actual line numbers

2.135

2016/01/30

encodevert: better handling of the cache of items added to the lexicon

2.134

2016/01/20

encodevert: dynamic lexicons cache sizes
reformat mkwmrank.cc
added bgr_abs_freq_coll association score
- returns frequency of the first word of the collocation pair

2.133.4

2015/12/12

mktrends: finalize output files properly

2.133.3

2015/12/10

corpcheck: tolerate local path in INFOHREF

2.133.3

2015/12/10

mktrends: finalize output files properly

2.133.2

2015/12/07

fix handling of aligned corpora labels in Concordance

2.133.1

2015/12/03

KWICLines skip aligned corpora collocations

2.133

2015/12/02

CQL: added support to term queries using term() operator
compilecorp: added --no-ske option, which is the default for NoSkE

2.132.1

2015/11/30

tstregexopt: takes attribute as another optional argument

2.132

2015/11/24

speed up RQinNode and RQcontainNode

2.131.3

2015/11/24

mknorms: speed up computation for subcorpora

2.131

2015/11/12

removed findPosAttr() functions
reformat corpinfo.cc

2.130.6

2015/11/12

fix !WITHIN

2.130.5

2015/11/08

compilecorp: call mktrends with EPOCH_LIMIT being 1
fix MAXKWIC being 0 not meaning unlimited MAXKWIC

2.130.3

2015/11/04

mktrends: save subcorp data properly

2.130.2

2015/10/31

added NonEmptyRS for filtering empty RangeStream ranges

2.130

2015/10/25

KWICLines has new method is_defined() and short-circuits processing of undefined lines
added Concordance::filter_aligned() for filtering by aligned corpus

2.129

2015/09/21

mktrends: speed up ca 15x by greater use of numpy

2.128.4

2015/09/10

updated CQL testsuite with current WS results on susanne

2.127

2015/08/04

compilecorp: added support for longest commonest match

2.126

2015/07/28

compilecorp: added support for trends computations
added mktrends script prepared by Ondřej Herman

2.125.2

2015/07/20

mkwmrank: computing scores for each gramrel is independent of other gramrels

2.124

2015/05/02

concordance automatically detects all collocations

2.122

2015/04/19

CQL supports general NOT (!) in sequences as complement operator

Bugfixes:

fix CQL inequality comparisons on dynamic attributes

2.121.2

2015/04/08

disable MULTIVALUE freqdist for positional attributes

2.121

2015/04/03

mkdynattr: no need to manually delete lexicon with new write_lexicon
added new DYNTYPE “freq” for dynamic attributes
compilecorp and parws pass WSMINHITS to mkwmap
mkwmap: added all options to usage
mkwmap: added -f option allowing filtering for minimum frequency
write_lexicon allows overwriting datafiles
compilecorp: hashws terms automatically
compilecorp: write manatee version to log

Bugfixes:

fix empty KWICLines structure context for empty KWIC

2.120.1

2015/03/29

Bugfixes:

genws: fix SEPARATEPAGE index for grammars using DUAL

2.120

2015/03/28

freqs: allow filtering by subcorpus
new freq_dist() attribute modifier “/n” for getting IDs instead of strings

Bugfixes:

fix regexp2ids/regexp2poss for patterns with escaped metacharacters
compilecorp: ‘skipping biterms’ message fixed

2.119

2015/03/23

genngr: allow setting min and max n-gram length from cmdline
genngr: limit maximum n-gram length to 30 by default

2.118

2015/03/22

Bugfixes:

fix build with gcc 4.4 (RHEL/CentOS 6)
fix ConcStream::find_beg()/find_end()

2.117

2015/02/24

create_subcorpus() takes an optional Structure argument

2.116

2015/02/23

dumpalign supports 1:1

2.115.3

2015/02/23

mkwmrank: fix segfault when datafiles cannot be opened
updated package specfiles to contain lsalsize

2.115.2

2015/02/10

updated tstquery gold results after word sketch format change
compilecorp: compute sizes after alignment
added lsalsize binary for listing alignment size of two corpora
mksizes: use lsalsize to compute alignment size

Bugfixes:

fix showing GDEX scores when references are up
Fix GDEX score display in concordance view
manatee: fix installing binaries on DEB
corpquery: fix parallel queries garbled by fake collocates

2.115.1

2015/02/10

manatee: script for bilingual term extraction

2.115

2015/01/21

CorpInfo may be modified and is exported into SWIG API
added dumpalign script for dumping parallel corpora

2.114

2015/01/18

CQL supports regular expressions in word sketch gramrels
added regexp2ids() for word sketch gramrels
added mklex for creating lexicons

2.113

2015/01/14

mkwmrank: added parameter for commonest match input
WSATTR defaults to lempos_lc -> lempos -> lemma_lc -> lemma -> DEFAULTATTR
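
The WSATTR fallback order above can be read as "take the first attribute in the chain that the corpus actually has". A minimal sketch of that reading follows; the function `pick_wsattr` and its arguments are hypothetical and are not part of Manatee.

```python
# Preference order as listed in the changelog entry.
WSATTR_PREFERENCE = ("lempos_lc", "lempos", "lemma_lc", "lemma")

def pick_wsattr(available, default_attr):
    """Return the first preferred attribute present in the corpus (hypothetical helper).

    available    -- set of attribute names defined for the corpus
    default_attr -- value of DEFAULTATTR, used when no preferred attribute exists
    """
    for name in WSATTR_PREFERENCE:
        if name in available:
            return name
    return default_attr

print(pick_wsattr({"word", "lemma", "tag"}, "word"))  # falls through to lemma
print(pick_wsattr({"word", "tag"}, "word"))           # falls back to DEFAULTATTR
```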

2.111.8

2014/11/23

updated tstquery gold results after word sketch format change

Bugfixes:

genws: fix handling invalid STRUCTLIMIT

2.111.6

2014/11/17

mkwmap works with empty input

Bugfixes:

skell: fixed typo in jQuery

2.111.3

2014/10/21

2x faster commonest_match.py

2.110

2014/09/21

added defaults for SIMPLEQUERY corpus directive; it is [A="%s" | B="%s"]
CQL supports different attributes in global conditions
CQL supports !within and !containing operators
genws: STRUCTLIMIT may be arbitrary CQL query
added mkregexattr for compiling regex dynamic attribute
new version of word sketch data files

2.110

2014/08/25

added jQuery UI javascript, css and images
added create_subcorpus() for arbitrary CQL query
create_subcorpus() takes directly RangeStream instead of query
mksubc supports creating subcorpora from CQL query

Bugfixes:

fix parws lexicon verification for new style TRINARY templates

2.109.8

2014/08/13

Bugfixes:

fix build with gcc 4.4

2.109.7

2014/07/28

parws: use single batch for TRINARY and COLLOC gramrels
compilecorp honours TMPDIR environment variable

Bugfixes:

mkvirt: fix freqs computation overflowing at int size
genngr: fix maximum allowed corpus size to 2^31-2
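
For orientation, the limits named in these two fixes relate to signed 32-bit integers. The sketch below only illustrates the arithmetic. It is an assumption that the affected counters were 32-bit `int` values, and the code does not reflect how Manatee is implemented; `wrap_int32` is a made-up helper.

```python
import ctypes

INT32_MAX = 2**31 - 1    # largest value a signed 32-bit integer can hold
GENNGR_MAX = 2**31 - 2   # corpus size limit quoted in the genngr fix

def wrap_int32(value):
    """Truncate a Python integer to a signed 32-bit value (two's-complement wrap-around)."""
    return ctypes.c_int32(value).value

print(INT32_MAX)                  # 2147483647
print(GENNGR_MAX)                 # 2147483646
print(wrap_int32(INT32_MAX + 1))  # one step past the maximum wraps to a negative number
```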

2.109.6

2014/07/01

genws: set COLLOC lexicon hash size to 500k items

printer icon shall be part of NoSkE

Bugfixes:

corpquery: fix marking KWIC in output

2.109.2

2014/06/18

compilecorp does not assume “word” attribute existence
corpquery does not assume “word” attribute

2.109

2014/06/16

MAXKWIC restriction placed into Concordance

Bugfixes:

fixed a bug in selecting gramrels

2.108

2014/06/13

added new dynamic function ascii for transliteration

mkwmap reserves file descriptors for joined set of files
corpcheck checks if file “sizes” exists in PATH
changed support mail

2.107

2014/04/16

compilecorp support for bilingual dictionaries
added MAXKWIC size for KWICLines, defaults to 100

2.106

2014/02/27

added corpcheck utility for checking corpora sanity
added wsdump script for dumping of word sketches

2.103

2014/02/09

added sconll2sketch and sconll2wmap
compilecorp support for sketches from (S)CONLL

2.97

2013/12/28

mkdynattr: fix dynamic structure attributes of virtual corpora
mkstats support for n-grams on subcorpora

2.96

2013/11/10

added dumpthes — simple dumping of thesaurus
CQL support for similarity search in thesaurus

2.95

2013/11/03

added new dynamic function utf8capital
added new dynamic function utf8uppercase

2.94

2013/11/01

added new dynamic function getnbysep
fix mkvirt failing if virtdef contains single corpus

2.92

2013/10/23

encodevert compiles dynamic structure attributes
support for complement subcorpora

2.87

2013/09/29

faster implementation of frq and docf computation
choose first non-dynamic attribute as default DEFAULTATTR
mkvirt accepts attribute list via -a option
added devirt script for corpus devirtualization
added parencodevert script for parallel corpus encoding
redesign of mksubc and (sub)corpora statistics creation
corpus configuration file may not end with a new line
faster computation of ARF + ALDF

2.86

2013/08/14

full support for attributes of structures in virtual corpora
genws reports progress with -p option

2.85

2013/08/07

fix segfault when opening a virtual corpus with unavailable virtdef
mkvirt automatically creates dynamic attributes
virtdef file may contain ‘$’ for segment end being corpus end position
fix corpinfo so that it dumps valid configuration file format
added mksizes script for compiling sizes
compilecorp support for creating word sketch hashes

2.84

2013/06/06

compilecorp accepts --parallel=N option (number of parallel jobs)
compilecorp support for virtual corpora
mksubc writes detailed progress only with --debug
added CQL for range of positions, e.g. #20-50
CQL frequency function accepts values over 2³¹
implemented CQL for word sketch seeks
added CQL support for querying word sketches by triples
CQL supports new positional functions “swap” and “ccoll”

2.83.3

2013/06/05

FIX: fix missing throw statements for create_subcorpus() in SWIG API
FIX: fix evaluating empty concordance collocation

2.83.2

2013/05/26

FIX: fix SEPARATEPAGE name being trimmed on first white space
FIX: Fix mksubc compiling only the 1st subc in subcdef

2.83.1

2013/05/10

FIX: collocation computation for window crossing beg/end of corpus

2.83

2013/05/10

enable multiple subsequent shuffling

2.82

2013/04/20

mksubc support for n-grams, may take .subc file, may take attribute list

2.81

2013/04/12

added url2domain dynamic attribute

2.80.1

2013/04/03

FIX: utf8_tolower failing for empty strings and unallocated buffer

2.80

2013/04/02

faster sample generation
ngrsave supports encoded corpus as input

2.79

2013/03/21

added utf8getlastn() dynamic attribute function
FIX: SEPARATEPAGE with DUAL TRINARY

2.78

2013/03/07

Concordance exports corpus object into SWIG API

2.77

2013/03/06

lscbr and ngrsave are more user friendly

2.76.1

2013/02/27

FIX: building with gcc >= 4.7

2.76

2013/02/26

added support for structures in virtual corpora

2.75

2013/02/24

Frequency distribution does not need Concordance to be computed

2.74

2013/02/18

support DUAL TRINARY word sketch grammatical relations
added getfirstbysep internal function for dynamic attributes
added Setswana locale settings
added dumpwmrev for dumping ws delta rev files

2.73

2013/02/04

requires finlib 2.21

implemented exact KWIC matching in filtering

2.72

2013/01/29

support for aligned segment contexts

2.71

2013/01/11

genhist enhancements

2.70

2013/01/08

compilecorp compiles subcorpora right after the main corpus

2.69

2012/12/10

export Corpus::get_confpath() into SWIG API

2.68

2012/11/29

parallel corpora API modifications
FIX: a number of fixes for processing parallel corpora

2.67.2

2012/11/26

FIX: a number of fixes for processing parallel corpora

2.67.1

2012/11/26

FIX: set default ALIGNSTRUCT to “align”

2.67

2012/11/17

compilecorp compiles alignment for parallel corpora
added a number of helper scripts for processing alignment
FIX: a number of fixes for processing parallel corpora

2.66

2012/11/15

updated licensing information
FIX: a number of fixes for processing parallel corpora

2.65

2012/11/09

enhanced support for processing of parallel corpora
FIX: sync() concordances if necessary before next operations

2.64

2012/11/08

NGram API changes
FIX: genngr failing to process corpora over 2G

2.63

2012/08/31

FIX: estimating word sketch multiword collocations positions

2.62.1

2012/08/30

FIX: allow LEXICONSIZE to increase memory usage

2.62

2012/08/27

encodevert accepts -d to prevent compiling dynamic attributes
FIX: filling default value for attributes of TYPE “UNIQUE”
FIX: mkdynattr takes LEXICONSIZE from corpus configuration

2.61

2012/08/17

support for asynchronous multi-threaded concordance computations
FIX: setting default attribute when querying parallel corpora

2.60.1

2012/07/18

FIX: fix race conditions in parallel computation of sketches with *TRINARY gramrels involved
parws can check gramrel lexicon consistency

2.60

2012/07/10

support labels in the second argument (right-hand side) of within/containing, e.g. ( containing 1:[] 2:[]) & 1.tag=2.tag
FIX: build with ruby 1.9

2.59.1

2012/10/24

bugfix release for the stable branch
FIX: build with ruby 1.9
parws can check gramrel lexicon consistency
FIX: fix race conditions in parallel computation of sketches with *TRINARY gramrels involved
FIX: fix filling default value in unique attribute
parws supports Python >= 2.4
documentation included in the distribution tarball
FIX: CQL: fix default attr setting for parallel corpus
FIX: fix static build with finlib
FIX: fix overflow on appending to a .text file larger than 4 GB (2^32 bytes)
FIX: finlib: fix build with gcc down to 4.1.2 at least

2.59

2012/06/29

new internal function for dynamic attributes “getlastn” for extracting last n characters
WMap support for access to the dictionary created by *COLLOC directives

2.58

2012/06/25

compatibility with ANTLR 3.4 C runtime
hashws support for subcorpora
more verbose output of encodevert by default
FIX: closing structures at the end of compilation

2.57

2012/06/08

WMAP support for collocation index operations incl. COLLOC directives

2.56

2012/06/06

added fixcorp script for fixing corrupted indices
support for extracting terms lexicon of word sketches

2.55

2012/05/29

support filtering multiword sketches by gramrels

2.54.1

2012/04/30

FIX: minor fixes for nested structures

2.54

2012/04/20

faster evaluation of non-regex matching using == and !== operators
FIX: utf8 lowercasing may have failed under specific circumstances
FIX: dynamic attributes are cleared before recompilation
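
The `==` and `!==` operators mentioned in this release compare against a fixed string rather than a regular expression. The difference can be illustrated in plain Python. This is a sketch by analogy only: the two helper functions are hypothetical, and treating a CQL regular-expression match as a whole-value match is an assumption of the example.

```python
import re

def regex_match(pattern, value):
    """Whole-value regular expression match (assumed analogue of CQL '=')."""
    return re.fullmatch(pattern, value) is not None

def literal_match(pattern, value):
    """Fixed-string comparison (assumed analogue of CQL '==')."""
    return pattern == value

print(regex_match(".", "a"))    # as a regex, a dot matches any single character
print(literal_match(".", "a"))  # as a fixed string, a dot matches only a dot
print(literal_match(".", "."))
```

A fixed-string comparison needs no pattern compilation, which is consistent with the speed-up reported for these operators.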

2.53

2012/04/16

enhanced frequency distribution of nested structures

2.52

2012/04/05

maximum allowed nested structures set to 100

2.51

2012/03/14

requires finlib >= 2.17

support for handling of unique attributes

2.50

2012/03/05

requires finlib >= 2.16

first support for multiword sketches

2.49

2012/02/29

FIX: fix mishandling default encoding value in wmap API
support extracting terms from word sketches in API

2.48

2012/02/22

requires finlib >= 2.15

support for attribute values occurring more than 4G (2^32) times
support for extracting terms from word sketches

2.47.1

2012/02/18

FIX: fix encodevert segfaulting when run with -x

2.47

2012/02/08

requires finlib >= 2.14

support for lexicon size up to 4G (2^32 bytes)
FIX: concordance first-letter pagination in case of multibyte characters
FIX: mksubc does not fail on invalid attributes and empty subcorpora

2.46.1

2012/02/01

FIX: case-insensitive frequency distribution of utf8 corpora
FIX: do yet more tolerant Unicode conversion failure handling

2.46

2012/01/25

added indices of lexicon by sorted frequency
FIX: encodevert handles absent structure attributes properly
FIX: subcorpora contained first document range duplicated under specific circumstances

2.45.2

2011/12/08

FIX: parallelization of sketches with m4 definitions or dual gramrels
FIX: mkwmap correctly handles empty streams when joining, does not write zero counts

2.45.1

2011/10/20

FIX: do more tolerant Unicode conversion failure handling

2.45

2011/10/07

requires finlib >= 2.13

more descriptive CQL error messages
support for Unicode input/output using manatee.setEncoding()
automatic memory handling of Python objects
encodevert, genws and mkwmap logs timestamp with each message
prevent writing structures overflowing 32bit integer
32to64.py correctly handles multiple overflows and overflows between begin and end
parallel computation of word sketches

2.44.1

2011/09/17

FIX mkwmap: fixed join phase if partial join is bigger than 4GB

2.44

2011/09/13

MAXDETAIL defaults to MAXCONTEXT if not set in the configuration file

2.43

2011/09/09

MAXCONTEXT set to 100 by default

2.42.1

2011/09/07

FIX: CQL evaluation in case concatenation subquery is empty

2.42

2011/08/31

mksubc prints progress on standard output
mksubc does not fail if DOCSTRUCTURE does not exist

2.41

2011/08/05

compilecorp automatically runs mknorms to perform proper normalization per structure attribute
mknorms support corpora over 2G

2.40.2

2011/08/04

requires finlib >= 2.12.4

fix ordering of nested structures in concordance

2.40.1

2011/07/29

FIX: extending concordance KWIC fixed for |kwic|>1 or KWIC interleaved with colloc

2.40

2011/07/28

intelligent autodetection of attribute locale

2.39

2011/06/28

support for excluding KWIC from collocations
FIX: CQL evaluation: [attr="non-existing"]? [attr="existing"] returned an empty result instead of the “existing” occurrences
FIX: mksubc command failed to compute document frequencies on new subcorpus

2.38.2

2011/06/10

FIX: encodevert support for memory-only corpora over 2GB

2.38.1

2011/06/02

FIX: frequency distribution failing if case-insensitive/retrograde

2.38

2011/05/12

CQL allows ‘’ and ‘’ for matching N-th struct
corpquery can sort results using GDEX and set default attribute
improved display of concordance reference
support for storing corpora over 2GB in memory only
FIX: UTF-8 character counting and lower-casing

2.37.1

2011/05/05

FIX: count collocations only once per context

2.37

2011/04/30

maximum nesting of structures limited to 10 by default

2.36.1

2011/04/21

FIX: fix encodevert warning on nested structures printing corpus position instead of file line

2.36

2011/04/06

added parse2wmap for creating sketches from dependency input
fixed dirty cache after rebuilding sketches
fixed multiple memory leaks in SWIG API
fixed mkvirt failing if corpus directory is missing
changed default MANATEE_REGISTRY to /corpora/registry
mksubc needs much less memory

2.35

2011/03/15

fix locating of nested structures
support attribute-based pagination of concordances
prevent collisions of wmap and manatee in SWIG API
faster docf computation implemented in C++
support for virtual corpora

2.34.1

2011/03/13

faster docf computation (ca. 20 x)
show Manatee exception messages in Python

2.34

2011/03/05

requires finlib >= 2.12

compilecorp support for creating subcorpora
encodevert automatically closes too many nested structures
mksubc computes frequency in documents into .docf files
changed format of word sketch .rev file — added support for collocations
export exceptions into SWIG API
regexp2ids takes an optional filter pattern argument

2.33.2

2011/02/28

FIX: compilecorp computes sizes for corpora without structures
FIX: encodevert creates data dir with mode 755 instead of 751

2.33.1

2011/01/20

FIX: ngrsave: added NGRAM_SIZE and IGNORE_PUNC parameters

2.33

2011/01/11

compilecorp precomputes file with token, word, doc, paragraph and sentence counts

2.32.2

2010/11/24

FIX: encodevert looping on input containing NULL byte

2.32.1

2010/10/31

- FIX: “STRUCTLIMIT s” generates instead of deprecated

2.32

2010/10/27

requires finlib >= 2.11

New Features:

enhanced corpquery script which makes it possible to specify (via command-line options) the reference attribute, context, limit on the number of results, and the structures and attributes to be printed
new parse2wmap tool for generating sketches (data for wmap) from a positional attribute
ngrsave can now print document IDs of duplicate n-grams instead of n-grams and number of documents
after the compilation, compilecorp checks for temporary files that indicate an error
enhancements to the CQL:
- new “==” and “!==” operators that perform a match against a fixed string (i.e. not a regular expression)
  Note that, with the two exceptions of "\"" and "\\", no expansions are performed on the string.
  Examples:
  ".", "$", "~" match a single dot, dollar sign and tilde, respectively,
  "\n" matches a backslash followed by the character n,
  "\"\\" matches a double-quote character followed by a single backslash
- meet/union queries can occur at any position in the query and are no longer introduced by the “MU” keyword, which is deprecated and raises an error
- the old within syntax has already been deprecated (in favor of the consistent within syntax) and now raises an error as well
- support for inequality matching using new operators: “<=”, “!<=”, “>=”, “!>=”. The comparison on a string is performed in a way that compares numeric parts numerically and alphabetical parts alphabetically. Examples:
  [word>="cake"] matches “cake” as well as “came”,
  matches e.g. 145UA01, 143UA01, 145TA00 etc.
- meet/union queries can use numeric labels and be subject to global conditions as any other query parts — e.g. (meet 1:[] 2:[]) & 1.tag = 2.tag;
- a frequency function (denoted simply as f) can be used as part of the query together with numeric labels — e.g. 1:[] & f(1.word) >= 1000;
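
The comparison rule described for the inequality operators (numeric parts compared numerically, alphabetical parts alphabetically) can be sketched in Python as follows. This is not Manatee's implementation. The function names are hypothetical, and the choice to order a numeric chunk before a text chunk when the two meet is an assumption made only so that the example is well defined.

```python
import re

def natural_key(value):
    """Split a string into digit runs and other runs; digit runs become integers."""
    key = []
    for digits, text in re.findall(r"(\d+)|(\D+)", value, flags=re.ASCII):
        # Tag each chunk so that numbers and text are never compared with each other directly.
        key.append((0, int(digits), "") if digits else (1, 0, text))
    return key

def greater_or_equal(value, bound):
    """Hypothetical stand-in for a CQL test such as [word>="cake"]."""
    return natural_key(value) >= natural_key(bound)

print(greater_or_equal("came", "cake"))     # the example given in the changelog
print(greater_or_equal("item10", "item9"))  # numeric parts compare as numbers
print("item10" >= "item9")                  # plain string comparison disagrees
```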

Bugfixes:

encodevert -v works again
encodevert can again read piped input data (“| ” in VERTICAL in corpus configuration file)
CQL queries using parallel corpora notation work again
UTF-8 support in regular expressions
encodevert doesn’t crash if no attributes are given in the configuration file or on the command line

2.31.3

2010/10/27

FIX: Computing frequency distribution of multivalue attributes
FIX: encodevert warns if there are open structures at the end of the compilation; this always indicates an error and, in the case of nested structures, leads to significant performance loss.

2.31.2

2010/08/04

FIX: compilecorp fails because of genhist.py which should be genhist
FIX: strip spaces in all attribute values
FIX: make dist* targets work again

2.31.1

2010/04/26

FIX: crash when MANATEE_REGISTRY="" or config path is a directory

2.31

2010/04/23

requires finlib >= 2.10

New Features:

support for nested structures

Bugfixes:

fixed displaying of empty collocations

2.30

2010/04/15

New Features:

“===NONE===” used as attribute default DEFAULTVALUE

Bugfixes:

fixed displaying concordance with empty nodes

2.29.1

2010/04/10

FIX: typo in CQL parser causing the build to fail with C locale

2.29

2010/04/07

New Features:

compilecorp script for complex handling of corpus and sketch compilation

Bugfixes:

unfinished corpus data reports size 0, does not crash

2.28.1

2010/03/11

FIX: encodevert limits its memory usage to available physical memory

2.28

2010/01/19

requires ANTLR 3.2 or higher

New Features:

allow ${attribute} substitution in DISPLAYBEGIN/DISPLAYEND
CQL enhancements:
- support for “ within ”
- “containing” as dual option to “within”
- enable meet/union query after within/containing
- support for “within NUMBER”

Bugfixes:

fixed mkwmrank on empty wmaps

2.27

2010/01/11

New Features:

gcc 4.3 and 4.4 compatibility
ANTLR 2.7.2 compatibility
Python API scripts now part of the distribution

[…]

2.14

corpus size more than 2 billion tokens

1.99

bug fixes in query evaluation, build

1.94

first public version
