CQL for geeks | Sketch Engine

WARNING!

This CQL functionality is primarily meant for development and testing.
Use at your own risk!

! – complement operator

since manatee 2.122 An exclamation mark ( ! ) before a position (square brackets) is a logical NOT on a corpus range, i.e. a complement operator that yields the corpus range complementary to its argument.

The following two examples will return:

the whole corpus except for nouns, which will be gapped
the corpus parts which are not inside the sentence structure. Since all corpus text should usually be within sentence structures, this might be useful for identifying incorrect data in the corpus.

![tag="N.*"]
!< s/>

within! X and containing! X are complements with semantics “within the complement of X” and “containing the complement of X”

Word sketch seeks

since manatee 2.84 Knowing a particular seek offset in the word sketch data files, the related concordance can be retrieved using:

[ws(level,seek)]

The level can be 0, 1 or 2 for the level of headwords, grammatical relations or collocations, respectively. The seek depends on the particular corpus compilation, hence this kind of query is mainly suitable for technical manipulation and combination of word sketch concordances.

Word Sketches: swap & ccoll

since manatee 2.84 If word sketches are available in the corpus, the following operators can be used in CQL.

swap

Use swap to swap the KWIC with the selected collocations. The syntax is:

swap (<COLLNUM>, <ONEPOSITION>) 
[swap (1, ws ("car", "modifier", "new"))]

ccoll

Use ccoll to re-label the given collocation. The syntax is:

ccoll (<OLDCOLLNUM>, <NEWCOLLNUM>, <ONEPOSITION>)

This relabels the first collocation as second.

[ccoll (1, 2, ws ("car", "modifier", "new"))]

This relabels 1 to 3 and back, i.e. a NOOP.

[ccoll (3, 1, ccoll (1, 3, ws(2, 6543)))]
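
The relabelling described above can be pictured with a toy model in Python. This is only a sketch under assumptions: a concordance line is represented as a dictionary mapping a collocation number to a token position, and the Python function ccoll merely mirrors the name of the operator. None of this is Manatee's actual implementation or API.

```python
def ccoll(old, new, labels):
    """Toy model: return a copy of `labels` with collocation `old` renumbered to `new`."""
    relabeled = dict(labels)
    if old in relabeled:
        # Move the entry from the old collocation number to the new one.
        relabeled[new] = relabeled.pop(old)
    return relabeled


# Hypothetical concordance line: collocation 1 sits at token position 42.
line = {1: 42}

first_as_second = ccoll(1, 2, line)           # mirrors ccoll (1, 2, ...)
round_trip = ccoll(3, 1, ccoll(1, 3, line))   # mirrors ccoll (3, 1, ccoll (1, 3, ...))

print(first_as_second)
print(round_trip == line)
```

In this model the nested call returns the original labelling, which is what the NOOP remark refers to.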

Searching for position numbers

since manatee 2.84 Use [#POSITION] to find a concrete token in the corpus.

[#100]	finds the 100th token, the concordance will consist of 1 line
[#100 | #210]	will display the 100th and 210th tokens (the concordance will contain 2 lines)
[#100-210]	the concordance will contain 111 lines
[!#100-210]	will display all tokens except the 111 tokens on positions 100-210; the concordance will consist of as many lines as there are tokens in the corpus minus 111
![#100-210]	is a complement of the tokens on positions 100-210; the concordance will consist of 2 lines: the 1st line contains the tokens on positions 1-99, the 2nd line the tokens on positions 211 till the end of the corpus
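
The difference between the last two rows can be illustrated with a small Python sketch. It is a model only, not Manatee code: the corpus is assumed to be a run of positions from first to last (1 to 1000 here, an arbitrary choice for illustration), and both function names are hypothetical.

```python
def negated_positions(ranges, first, last):
    """Model of [!#A-B]: every single token outside the ranges, one line per token."""
    excluded = set()
    for start, end in ranges:
        excluded.update(range(start, end + 1))
    return [p for p in range(first, last + 1) if p not in excluded]


def complement_ranges(ranges, first, last):
    """Model of ![#A-B]: the maximal gaps between the (inclusive) ranges, one line per gap."""
    gaps = []
    cursor = first
    for start, end in sorted(ranges):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps


# Assumed toy corpus of 1000 tokens on positions 1..1000.
per_token = negated_positions([(100, 210)], 1, 1000)
per_range = complement_ranges([(100, 210)], 1, 1000)

print(len(per_token))  # corpus size minus 111
print(per_range)       # two ranges, before and after positions 100-210
```

The same complement_ranges model also covers ![tag="N.*"]: if the nouns occupy the ranges passed in, the result is the rest of the corpus with the nouns left out as gaps.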

the n-th structure

since manatee 2.38 The following examples will refer to:

the 5th document in the corpus
each document in the corpus but not the 5th document (excludes the 5th document)
a range of documents, in this case the 5th, 6th, 7th, 8th, 9th and 10th document in the corpus

<doc #5>
<doc !#5>
<doc #5-10>
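
When such queries are generated by a script, a small string helper is enough. The following Python sketch only assembles the three query strings shown above. The function name and its parameters are hypothetical and not part of any Sketch Engine API.

```python
def structure_query(name, first, last=None, negate=False):
    """Build a CQL selector for the n-th structure, e.g. <doc #5> or <doc #5-10>."""
    selector = f"#{first}" if last is None else f"#{first}-{last}"
    if negate:
        # The exclamation mark excludes the selected structure(s).
        selector = "!" + selector
    return f"<{name} {selector}>"


print(structure_query("doc", 5))
print(structure_query("doc", 5, negate=True))
print(structure_query("doc", 5, 10))
```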

+ * with tokens

WARNING!

Only use with small corpora. The computation can be very time-consuming.

The regular expression operators + and * can be applied to tokens, but this is not recommended: the computation can be extremely time-consuming, especially with large corpora.

Instead, use curly brackets { } for repetition and [ ] { } for distance between tokens.

avoid	recommended
`[tag="N.*"]*`	`[tag="N.*"]{0,10}`
`[tag="N.*"]+`	`[tag="N.*"]{1,10}`

Limit

When [ ]+ or [ ]* is used, a limit of maximum 100 repetitions is applied and the query behaves as [ ]{1,100} and [ ]{0,100} respectively. The limit is applied irrespective of the criteria inside the square brackets.
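
The rewriting from + and * to an explicit range can be sketched in Python as follows. This is an illustration of the behaviour described above, not the parser used by Manatee. The function name is hypothetical, and the scanner makes the simplifying assumption that attribute values are double-quoted strings with backslash escapes.

```python
def bound_repetitions(query, limit=100):
    """Replace + or * directly after a token position with {1,limit} or {0,limit}."""
    out = []
    in_string = False        # inside a double-quoted attribute value
    after_position = False   # the previous character closed a position with ]
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string and ch == "\\" and i + 1 < len(query):
            # Copy an escaped character inside a string unchanged.
            out.append(query[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
            out.append(ch)
            after_position = False
        elif not in_string and after_position and ch in "+*":
            minimum = 1 if ch == "+" else 0
            out.append("{%d,%d}" % (minimum, limit))
            after_position = False
        else:
            out.append(ch)
            after_position = (not in_string) and ch == "]"
        i += 1
    return "".join(out)


print(bound_repetitions('[tag="N.*"]+'))
print(bound_repetitions('[tag="N.*"]*', limit=10))
```

Operators inside quoted values, such as the * in "N.*", are left alone because they belong to the attribute's regular expression rather than to the token sequence.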

Queries exploiting terms

since manatee 2.133 General notation: [term(regular_expression)] where regular_expression is a pattern that is matched against terms as they are indexed in the database. For example, [term("award title")] finds occurrences of the term “award title”.

In some (mostly old) corpora, individual words within the term are connected with ‘_’ and the term has a suffix ‘-x’. This means the example above would look like [term("award_title-x")].
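
The two term formats can be produced from the same input with a short Python sketch. The helper names are hypothetical. The conversion simply follows the description above (spaces replaced by ‘_’, suffix ‘-x’ appended), and whether a given corpus needs the old format has to be checked against that corpus.

```python
def old_style_term(term):
    """Convert a term to the older index format: words joined by _ plus the -x suffix."""
    return term.replace(" ", "_") + "-x"


def term_query(term):
    """Wrap a term (or a pattern matching terms) in the CQL term() notation."""
    return f'[term("{term}")]'


print(term_query("award title"))
print(term_query(old_style_term("award title")))
```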

within/containing NUMBER

since manatee 2.28 General notation: within/containing NUMBER
Both of the within/containing queries support a shortcut of within/containing NUMBER which expands to within/containing []{NUMBER}.

The following example searches for strings of 4-10 words.

[tag="J.*"]{1,2}[tag="N.*"]{1,5}[tag="V.*"][tag="R.*"]{1,2}

Adding within 5 at the end of the query restricts the results to strings of 4-5 words:

[tag="J.*"]{1,2}[tag="N.*"]{1,5}[tag="V.*"][tag="R.*"]{1,2} within 5
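
The arithmetic behind the 4-10 and 4-5 figures, and the expansion of the shortcut, can be checked with a short Python sketch. Both function names are hypothetical, the repetition bounds are entered by hand rather than parsed from the query, and the regular expression assumes the shortcut does not occur inside a quoted attribute value.

```python
import re


def expand_number_shortcut(query):
    """Rewrite within/containing NUMBER as within/containing []{NUMBER}."""
    return re.sub(r"\b(within|containing)\s+(\d+)\b", r"\1 []{\2}", query)


def length_bounds(repetitions, within=None):
    """Return (shortest, longest) match length in tokens, or None if nothing can match."""
    shortest = sum(low for low, _ in repetitions)
    longest = sum(high for _, high in repetitions)
    if within is not None:
        if shortest > within:
            return None
        longest = min(longest, within)
    return shortest, longest


# Bounds of J{1,2} N{1,5} V R{1,2}; a plain position counts as {1,1}.
bounds = [(1, 2), (1, 5), (1, 1), (1, 2)]

print(length_bounds(bounds))
print(length_bounds(bounds, within=5))
print(expand_number_shortcut('[tag="V.*"][tag="R.*"]{1,2} within 5'))
```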