CQL – basics

Useful tip

A CQL query in which many tokens use the same attribute can be shortened.

[lemma="refill"][lemma="the"][lemma="teapot"]

Select lemma as the default attribute in the dropdown and type:

"refill" "the" "teapot"

Similarly, this CQL:

[word="a"] [tag="J.*"] [tag="N.*"]

can be shortened like this when the default attribute is set to tag:

[word="a"] "J.*" "N.*"

Positions in quotes are treated as if they had the default attribute. Positions with an explicit attribute keep that attribute.
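
The rule above is a plain text rewrite, sketched here in Python for illustration. `expand_default_attribute` is a hypothetical helper, not part of Sketch Engine, and its simple pattern assumes that values contain no escaped quotes and no closing square brackets.

```python
import re

# Hypothetical tokenizer: a position is either a bracketed condition [...] or a
# quoted shorthand value "...". Escaped quotes and "]" inside values are not handled.
POSITION = re.compile(r'\[[^\]]*\]|"[^"]*"')


def expand_default_attribute(query: str, default_attr: str) -> str:
    """Give quoted positions the default attribute; keep bracketed positions unchanged."""
    positions = []
    for pos in POSITION.findall(query):
        if pos.startswith('"'):
            positions.append(f"[{default_attr}={pos}]")  # "teapot" -> [lemma="teapot"]
        else:
            positions.append(pos)  # an explicit attribute is kept as written
    return " ".join(positions)


print(expand_default_attribute('"refill" "the" "teapot"', "lemma"))
# [lemma="refill"] [lemma="the"] [lemma="teapot"]
print(expand_default_attribute('[word="a"] "J.*" "N.*"', "tag"))
# [word="a"] [tag="J.*"] [tag="N.*"]
```

The spaces the helper puts between positions are harmless, because spaces between tokens have no function in CQL.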

CQL basics

To use CQL, go to the corpus search and select the CQL option. CQL will not work anywhere else in the interface. Expert users also use CQL to write Word Sketch grammars and term grammars.

Syntax

With CQL, complex criteria can be set to find one or many tokens. The criteria for each token must be enclosed in a pair of square brackets [ ]. The format is:

[attribute="value"]

To find the lemma teapot, use

[lemma="teapot"]

Each token must be inside its own pair of square brackets. To search for the phrase refill the teapot, use:

[lemma="refill"][lemma="the"][lemma="teapot"]
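
Because every token needs its own brackets, a phrase query is simply one condition per word written in sequence. A short Python sketch of that construction follows; `phrase_to_cql` is a hypothetical helper for illustration, and it inserts words unchanged, so characters such as "." keep their regular expression meaning.

```python
def phrase_to_cql(phrase: str, attr: str = "lemma") -> str:
    """Build one [attr="..."] token per whitespace-separated word.

    Hypothetical helper: words are inserted unchanged (no escaping).
    """
    return "".join(f'[{attr}="{word}"]' for word in phrase.split())


print(phrase_to_cql("refill the teapot"))
# [lemma="refill"][lemma="the"][lemma="teapot"]
print(phrase_to_cql("went", attr="word"))
# [word="went"]
```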

Spaces

Spaces outside quotes have no function in CQL. Feel free to use them to make the code more readable. This code is equivalent to the previous one.

[ lemma = "refill" ]  [ lemma = "the" ]  [ lemma=  "teapot"  ]

Careful with spaces inside values!

Spaces inside the quotes are part of the value, so do not add any. This query finds nothing because no lemma starts with spaces:

[lemma="  the"]
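
The two rules about spaces can be illustrated with a small Python function that drops whitespace outside quoted values and leaves everything inside the quotes untouched. `strip_insignificant_spaces` is a hypothetical helper; it assumes, as the text says, that spaces outside quotes carry no meaning, and the only special characters it knows are the double quote and the backslash.

```python
def strip_insignificant_spaces(query: str) -> str:
    """Remove whitespace outside quoted values; whitespace inside quotes is kept.

    Inside quotes, a backslash escapes the next character, so \\" does not end the value.
    """
    out = []
    in_quotes = False
    escaped = False
    for ch in query:
        if in_quotes:
            out.append(ch)  # everything inside the quotes is part of the value
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)  # keep brackets, attribute names and operators
    return "".join(out)


print(strip_insignificant_spaces('[ lemma = "refill" ]  [ lemma = "the" ]  [ lemma=  "teapot"  ]'))
# [lemma="refill"][lemma="the"][lemma="teapot"]
print(strip_insignificant_spaces('[lemma="  the"]'))
# [lemma="  the"]  (the spaces inside the value survive)
```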

More examples
Task: find examples of “went”
CQL: [word="went"]
Result: concordance of the word went

Task: find examples of all forms of go
CQL: [lemma="go"]
Result: concordance of go, goes, going, gone, went

Task: find examples of all words tagged with the tag NP
CQL: [tag="NP"]
Result: concordance of various words tagged as NP

Starting with, ending with or containing

Regular expressions can be used in CQL values, i.e. inside the quotes.

Task: words starting with confus-
CQL: [lemma="confus.*"]

Task: words ending with -ious
CQL: [lemma=".*ious"]

Task: three-letter words starting with b- and ending with -g
CQL: [lemma="b.g"]

The complete regular expression syntax is supported, so complex criteria can be used.
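
The examples above imply that the regular expression must match the whole attribute value: "b.g" accepts bag but not bags. That behaviour can be imitated with Python's `re.fullmatch`. This is an analogy under that assumption, not Sketch Engine code; the lemma list is made up, and only basic regular expression constructs are assumed to behave the same in both.

```python
import re


def value_matches(pattern: str, value: str) -> bool:
    # Sketch assumption: the pattern must match the WHOLE attribute value,
    # which re.fullmatch reproduces (re.search would also accept partial hits).
    return re.fullmatch(pattern, value) is not None


lemmas = ["confuse", "confusion", "unconfused", "various", "curious", "bag", "big", "bags", "bring"]

print([w for w in lemmas if value_matches("confus.*", w)])  # ['confuse', 'confusion']
print([w for w in lemmas if value_matches(".*ious", w)])    # ['various', 'curious']
print([w for w in lemmas if value_matches("b.g", w)])       # ['bag', 'big']
```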

Distance between tokens, repetition

Empty square brackets [ ] stand for ‘any token’. Curly brackets { } after a token set how many times that token is repeated.

Task: find examples of ‘refill’ and ‘kettle’ with one word in between
CQL: [lemma="refill"] [ ] [lemma="kettle"]
Results: refill the kettle, refills a kettle, refilled his kettle, refill our kettles

Task: find examples of ‘have’ and ‘opinion’ with 2 to 4 words in between
CQL: [lemma="have"] [ ]{2,4} [lemma="opinion"]
Results: has his own opinion, have an interesting opinion, have a very interesting opinion, had some interesting opinions

Task: find examples of ‘drink’ and ‘water’ with exactly two adjectives between them
CQL: [lemma="drink"] [tag="J.*"]{2} [lemma="water"]
Results: drink enough pure water, drink warm lemon water, drink fresh coconut water, drink enough plain water
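
What [lemma="have"] [ ]{2,4} [lemma="opinion"] asks for can be sketched over a list of lemmas: for every ‘have’, check whether ‘opinion’ follows after 2, 3 or 4 arbitrary tokens. `find_with_gap` is a hypothetical helper that lists every matching span; how the concordance reports overlapping hits is not modelled here.

```python
def find_with_gap(lemmas: list, first: str, last: str, min_gap: int, max_gap: int) -> list:
    """Return (start, end) index pairs, both inclusive, where `first` is followed
    by `last` with min_gap..max_gap arbitrary tokens in between, mirroring
    [lemma="first"] [ ]{min_gap,max_gap} [lemma="last"]."""
    hits = []
    for i, lemma in enumerate(lemmas):
        if lemma != first:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1  # index of the token right after the gap
            if j < len(lemmas) and lemmas[j] == last:
                hits.append((i, j))
    return hits


print(find_with_gap("they have a very interesting opinion".split(), "have", "opinion", 2, 4))
# [(1, 5)]
print(find_with_gap("i have an opinion".split(), "have", "opinion", 2, 4))
# []  (only one word in between)
```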

? optional token

A token can be made optional by placing a question mark ? after its closing square bracket.

Task: find examples of ‘drive my car’ or ‘drive my own car’
CQL: [lemma="drive"] [lc="my"] [lc="own"]? [lemma="car"]

Alternative solution without ?, using zero or one repetition of ‘own’:
CQL: [lemma="drive"] [lc="my"] [lc="own"]{0,1} [lemma="car"]
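
The equivalence of ? and {0,1} has a direct counterpart in ordinary regular expressions. The Python below is an analogy only: it matches whole lowercase strings rather than CQL tokens, so unlike the lemma-based query it would not find ‘drove my car’.

```python
import re

# Analogy only: Python regular expressions over a space-joined lowercase string.
# As in CQL, "?" after an item means the same as the repetition range {0,1}.
with_question_mark = re.compile(r"drive my (?:own )?car")
with_range = re.compile(r"drive my (?:own ){0,1}car")

for phrase in ["drive my car", "drive my own car", "drive my own own car"]:
    print(phrase, "->", bool(with_question_mark.fullmatch(phrase)), bool(with_range.fullmatch(phrase)))
# drive my car -> True True
# drive my own car -> True True
# drive my own own car -> False False
```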

Equal and not equal, bigger and smaller

These comparison operators are supported:

=  equal
!=  not equal
<=  less than or equal to
>=  more than or equal to
!<=  not less than or equal to
!>=  not more than or equal to
==  equal, comparing the value as plain text
!==  not equal, comparing the value as plain text

<=  >=  !<=  !>=

Since manatee 2.32, the alphabetical parts of the value are compared lexicographically (‘in dictionary order’) and the numerical parts numerically. This is useful with structure attributes, where >="AB2010CD" will include values such as "BB0000CD", "AB2011CD" or "AB2010CE".
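
This ordering can be approximated in Python by splitting a value into runs of digits and non-digits, then comparing non-digit runs as text and digit runs as integers. This is a sketch, not manatee's implementation: `natural_key` is hypothetical, and the way a digit run sorts against a non-digit run at the same position is an assumption.

```python
import re


def natural_key(value: str) -> list:
    """Split a value into non-digit runs and digit runs; non-digit runs compare
    as text, digit runs compare as integers.

    Assumption of this sketch: at the same position a digit run sorts before a
    non-digit run (the first element of each tuple decides).
    """
    key = []
    for part in re.findall(r"\d+|\D+", value):
        if part.isdecimal():
            key.append((0, int(part), ""))  # numerical part: compared numerically
        else:
            key.append((1, 0, part))        # alphabetical part: compared as text
    return key


threshold = natural_key("AB2010CD")  # plays the role of >="AB2010CD"
for v in ["BB0000CD", "AB2011CD", "AB2010CE", "AB2009CD"]:
    print(v, natural_key(v) >= threshold)
# BB0000CD True
# AB2011CD True
# AB2010CE True
# AB2009CD False
```

The numeric part matters when numbers have different lengths: as plain strings "AB10" sorts before "AB9", whereas with this key it sorts after.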

==  !==

Available since manatee 2.32. Unlike = and !=, these operators treat the value as plain text, not as a regular expression.

CQL: [word="."]
Matches: all one-letter words (the full stop is treated as a regular expression)

CQL: [word=="."]
Matches: all full stops (the full stop is treated as a full stop, not as a regular expression)

Escaping regular expression operators with a backslash has the same effect as using == or !==. These two CQL queries produce the same result:

[word="\."]
[word=="."]

Note that even with == and !==, two characters still need to be escaped: the double quote (") and the backslash (\).
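
The difference between a regular expression value, an escaped one and a plain-text comparison can be mirrored in Python. This is an analogy using Python's `re` module on a made-up word list, assuming whole-value matching as in the examples above; it is not how Sketch Engine evaluates queries.

```python
import re

words = ["a", "I", ".", "x", "to"]

# [word="."]   the full stop is a regular expression: any single character
print([w for w in words if re.fullmatch(".", w)])    # ['a', 'I', '.', 'x']
# [word="\."]  escaped full stop: only a literal full stop
print([w for w in words if re.fullmatch(r"\.", w)])  # ['.']
# [word=="."]  plain-text comparison: ordinary string equality
print([w for w in words if w == "."])                # ['.']
```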

AND OR NOT inside values

One token can have several conditions. They must all appear inside the same pair of square brackets, joined by Boolean operators:

& (ampersand) = AND
| (pipe) = OR
! (exclamation mark) = NOT

The spaces in these CQL examples were inserted for clarity; the queries are valid with or without them.
CQL: [ lemma="test" & tag="N.*" ]
Result: finds all forms of the lemma ‘test’ tagged as a noun

CQL: [word="test" & tag!="V.*"]
CQL: [word="test" & !tag="V.*"]
Result: both queries are equivalent; they find the word ‘test’ when it is NOT tagged as a verb

CQL: [ word="round" & ( tag="N.*" | tag="V.*" ) ]
Result: finds the word ‘round’ tagged as a noun or a verb
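
Conditions inside one token behave like a Boolean filter over that token's attributes. The Python below sketches this on made-up tokens stored as dictionaries; `has` is a hypothetical helper that assumes whole-value regular expression matching, and the tags only imitate the style of the tags used above.

```python
import re


def has(token: dict, attr: str, pattern: str) -> bool:
    """attr="pattern", assuming the regex must match the whole attribute value."""
    return re.fullmatch(pattern, token[attr]) is not None


# Made-up tokens for illustration only.
tokens = [
    {"word": "test", "lemma": "test", "tag": "NN"},
    {"word": "tests", "lemma": "test", "tag": "VVZ"},
    {"word": "test", "lemma": "test", "tag": "VV"},
    {"word": "round", "lemma": "round", "tag": "JJ"},
    {"word": "round", "lemma": "round", "tag": "NN"},
]

# [lemma="test" & tag="N.*"]
test_nouns = [t for t in tokens if has(t, "lemma", "test") and has(t, "tag", "N.*")]
# [word="test" & tag!="V.*"]  and  [word="test" & !tag="V.*"]  give the same filter
test_not_verbs = [t for t in tokens if has(t, "word", "test") and not has(t, "tag", "V.*")]
# [word="round" & (tag="N.*" | tag="V.*")]
round_n_or_v = [
    t for t in tokens
    if has(t, "word", "round") and (has(t, "tag", "N.*") or has(t, "tag", "V.*"))
]

print([t["tag"] for t in test_nouns])      # ['NN']
print([t["tag"] for t in test_not_verbs])  # ['NN']
print([t["tag"] for t in round_n_or_v])    # ['NN']
```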

OR with tokens

The pipe can also be used outside tokens, i.e. outside the square brackets:

| (pipe) = OR – one token or another token, i.e. the token to the left or to the right of the pipe
Use the pipe this way only when there is no other solution, because it makes the search slow. In most cases, it can be replaced by a pipe inside the token, which is faster. See the examples below.

The spaces in these CQL examples were inserted for clarity; the queries are valid with or without them.
Used outside tokens: [lemma="dog"] | [lemma="wolf"]
Equivalent inside tokens (recommended): [lemma="dog|wolf"]
Comment: a pipe inside the token finds the result faster

Used outside tokens: [tag="N.*"] | [lemma="the"]
Equivalent inside tokens (recommended): [tag="N.*" | lemma="the"]
Comment: the example may not seem logical, but such searches may be needed to discover incorrectly tagged items or other problems in the corpus

Used outside tokens: ([lemma="big"][lemma="dog"]) | [lemma="wolf"]
Equivalent inside tokens (recommended): no equivalent
Comment: when searching for multi-word expressions with the OR operator, the only way is to put the multi-word expression in parentheses and place a pipe between the alternatives
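
The table can be mirrored on a made-up list of lemmas: a pipe between single tokens selects the same positions as a pipe inside one token, while alternatives of different lengths need a search over token sequences. `find_alternatives` is a hypothetical helper for illustration, assuming whole-value matching; it says nothing about the speed difference described above.

```python
import re

lemmas = "a big dog met a wolf and a small dog".split()

# [lemma="dog"] | [lemma="wolf"]  versus  [lemma="dog|wolf"]
outside = [i for i, lemma in enumerate(lemmas) if lemma == "dog" or lemma == "wolf"]
inside = [i for i, lemma in enumerate(lemmas) if re.fullmatch("dog|wolf", lemma)]
print(outside, inside)  # [2, 5, 9] [2, 5, 9]


def find_alternatives(sequence: list, alternatives: list) -> list:
    """Return (start, end) spans, end exclusive, where any alternative token
    sequence occurs, e.g. ([lemma="big"][lemma="dog"]) | [lemma="wolf"]."""
    hits = []
    for start in range(len(sequence)):
        for alt in alternatives:
            if sequence[start:start + len(alt)] == alt:
                hits.append((start, start + len(alt)))
    return hits


print(find_alternatives(lemmas, [["big", "dog"], ["wolf"]]))  # [(1, 3), (5, 6)]
```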