CQL – within & containing

Useful tip

Using containing with long structures (such as sentences or paragraphs) can produce very wide concordance lines. Switching the view options from KWIC view to Sentence view makes the screen layout easier to read.

Searching for something inside something else, or for something containing something else

The CQL operators within and containing restrict the search to a certain structure, i.e. they search inside:

a corpus structure, e.g. a sentence, paragraph, document, etc.
a grammatical or lexical structure, e.g. a noun phrase or the sooner…the more… construction

Both search for tokens inside a structure; the difference lies in what the result is, i.e. what is highlighted as KWIC in the concordance.

within
The result is the word or phrase. The word or phrase to the left of ‘within’ will be highlighted in the concordance.

containing
The result is the whole structure: the whole sentence, paragraph, document, etc. is highlighted. *)

*) If the structure is very long, only its first 100 tokens are displayed.
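To make the difference concrete, here is a hypothetical toy model in Python. It is not how Sketch Engine implements CQL: tokens are a plain list, matches and structures are half-open (start, end) spans, and the helper names within, containing and inside are invented for illustration. within returns the matching word itself, while containing returns the whole structure that holds it.

```python
# Hypothetical toy model of the two operators, not Sketch Engine's implementation.
# Matches and structures are half-open token spans: (start, end).
Span = tuple[int, int]


def inside(inner: Span, outer: Span) -> bool:
    """True if the inner span lies entirely within the outer span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def within(matches: list[Span], structures: list[Span]) -> list[Span]:
    """Result = the matches themselves, kept only if some structure encloses them."""
    return [m for m in matches if any(inside(m, s) for s in structures)]


def containing(structures: list[Span], matches: list[Span]) -> list[Span]:
    """Result = the whole structures that enclose at least one match."""
    return [s for s in structures if any(inside(m, s) for m in matches)]


tokens = "The dog barks . A cat sleeps .".split()
sentences = [(0, 4), (4, 8)]  # assumed sentence boundaries
dog = [(i, i + 1) for i, t in enumerate(tokens) if t == "dog"]

print([tokens[a:b] for a, b in within(dog, sentences)])      # [['dog']]
print([tokens[a:b] for a, b in containing(sentences, dog)])  # [['The', 'dog', 'barks', '.']]
```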

within

with corpus structures

CQL searches do not take corpus structures (sentences, paragraphs, documents, etc.) into account automatically. Therefore, searching for dog and runs with up to 5 tokens between them:

"dog" []{0,5} "runs" (with default attribute 'word' or 'lc' selected)

might find examples where dog appears towards the end of one sentence and runs appears towards the beginning of the following sentence. This is often unwanted.

Use within to make sure the result lies inside the structure you want. Here the whole match must lie within a single sentence; a paragraph, a document or any other structure found in the corpus can be used in the same way.

"dog" []{0,5} "runs" within <s/> (with default attribute 'word' or 'lc' selected)

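The effect of adding within <s/> can be imitated with a hypothetical Python toy model (not Sketch Engine code). find_dog_runs, an invented helper, lists every "dog" []{0,5} "runs" span regardless of sentence boundaries, and within keeps only the spans that lie entirely inside one sentence. The sentence boundaries are simply assumed as (start, end) pairs.

```python
# Hypothetical toy model of: "dog" []{0,5} "runs" within <s/>
Span = tuple[int, int]


def find_dog_runs(tokens: list[str], max_gap: int = 5) -> list[Span]:
    """All spans "dog" []{0,max_gap} "runs", ignoring sentence boundaries."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != "dog":
            continue
        for gap in range(max_gap + 1):  # 0..max_gap arbitrary tokens in between
            j = i + gap + 1
            if j < len(tokens) and tokens[j] == "runs":
                hits.append((i, j + 1))
    return hits


def within(matches: list[Span], structures: list[Span]) -> list[Span]:
    """Keep matches lying entirely inside one structure."""
    return [m for m in matches if any(s <= m[0] and m[1] <= e for s, e in structures)]


tokens = "I saw a dog . It runs fast . The dog runs home .".split()
sentences = [(0, 5), (5, 9), (9, 14)]  # assumed <s> boundaries

print(find_dog_runs(tokens))                     # [(3, 7), (10, 12)]; the first crosses a sentence boundary
print(within(find_dog_runs(tokens), sentences))  # [(10, 12)]
```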
with lexical and grammatical structures

since manatee 2.28
The structure after within can be another CQL expression defining a grammatical or lexical structure.

This query searches for nouns that appear between two forms of the verb to be, with at most 5 tokens between the two verbs.

[tag="N.*"] within [tag="VB.*"] []{0,5} [tag="VB.*"]
"N.*" within "VB.*" []{0,5} "VB.*" (equivalent short version with default attribute 'tag' selected)

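A hypothetical Python toy model shows how the structure after within is itself a query result: first all [tag="VB.*"] []{0,5} [tag="VB.*"] spans are found, then only nouns inside one of those spans are kept. The tagged sentence is invented, the tags are assumed to follow a TreeTagger-style English tagset (VBP/VBZ for forms of be, NN/NNS for nouns), and re.fullmatch is used because CQL regular expressions must match the whole attribute value.

```python
import re

# Hypothetical toy model of [tag="N.*"] within [tag="VB.*"] []{0,5} [tag="VB.*"]
Span = tuple[int, int]


def positions(tags: list[str], pattern: str) -> list[int]:
    """Positions whose tag matches the regex as a whole (as CQL regexes do)."""
    return [i for i, t in enumerate(tags) if re.fullmatch(pattern, t)]


def sequences(tags: list[str], first: str, last: str, max_gap: int) -> list[Span]:
    """Spans matching [tag=first] []{0,max_gap} [tag=last]."""
    return [(i, j + 1)
            for i in positions(tags, first)
            for j in positions(tags, last)
            if i < j and j - i - 1 <= max_gap]


def within(matches: list[Span], structures: list[Span]) -> list[Span]:
    """Keep matches lying entirely inside one structure."""
    return [m for m in matches if any(s <= m[0] and m[1] <= e for s, e in structures)]


tagged = [("Prices", "NNS"), ("are", "VBP"), ("a", "DT"), ("problem", "NN"),
          ("that", "WDT"), ("is", "VBZ"), ("growing", "VVG")]
tags = [t for _, t in tagged]

nouns = [(i, i + 1) for i in positions(tags, "N.*")]
frames = sequences(tags, "VB.*", "VB.*", 5)  # the structure after within

print(frames)                                            # [(1, 6)]
print([tagged[a][0] for a, _ in within(nouns, frames)])  # ['problem']
```

"Prices" is a noun too, but it lies outside the verb-to-verb span, so it is not returned.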
nesting

To prevent the example above from finding structures that cross a sentence boundary, add another within. Multiple within operators can be nested if necessary.

"N.*" within ("VB.*" []{0,5} "VB.*" within <s/>)     (with default attribute 'tag' selected)

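Nesting can be pictured as applying within twice, in a hypothetical Python toy model (not Sketch Engine code): the inner within first discards verb-to-verb spans that cross a sentence boundary, and the outer within then looks for nouns only in the spans that survive. Sentence boundaries and tags are invented for illustration.

```python
import re

# Hypothetical toy model of "N.*" within ("VB.*" []{0,5} "VB.*" within <s/>)


def positions(tags, pattern):
    """Positions whose tag matches the regex as a whole."""
    return [i for i, t in enumerate(tags) if re.fullmatch(pattern, t)]


def sequences(tags, first, last, max_gap):
    """Spans matching [tag=first] []{0,max_gap} [tag=last]."""
    return [(i, j + 1) for i in positions(tags, first) for j in positions(tags, last)
            if i < j and j - i - 1 <= max_gap]


def within(matches, structures):
    """Keep matches lying entirely inside one structure."""
    return [m for m in matches if any(s <= m[0] and m[1] <= e for s, e in structures)]


words = "It is late . The house is old .".split()
tags = ["PP", "VBZ", "JJ", "SENT", "DT", "NN", "VBZ", "JJ", "SENT"]
sentences = [(0, 4), (4, 9)]  # assumed <s> boundaries

nouns = [(i, i + 1) for i in positions(tags, "N.*")]
frames = sequences(tags, "VB.*", "VB.*", 5)        # [(1, 7)] spans two sentences

flat = within(nouns, frames)                       # without the inner within
nested = within(nouns, within(frames, sentences))  # frames must stay inside one sentence

print([words[a] for a, _ in flat])    # ['house']
print([words[a] for a, _ in nested])  # []
```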
in parallel corpora

Use within to search a parallel corpus using this syntax:

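  QUERY1 within ALIGNED_CORPUS: QUERY2

QUERY1 is searched in the corpus that is open and QUERY2 in the aligned corpus named ALIGNED_CORPUS.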

For example, with the English Europarl open, you can use this CQL to query the German Europarl. It will find all segments where the English corpus contains car and the aligned German segment contains Auto. In most cases, these will be segments where car was translated as Auto.

[word="car"] within europarl5_de: [word="Auto"]
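The parallel-corpus filter can be sketched as a hypothetical Python toy model, assuming the two corpora are already aligned segment by segment (here simply a list of (English tokens, German tokens) pairs with invented sentences). The invented helper within_aligned returns the positions of the hits in the open (English) corpus, keeping only those whose aligned German segment contains the target word.

```python
# Hypothetical toy model of [word="car"] within europarl5_de: [word="Auto"]
# Each pair is one aligned segment: (English tokens, German tokens).
Pair = tuple[list[str], list[str]]


def within_aligned(pairs: list[Pair], source_word: str, target_word: str) -> list[tuple[int, int]]:
    """(segment index, token position) of source_word hits whose aligned segment contains target_word."""
    hits = []
    for k, (src, tgt) in enumerate(pairs):
        if target_word in tgt:
            hits.extend((k, i) for i, tok in enumerate(src) if tok == source_word)
    return hits


pairs = [
    ("I bought a car".split(), "Ich kaufte ein Auto".split()),
    ("The car was red".split(), "Der Wagen war rot".split()),
]

print(within_aligned(pairs, "car", "Auto"))   # [(0, 3)]
print(within_aligned(pairs, "car", "Wagen"))  # [(1, 1)]
```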

containing

Use containing to find a structure and display the whole structure as the result (KWIC).

with corpus structures

This will find all paragraphs containing an acronym, i.e. a word consisting only of 3 or more uppercase letters:

<p/> containing [word="[A-Z]{3,}"]
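A hypothetical Python toy model of this query (paragraph boundaries and text are invented): the invented helper containing returns whole paragraphs in which at least one token matches [A-Z]{3,}. Because CQL regular expressions must match the whole word, re.fullmatch is used, so "EU" (too short) and "Nobody" (not all uppercase) do not count as acronyms.

```python
import re

# Hypothetical toy model of <p/> containing [word="[A-Z]{3,}"]
Span = tuple[int, int]


def containing(structures: list[Span], tokens: list[str], pattern: str) -> list[Span]:
    """Whole structures holding at least one token that matches pattern in full."""
    return [(s, e) for s, e in structures
            if any(re.fullmatch(pattern, t) for t in tokens[s:e])]


tokens = "The EU met . NATO agreed . Nobody came .".split()
paragraphs = [(0, 4), (4, 7), (7, 10)]  # assumed <p> boundaries

for s, e in containing(paragraphs, tokens, "[A-Z]{3,}"):
    print(tokens[s:e])  # ['NATO', 'agreed', '.']
```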

with lexical and grammatical structures

The structure before containing can be another CQL expression defining a grammatical or lexical structure.

This query searches for noun phrases (sequences of 1 to 5 adjectives followed by a noun) which contain the adjective international.

[tag="J.*"]{1,5} [tag="N.*"] containing [word="international"]
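A hypothetical Python toy model of this query, with an invented tagged phrase: the invented helper adj_noun_spans lists every candidate [tag="J.*"]{1,5} [tag="N.*"] span, and only spans containing the word international are kept. Because the toy lists every candidate, it reports both the longer and the shorter overlapping phrase; which overlapping matches Sketch Engine actually reports is not modeled here.

```python
import re

# Hypothetical toy model of [tag="J.*"]{1,5} [tag="N.*"] containing [word="international"]
Span = tuple[int, int]


def adj_noun_spans(tags: list[str], max_adj: int = 5) -> list[Span]:
    """Every candidate span of 1..max_adj adjective tags followed by a noun tag."""
    spans = []
    for start in range(len(tags)):
        for n_adj in range(1, max_adj + 1):
            noun = start + n_adj
            if noun >= len(tags):
                break
            if not all(re.fullmatch("J.*", t) for t in tags[start:noun]):
                break  # a non-adjective interrupts the run; longer runs fail too
            if re.fullmatch("N.*", tags[noun]):
                spans.append((start, noun + 1))
    return spans


words = "a large international company and a local firm".split()
tags = ["DT", "JJ", "JJ", "NN", "CC", "DT", "JJ", "NN"]

hits = [(s, e) for s, e in adj_noun_spans(tags) if "international" in words[s:e]]
print([" ".join(words[s:e]) for s, e in hits])
# ['large international company', 'international company']
```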

NOT within, NOT containing

since manatee 2.111
Use ! for negation to mean:

!within = not within
!containing = not containing

This CQL will find all nouns which appear outside the <nphr> structure, i.e. outside noun phrases. The CQL will only work in corpora where the <nphr> structure exists.

[tag="N.*"]   !within   <nphr/>
"N.*"   !within   <nphr/>        (the same as above, the default attribute must be set to 'tag')

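!within is simply the complement of within. A hypothetical Python toy model, assuming an invented <nphr> annotation that marks only "the old house" as a noun phrase, keeps the nouns that do not lie inside any noun-phrase span.

```python
# Hypothetical toy model of [tag="N.*"] !within <nphr/>
Span = tuple[int, int]


def not_within(matches: list[Span], structures: list[Span]) -> list[Span]:
    """!within: keep matches that do not lie entirely inside any structure."""
    return [m for m in matches if not any(s <= m[0] and m[1] <= e for s, e in structures)]


words = "the old house needs repair".split()
tags = ["DT", "JJ", "NN", "VVZ", "NN"]
nphr = [(0, 3)]  # invented <nphr> annotation: "the old house"

nouns = [(i, i + 1) for i, t in enumerate(tags) if t.startswith("N")]  # like "N.*"
print([words[a] for a, _ in not_within(nouns, nphr)])  # ['repair']
```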
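This CQL will find all sentences which do not contain any word starting with a capital letter (more precisely, any token that fully matches [A-Z][A-Za-z]*, i.e. an uppercase letter followed only by letters).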

<s/>   !containing   [word="[A-Z][A-Za-z]*"]
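A hypothetical Python toy model of !containing, with invented sentences and boundaries: a sentence is kept only if none of its tokens fully matches [A-Z][A-Za-z]*. The third sentence shows a subtlety of the regex: "X-rays" starts with a capital letter but contains a hyphen, so it does not match and the sentence is still returned.

```python
import re

# Hypothetical toy model of <s/> !containing [word="[A-Z][A-Za-z]*"]
Span = tuple[int, int]


def not_containing(structures: list[Span], tokens: list[str], pattern: str) -> list[Span]:
    """!containing: keep structures in which no token matches pattern in full."""
    return [(s, e) for s, e in structures
            if not any(re.fullmatch(pattern, t) for t in tokens[s:e])]


tokens = "The cat sleeps . it rains . we saw X-rays .".split()
sentences = [(0, 4), (4, 7), (7, 11)]  # assumed <s> boundaries

for s, e in not_containing(sentences, tokens, "[A-Z][A-Za-z]*"):
    print(tokens[s:e])
# ['it', 'rains', '.']
# ['we', 'saw', 'X-rays', '.']
```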
