Comparing tokens in a query

Since manatee 2.32: a global condition is used to compare tokens with each other and to set additional options for individual tokens.

Tokens must be labelled before global conditions can be set for them. A label is a digit followed by a colon, e.g. 2:[ ]. The condition is then added at the very end of the query, after an ampersand &.

This query will find any two tokens whose tag is the same, i.e. two nouns, two adjectives, two verbs etc.:

1:[] 2:[] & 1.tag = 2.tag
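To make the semantics of this condition concrete, the following Python sketch imitates it on a small hand-made token list. It is only a toy model of what the query matches, not how Manatee evaluates it; the Token type, the one-letter tags and the function name same_tag_bigrams are made up for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    tag: str  # hypothetical one-letter tags: J = adjective, N = noun, V = verb


def same_tag_bigrams(tokens):
    """Toy model of `1:[] 2:[] & 1.tag = 2.tag`.

    Returns index pairs (i, i + 1) of adjacent tokens whose tags are equal.
    """
    return [
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if tokens[i].tag == tokens[i + 1].tag
    ]


# Invented example data, not taken from any real corpus.
toy = [
    Token("big", "big", "J"),
    Token("red", "red", "J"),
    Token("dog", "dog", "N"),
    Token("house", "house", "N"),
    Token("sleeps", "sleep", "V"),
]

matches = same_tag_bigrams(toy)
print(matches)  # [(0, 1), (2, 3)]
```

The two unlabelled positions [] [] correspond to the adjacent indices i and i + 1, and the part after & is the comparison inside the list comprehension.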

More than one position can be labelled to be used in the condition.

This query is an extension of the previous one. It finds two adjacent words with the same tag, followed at a distance of 3 to 10 tokens by another word whose lemma is the same as the lemma of the first one. For example: Malt whisky is made from malted

1:[] 2:[] []{3,10} 3:[] & 1.tag = 2.tag & 3.lemma=1.lemma
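The way the three labels and the two conditions interact can be sketched in Python as follows. This is a toy model under stated assumptions: tokens are plain (lemma, tag) tuples, the tags and the sample data are invented, and the function name is hypothetical. A real query engine may report overlapping matches differently.

```python
def same_tag_then_same_lemma(tokens, min_gap=3, max_gap=10):
    """Toy model of `1:[] 2:[] []{3,10} 3:[] & 1.tag = 2.tag & 3.lemma=1.lemma`.

    `tokens` is a list of (lemma, tag) tuples.
    Returns (pos1, pos2, pos3) index triples for the three labelled positions.
    """
    hits = []
    for i in range(len(tokens) - 1):
        lemma1, tag1 = tokens[i]
        _, tag2 = tokens[i + 1]
        if tag1 != tag2:  # global condition 1.tag = 2.tag
            continue
        for gap in range(min_gap, max_gap + 1):  # the unlabelled []{3,10}
            k = i + 2 + gap  # index of the token labelled 3
            if k >= len(tokens):
                break
            if tokens[k][0] == lemma1:  # global condition 3.lemma = 1.lemma
                hits.append((i, i + 1, k))
    return hits


# Invented data: positions 0 and 1 share a tag, position 5 repeats lemma "cat".
toy = [
    ("cat", "N"), ("food", "N"), ("be", "V"),
    ("good", "J"), ("for", "I"), ("cat", "N"),
]

hits = same_tag_then_same_lemma(toy)
print(hits)  # [(0, 1, 5)]
```

Labels 1, 2 and 3 map to the indices i, i + 1 and k; each global condition is an ordinary comparison between attributes of those positions.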

A more practical example might serve to identify potentially clumsy passages where word forms of the same lemma are used too close to each other, e.g. awards are awarded.

Note that using containing or within may require parentheses around the first part of the query.

(1:[tag="N.*"] [ ]{0,3} 2:[tag="V.*"] within <s/>) & 1.lemma=2.lemma & 1.lc!=2.lc
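As a rough illustration of what this query selects, the Python sketch below applies the same logic to hand-made sentences. It is a simplified model only: within <s/> is imitated by searching each sentence separately, lc is imitated with str.lower(), and the tags, lemmas and function name are assumptions made for the example (in particular, that the noun "awards" and the verb "awarded" share the lemma "award").

```python
import re


def repeated_lemma_noun_verb(sentences, max_gap=3):
    """Toy model of a noun, 0 to max_gap tokens, then a verb, all in one
    sentence, with equal lemmas but different lowercased word forms.

    `sentences` is a list of sentences; each sentence is a list of
    (word, lemma, tag) tuples. Returns (sentence_idx, noun_idx, verb_idx).
    """
    hits = []
    for s_idx, sent in enumerate(sentences):  # one sentence at a time
        for i, (word1, lemma1, tag1) in enumerate(sent):
            if not re.fullmatch(r"N.*", tag1):  # 1:[tag="N.*"]
                continue
            for gap in range(max_gap + 1):  # [ ]{0,3}
                j = i + 1 + gap
                if j >= len(sent):
                    break
                word2, lemma2, tag2 = sent[j]
                if (
                    re.fullmatch(r"V.*", tag2)  # 2:[tag="V.*"]
                    and lemma1 == lemma2  # 1.lemma = 2.lemma
                    and word1.lower() != word2.lower()  # 1.lc != 2.lc
                ):
                    hits.append((s_idx, i, j))
    return hits


# Invented, hand-tagged sentences.
sentences = [
    [("Awards", "award", "NNS"), ("are", "be", "VBP"),
     ("awarded", "award", "VBN"), ("annually", "annually", "RB")],
    [("The", "the", "DT"), ("award", "award", "NN"),
     ("goes", "go", "VBZ"), ("to", "to", "IN"), ("her", "she", "PRP")],
]

hits = repeated_lemma_noun_verb(sentences)
print(hits)  # [(0, 0, 2)]
```

Only the first sentence matches: "Awards" and "awarded" share a lemma but differ in their lowercased form, while in the second sentence the verb has a different lemma.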

Word frequency with global conditions

This query will find 2-word expressions consisting of a high-frequency word followed by a low-frequency word, i.e. where the lemma of the first word has a frequency higher than 10,000 and the lemma of the second a frequency lower than 50:

1: [ ]  2: [ ] & f(1.lemma)>10000 & f(2.lemma)<50
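The frequency condition can be pictured with the following Python sketch, which counts lemma frequencies in a tiny invented corpus and then filters adjacent pairs. It is a toy model, not the engine's implementation: the function name is hypothetical, and the thresholds are scaled down from 10,000 and 50 so that the example data can stay small.

```python
from collections import Counter


def high_then_low_frequency(lemmas, high=10000, low=50):
    """Toy model of `1:[] 2:[] & f(1.lemma)>10000 & f(2.lemma)<50`.

    `lemmas` is the corpus as a flat list of lemmas. Returns index pairs
    (i, i + 1) where the first lemma occurs more than `high` times and the
    second fewer than `low` times in the whole list.
    """
    freq = Counter(lemmas)  # plays the role of f(...)
    return [
        (i, i + 1)
        for i in range(len(lemmas) - 1)
        if freq[lemmas[i]] > high and freq[lemmas[i + 1]] < low
    ]


# Invented corpus: "the" occurs 4 times, "cat" twice, the rest once.
corpus = ["the", "cat", "the", "dog", "the", "aardvark", "the", "cat"]

pairs = high_then_low_frequency(corpus, high=3, low=2)
print(pairs)  # [(2, 3), (4, 5)]
```

With these scaled-down thresholds, "the dog" and "the aardvark" qualify, whereas "the cat" does not because "cat" occurs twice.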

Logical disjunction (OR "|")

Since manatee 2.206: global conditions can also contain the logical OR operator "|". We strongly recommend against using this operator because it significantly slows down query evaluation, especially in large corpora.

(1:[tag="N.*"] [ ]{0,3} 2:[tag="V.*"] within <s/>) & 1.lemma=2.lemma | 1.lemma_lc=2.lemma_lc