since manatee 2.96


Searching for similar words with CQL

Use the tilde (~) in front of a quoted word to generate a thesaurus for that word and include its top N thesaurus items in the query. For example, to find the verb chop followed, with up to three tokens in between, by a vegetable, use the query below (carrot can be replaced with any other vegetable):

[lemma="chop"] []{0,3} ~"carrot-n"

The query first generates a thesaurus for the word carrot from the reference corpus. It then searches the selected corpus for chop combined with the top N items from that thesaurus. Use the thesaurus to preview the words that will be included.

Note: Some corpora require the thesaurus word as a lempos (lemma plus part-of-speech suffix, as in carrot-n), others as a lemma or a word form. If one does not work, try the others.

When no number is specified, N is determined automatically from the frequency of the word in the corpus: it is the base-10 logarithm of that frequency. A frequency of 100 means 2 synonyms are used, a frequency of 1,000 means 3 synonyms, and so on.
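The automatic choice of N can be sketched in Python. This is only an illustration, not Manatee's own code: the function name `default_thesaurus_items` is hypothetical, and rounding down for frequencies that are not exact powers of ten is an assumption, since only the values for 100 and 1,000 are given here.

```python
def default_thesaurus_items(frequency: int) -> int:
    """Estimate how many thesaurus items are used when no number is given.

    Assumption: the base-10 logarithm of the frequency is rounded down.
    """
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    # For a positive integer, (number of decimal digits - 1) equals
    # floor(log10(frequency)) and avoids floating-point rounding issues.
    return len(str(frequency)) - 1


print(default_thesaurus_items(100))   # 2
print(default_thesaurus_items(1000))  # 3
```

Under this assumption, a word has to be ten times more frequent to gain one additional thesaurus item.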

To set the number of thesaurus items manually, use:

~15"carrot-n"
[lemma="chop"] []{0,3} ~15"carrot-n"
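When such queries are generated by a script, the two forms (automatic N and manual N) differ only in the optional number after the tilde. A minimal sketch, assuming a hypothetical helper `similar_word_query` that only assembles the query string and does not contact any corpus:

```python
from typing import Optional


def similar_word_query(lemma: str, thesaurus_word: str,
                       top_n: Optional[int] = None, max_gap: int = 3) -> str:
    """Build a CQL query: a lemma followed by words similar to thesaurus_word.

    Hypothetical helper for illustration; it does not escape quotes
    inside its arguments.
    """
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")
    # An empty count leaves the number of thesaurus items to be chosen automatically.
    count = "" if top_n is None else str(top_n)
    # Doubled braces produce the literal { } of the CQL repetition operator.
    return f'[lemma="{lemma}"] []{{0,{max_gap}}} ~{count}"{thesaurus_word}"'


print(similar_word_query("chop", "carrot-n"))
# [lemma="chop"] []{0,3} ~"carrot-n"
print(similar_word_query("chop", "carrot-n", top_n=15))
# [lemma="chop"] []{0,3} ~15"carrot-n"
```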
Note: The thesaurus is generated from a reference corpus rather than from the selected corpus because a very large corpus is needed for a good-quality thesaurus.