Since Manatee 2.96

Searching for similar words with CQL

Use the tilde (~) to generate a thesaurus for a word and include its top N items in the query. For example, to find the verb chop followed by a vegetable, use this:

 [lemma="chop"] []{0,3} ~"carrot-n" 

The query will generate a thesaurus for the word carrot and will search for the combination of chop and the top N items from the thesaurus for carrot. You can use the thesaurus to preview the words that will be included.

Note: Some corpora require the word to be entered as a lempos (e.g. carrot-n), others as a lemma (e.g. carrot). If one form does not work, try the other.

When no number is specified, N is determined automatically from the frequency of the word in the corpus: it is the base-10 logarithm of that frequency, so a frequency of 100 means 2 synonyms will be used, 1,000 means 3 synonyms, etc.
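
The automatic choice of N can be sketched in Python as a rough illustration. The function name auto_top_n is hypothetical, and rounding down for frequencies that fall between powers of ten is an assumption: the description above only gives the values for 100 and 1,000.

```python
def auto_top_n(frequency: int) -> int:
    """Return floor(log10(frequency)) using integer arithmetic.

    Assumption: the logarithm is rounded down. The documentation only
    states the results for exact powers of ten (100 -> 2, 1,000 -> 3).
    """
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    n = 0
    # Count how many times the frequency can be divided by 10.
    while frequency >= 10:
        frequency //= 10
        n += 1
    return n


print(auto_top_n(100))   # 2
print(auto_top_n(1000))  # 3
```

Under this reading, a word needs a frequency of at least 10 before even one thesaurus item is added to the query.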

To set the number of thesaurus items manually, use:

[lemma="chop"] []{0,3} ~15"carrot-n" 
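
If such queries are assembled by a script, both forms can be produced by one small helper. This is only a sketch of string formatting: the function similar_word_query and its parameters are hypothetical and not part of Manatee or any Sketch Engine API.

```python
from typing import Optional


def similar_word_query(lemma: str, target: str,
                       top_n: Optional[int] = None,
                       max_gap: int = 3) -> str:
    """Build a CQL string: a lemma, a gap of up to max_gap tokens,
    then the thesaurus operator ~ applied to target.

    Hypothetical helper; it only formats text and does not check
    whether the corpus expects a lempos or a lemma.
    """
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")
    # Leave the count out so that N is chosen automatically.
    count = "" if top_n is None else str(top_n)
    return f'[lemma="{lemma}"] []{{0,{max_gap}}} ~{count}"{target}"'


print(similar_word_query("chop", "carrot-n"))
print(similar_word_query("chop", "carrot-n", top_n=15))
```

The first call reproduces the query with automatic N, the second the query with 15 thesaurus items.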

Using the thesaurus with small corpora or low-frequency words will not produce good-quality synonyms. A high-quality thesaurus needs both a large corpus and a word of reasonably high frequency.