FindX is a facility in Sketch Engine, available from the left submenu of the word list feature. It is related to the use of word sketch highlights.
FindX produces a ranked list of words according to a definition of the behaviour you wish to examine. The definition is used to calculate a statistic for each word in the corpus you have selected, and the words are ranked by that statistic. You can either use the definitions available in the existing lists or upload an input file (see the format and examples below). There are three scenarios for the definitions:
- i) a specified CQL query: (Q1) In this scenario, the frequency of the pattern specified by the CQL, with the word substituted at %s in Q1, is divided by the frequency of the word.
freq(Q1[word]) / freq(word)
- ii) a comparison of two such CQL queries: (Q1 and Q2) In this scenario, the frequency of the Q1 query (with the word instantiated at %s) is divided by the sum of that same frequency and the frequency of Q2 (with the word instantiated at %s).
freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))
- iii) a word sketch definition: (WS) Here the frequency of the word in the word sketch grammatical relation is divided by the frequency of the word in the entire corpus.
freq(WS[word]) / freq(word)
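The three ratios above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Sketch Engine's implementation: the function names are hypothetical, and the frequencies are assumed to have been obtained already from the corpus.

```python
def ratio_score(pattern_freq: int, word_freq: int) -> float:
    """Scenarios i and iii: freq(Q1[word]) / freq(word) or freq(WS[word]) / freq(word)."""
    if word_freq == 0:
        # Assumption: a word that never occurs gets score 0 rather than an error.
        return 0.0
    return pattern_freq / word_freq


def comparison_score(q1_freq: int, q2_freq: int) -> float:
    """Scenario ii: freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))."""
    total = q1_freq + q2_freq
    if total == 0:
        # Assumption: neither pattern was found, so the score defaults to 0.
        return 0.0
    return q1_freq / total
```

For example, a word that occurs 120 times in the corpus and 30 times in the Q1 pattern scores 30 / 120 = 0.25 in scenario i. If Q1 matches 30 times and Q2 matches 10 times, scenario ii gives 30 / (30 + 10) = 0.75.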
Additionally, a regular expression (RE) can be specified to restrict the words under consideration: only the words matching the RE are considered, and all others are skipped. This is mainly for efficiency reasons.
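The effect of the RE can be illustrated with a small Python sketch. The function name is hypothetical, and the sketch assumes the RE must match the whole word; Sketch Engine's own matching rules and regular expression dialect may differ.

```python
import re


def filter_candidates(words, pattern):
    """Keep only the words that the regular expression matches in full."""
    compiled = re.compile(pattern)
    # Assumption: whole-word matching (fullmatch), not a substring search.
    return [word for word in words if compiled.fullmatch(word)]
```

With the pattern `.*ly`, for instance, the words `quickly`, `slowly` and `fly` would be kept while `quick` would be dropped, so no statistics need to be computed for it.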
FindX (WS highlights) definition file format
=highlight_id
HR human readable name
Q1 query_1
Q2 query_2 # optional
RE regular_expression # optional

=highlight_id
HR human readable name
WS wsdef_relation_name
RE regular_expression # optional
# All strings in the definition files starting with # are comments and are ignored to the end of the line.
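To make the format more concrete, here is a minimal Python sketch of a reader for such a file. It is not the parser used by Sketch Engine. It assumes that each field sits on its own line, that a line starting with `=` opens a new definition, and that everything from `#` to the end of a line is a comment. The function name, the sample identifier and the CQL queries in the sample are all hypothetical.

```python
FIELD_KEYS = {"HR", "Q1", "Q2", "WS", "RE"}

# Hypothetical sample definition; the attribute names in the CQL depend on the corpus.
SAMPLE = """\
=passive
HR mostly passive verbs
Q1 [lemma="be"] [lemma="%s"]   # invented query, for illustration only
Q2 [lemma="%s"]
RE .*e
"""


def parse_definitions(text: str) -> dict[str, dict[str, str]]:
    """Return {highlight_id: {field_key: value}} for a definition file."""
    definitions: dict[str, dict[str, str]] = {}
    current = None
    for raw_line in text.splitlines():
        # Drop the comment part, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("="):
            current = {}
            definitions[line[1:].strip()] = current
            continue
        if current is None:
            continue  # field before any "=id" line; ignored in this sketch
        parts = line.split(None, 1)
        if parts[0] in FIELD_KEYS:
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return definitions
```

Calling `parse_definitions(SAMPLE)` yields one entry keyed by `passive`, holding the `HR`, `Q1`, `Q2` and `RE` values with the trailing comment removed.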
Examples are attached. Note that you may need to alter the minimum ratio and minimum frequency to see any results.
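The following Python sketch shows why the thresholds can leave the result list empty. It is an illustration under stated assumptions, not the tool's actual logic: the function name is hypothetical, the scenario i ratio is used, and the minimum frequency is assumed to apply to the frequency of the word itself.

```python
def rank_words(stats, min_freq, min_ratio):
    """Rank words by pattern_freq / word_freq, applying both thresholds.

    stats maps each word to a (pattern_freq, word_freq) pair.
    """
    ranked = []
    for word, (pattern_freq, word_freq) in stats.items():
        # Assumption: min_freq is checked against the word's corpus frequency.
        if word_freq == 0 or word_freq < min_freq:
            continue
        ratio = pattern_freq / word_freq
        if ratio >= min_ratio:
            ranked.append((word, ratio))
    # Highest ratio first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

If every candidate word falls below either threshold, the returned list is empty, which corresponds to seeing no results until the minimum ratio or minimum frequency is lowered.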
Adam Kilgarriff and Pavel Rychlý (2008). Finding the words which are most X. In Proceedings of the 13th EURALEX International Congress. Spain, July 2008, pp. 433–436