You can use the following utilities if you have access to a local installation and can run commands at the Linux prompt.

The general utilities are described below.

corpinfo

Prints basic information about a given corpus.

Usage: corpinfo [OPTIONS] CORPNAME

-d dump whole configuration

-p print corpus directory path

-s print corpus size

-w print corpus wordform counts

-g OPT print configuration value of option OPT
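When corpinfo is called from a script, passing the arguments as a list (rather than one shell string) avoids quoting problems. The following is a minimal Python sketch, not part of the tool itself: the helper names corpinfo_args, parse_size and corpus_size are hypothetical, and it assumes that corpinfo -s prints the corpus size as a single integer on standard output.

```python
import subprocess

# Documented corpinfo flags that take no argument of their own.
FLAGS = {"dump": "-d", "path": "-p", "size": "-s", "wordforms": "-w"}


def corpinfo_args(corpname, what="size", option=None):
    """Build the argument list for one corpinfo call (hypothetical helper)."""
    if option is not None:
        # -g OPT prints the configuration value of option OPT
        return ["corpinfo", "-g", option, corpname]
    return ["corpinfo", FLAGS[what], corpname]


def parse_size(output):
    # Assumption: `corpinfo -s` prints the size as a single integer.
    return int(output.strip())


def corpus_size(corpname):
    """Run `corpinfo -s CORPNAME` and return the size as an int."""
    result = subprocess.run(
        corpinfo_args(corpname, "size"),
        capture_output=True, text=True, check=True,
    )
    return parse_size(result.stdout)
```

For example, corpus_size("bnc") runs corpinfo -s bnc; if the output format differs from the assumption above, only parse_size needs to change.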

corpquery

Prints the concordance for a given query.

Usage: corpquery CORPUSNAME QUERY [OPTIONS]

Options:

-r ATTR reference attribute (default: None)

-c LEFT,RIGHT | BOTH left and right context length, or a single length for both sides (default: 15)

-h LIMIT maximum number of results (default: -1)

-a ATTR1,ATTR2,… comma-separated list of attributes to show (default: word,lemma,tag)

-s STR1,STR2,… comma-separated list of structures to show; use struct.attr or struct.* to show structure attributes (default: s,p,doc)

-g GDEX_CONF use GDEX with the given GDEX_CONF configuration file; use - for the default configuration (default: None). With GDEX, use -h to set the result size (default: 100)

-m GDEX_MODULE_DIR GDEX module path (directory with gdex.py or gdex_old.py)
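The sketch below shows one way to assemble a corpquery call from the documented options in Python. The function names corpquery_args and run_corpquery are hypothetical, and only the -r, -c, -h, -a and -s options listed above are used. Because the query is passed as a single list element, the double quotes inside it need no shell escaping.

```python
import subprocess


def corpquery_args(corpname, query, context=None, limit=None,
                   attrs=None, structs=None, refs=None):
    """Build a corpquery argument list (hypothetical helper)."""
    args = ["corpquery", corpname, query]
    if refs is not None:
        args += ["-r", refs]               # reference attribute
    if context is not None:
        if isinstance(context, int):
            args += ["-c", str(context)]   # one length for both sides
        else:
            left, right = context          # separate left/right lengths
            args += ["-c", f"{left},{right}"]
    if limit is not None:
        args += ["-h", str(limit)]         # maximum number of results
    if attrs:
        args += ["-a", ",".join(attrs)]    # attributes to show
    if structs:
        args += ["-s", ",".join(structs)]  # structures to show
    return args


def run_corpquery(*args, **kwargs):
    """Run corpquery and return its standard output as text."""
    result = subprocess.run(corpquery_args(*args, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_corpquery("susanne", '[lemma="house"]', context=(5, 10), limit=20) is equivalent to typing corpquery susanne '[lemma="house"]' -c 5,10 -h 20 at the prompt.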

lsclex

Lists the lexicon of a given corpus attribute.

Usage: lsclex [-snf] CORPUS ATTR

-s str2id: translate strings read from stdin to IDs

-n id2str: translate IDs read from stdin to strings

-f print frequencies
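Since the -s and -n modes read from standard input, a script can feed them directly. This is a minimal Python sketch with hypothetical helper names (lsclex_args, strings_to_ids); it assumes that lsclex -s reads one string per line and prints one ID per line, which you should confirm against your installation.

```python
import subprocess


def lsclex_args(corpus, attr, mode=None, freqs=False):
    """Build an lsclex argument list (hypothetical helper).

    mode: None lists the lexicon, "str2id" adds -s, "id2str" adds -n.
    """
    args = ["lsclex"]
    if mode == "str2id":
        args.append("-s")
    elif mode == "id2str":
        args.append("-n")
    elif mode is not None:
        raise ValueError(f"unknown mode: {mode}")
    if freqs:
        args.append("-f")  # print frequencies
    return args + [corpus, attr]


def strings_to_ids(corpus, attr, strings):
    # Assumption: one string per input line, one ID per output line.
    result = subprocess.run(
        lsclex_args(corpus, attr, "str2id"),
        input="\n".join(strings) + "\n",
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```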

lsslex

Lists the number of tokens for each value of a structure attribute.

Usage: lsslex CORPNAME STRUCTNAME STRUCTATTR

Example: lsslex bnc bncdoc alltyp

freqs

Prints the frequencies of words in a given context of the matches of a given query.

Usage: freqs CORPUSNAME 'QUERY' 'CONTEXT' LIMIT

The default CONTEXT is 'word -1'; the default LIMIT is 1.

Examples: freqs susanne '[lemma="house"]' 'word -1'

freqs susanne '[lemma="run"]' 'word/i 0 tag 0 lemma 1' 2

freqs susanne '[lemma="test"] []? [tag="NN.*"]' 'word/i -1>0' 0
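At the shell prompt, QUERY and CONTEXT must be wrapped in plain ASCII quotes, because both contain spaces or shell metacharacters. When freqs is called from Python with an argument list, no quoting is needed at all. The sketch below uses hypothetical helper names (freqs_args, run_freqs) and mirrors the documented defaults; it always passes LIMIT explicitly, which assumes that passing 1 behaves the same as omitting it.

```python
import subprocess


def freqs_args(corpname, query, context="word -1", limit=1):
    """Build a freqs argument list (hypothetical helper).

    Defaults mirror the documented ones: CONTEXT 'word -1', LIMIT 1.
    Each value is a single list element, so no shell quoting is needed.
    """
    return ["freqs", corpname, query, context, str(limit)]


def run_freqs(*args, **kwargs):
    """Run freqs and return its standard output as text."""
    result = subprocess.run(freqs_args(*args, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, run_freqs("susanne", '[lemma="run"]', "word/i 0 tag 0 lemma 1", 2) corresponds to the second example above.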

corpcheck

Checks the validity of corpus attributes and the correctness of the compiled corpus data. Any issues found are reported in human-readable form on standard error.

Usage: corpcheck CORPNAME
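Because corpcheck reports its findings on standard error, a script has to capture that stream rather than standard output. The following minimal Python sketch (run_check and corpcheck are hypothetical helper names) collects the non-empty stderr lines. It assumes that every such line is worth surfacing and makes no assumption about corpcheck's exit status.

```python
import subprocess


def run_check(argv):
    """Run a command and return the non-empty lines it wrote to stderr."""
    # check=False: corpcheck's exit status is not relied upon here.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return [line for line in result.stderr.splitlines() if line.strip()]


def corpcheck(corpname):
    """Return the messages corpcheck prints for corpname (hypothetical helper)."""
    return run_check(["corpcheck", corpname])
```

A caller might print "\n".join(corpcheck("susanne")) and treat an empty list as "no issues reported".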