This screen shows detailed statistics and an overview of the tags and structures used in the corpus.

Accessing the screen

The corpus details and statistics screen can be accessed in two ways. After logging in (or clicking Home), select a corpus, then do one of the following:

1. Click the Corpus info link in the left menu.

2. Click the corpus name next to the search box in the heading of the corpus search screen.

Descriptions of info tabs:

  • Counts – statistical information about the corpus (number of words, tokens, sentences etc.)
  • General info – information about the language, encoding and last modification of the corpus, plus links to an overview of the tags used and to the info page in the documentation
  • Lexicon sizes – number of different words, tags, lemmas, etc.; ambtag is the ambiguous part-of-speech tag, which contains all candidate tags before disambiguation (e.g. “building” can be a singular common noun or the -ing form of a verb)
  • Tags legend – an overview of the basic tags without detailed differentiation; click the link in the header for the full tagset description
  • Lempos suffixes – a list of letters representing parts of speech; a lempos is created by joining one of these letters to a lemma with a hyphen (e.g. “red-j” for the adjective “red”)
  • Structures and attributes – a list of structures and their attributes (with a more detailed distribution) representing documents, sentences and other ways of positioning tokens in the corpus. They can be searched with CQL (e.g. a query can search in all attributes rend of the structure poem).
  • Grammar relations – names and counts of relations used in word sketches
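
The lempos convention described under Lempos suffixes can be illustrated with a short Python sketch. This is only an illustration of the naming scheme, not part of the tool itself: the function names make_lempos and split_lempos are hypothetical, and the only suffix taken from this page is “j” for adjectives.

```python
def make_lempos(lemma: str, pos_suffix: str) -> str:
    """Join a lemma and a one-letter part-of-speech suffix with a hyphen."""
    return f"{lemma}-{pos_suffix}"


def split_lempos(lempos: str) -> tuple[str, str]:
    """Split a lempos at its last hyphen, so hyphenated lemmas stay intact."""
    lemma, _, pos_suffix = lempos.rpartition("-")
    return lemma, pos_suffix


# "red" as an adjective (suffix "j", as in the example on this page)
print(make_lempos("red", "j"))

# Hypothetical hyphenated lemma, used only to show why the split is on the last hyphen
print(split_lempos("well-known-j"))
```

Splitting at the last hyphen rather than the first is a design choice made for this sketch, so that a lemma which itself contains a hyphen is not cut in the wrong place.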

Corpus statistics and details