What is a subcorpus?
Each corpus can (but does not have to) be divided into smaller parts called subcorpora. Subcorpora can divide the corpus by text type (fiction, newspaper), medium (spoken, written), time (e.g. by year) or any other criterion. Subcorpora can overlap: the same segment can appear in every subcorpus it belongs to.
Concordance searches and word lists can make use of subcorpora by searching only one part of the corpus or by providing statistics of the same phenomenon in different subcorpora, e.g. in written vs. spoken language or in fiction vs. newspaper.
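The idea can be illustrated with a minimal Python sketch. It is not how the corpus software stores subcorpora. The toy documents, the metadata keys (`type`, `media`, `year`) and the helper names `subcorpus_docs` and `relative_frequency` are all assumptions made for this illustration. The sketch shows that a subcorpus is simply a selection of documents, that selections may overlap, and that the same word can be counted separately in each selection.

```python
# Hypothetical toy corpus: each document carries metadata and a token list.
DOCS = [
    {"id": "d1", "type": "fiction", "media": "written", "year": 1999,
     "tokens": "the old house was dark and the door was open".split()},
    {"id": "d2", "type": "newspaper", "media": "written", "year": 2005,
     "tokens": "the council approved the new budget".split()},
    {"id": "d3", "type": "interview", "media": "spoken", "year": 2005,
     "tokens": "well I think the plan was fine".split()},
]

# Each subcorpus is a named rule that selects documents by their metadata.
# The rules are independent, so one document may satisfy several of them.
SUBCORPORA = {
    "written": lambda d: d["media"] == "written",
    "spoken": lambda d: d["media"] == "spoken",
    "year_2005": lambda d: d["year"] == 2005,
}


def subcorpus_docs(name):
    """Return the documents that belong to the named subcorpus."""
    return [d for d in DOCS if SUBCORPORA[name](d)]


def relative_frequency(word, name, per=1_000_000):
    """Frequency of `word` in one subcorpus, normalised per `per` tokens."""
    docs = subcorpus_docs(name)
    total = sum(len(d["tokens"]) for d in docs)
    hits = sum(d["tokens"].count(word) for d in docs)
    return hits / total * per if total else 0.0


if __name__ == "__main__":
    # d2 is written and from 2005, so it appears in two subcorpora.
    print([d["id"] for d in subcorpus_docs("written")])
    print([d["id"] for d in subcorpus_docs("year_2005")])
    # Compare the same word in written vs. spoken language.
    for name in ("written", "spoken"):
        print(name, relative_frequency("was", name))
```

Normalising per million tokens keeps the comparison fair when the subcorpora differ in size, which they usually do.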
How to create a subcorpus?
A corpus can be divided into subcorpora using a configuration file, or subcorpora can be created later. This page explains the latter. Subcorpora created this way are only available to the user who created them. Expert users can set up subcorpora shared with all users.
You will be able to make use of your subcorpora in Concordance searches and word lists.
OPTION 1 – subcorpus from text types
This procedure creates a subcorpus from text types. It can only be used if the corpus is annotated for text types.
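Why the annotation is required can be seen in a small Python sketch of the selection logic. This is an illustration only, not the interface or the internals of the corpus software. The function name `subcorpus_from_text_types`, the `text_types` key and the attribute names `genre` and `medium` are hypothetical. The sketch also assumes that values chosen for one attribute are alternatives, while different attributes must all match.

```python
def subcorpus_from_text_types(docs, selection):
    """Return the documents whose text types match the selection.

    `selection` maps an attribute name to the set of accepted values,
    e.g. {"genre": {"fiction"}}. Assumption of this sketch: values within
    one attribute are alternatives, and all attributes must match.
    """
    # Collect every text-type attribute that is annotated in the corpus.
    annotated = {attr for doc in docs for attr in doc["text_types"]}
    missing = set(selection) - annotated
    if missing:
        # Without the annotation there is nothing to select on.
        raise ValueError(f"corpus is not annotated for: {sorted(missing)}")
    return [
        doc for doc in docs
        if all(doc["text_types"].get(attr) in values
               for attr, values in selection.items())
    ]


# Hypothetical annotated documents used only for this example.
EXAMPLE_DOCS = [
    {"id": "d1", "text_types": {"genre": "fiction", "medium": "written"}},
    {"id": "d2", "text_types": {"genre": "newspaper", "medium": "written"}},
    {"id": "d3", "text_types": {"genre": "fiction", "medium": "spoken"}},
]

if __name__ == "__main__":
    fiction = subcorpus_from_text_types(EXAMPLE_DOCS, {"genre": {"fiction"}})
    print([d["id"] for d in fiction])
```

Asking for an attribute that no document carries, such as a hypothetical `register`, raises an error in this sketch, which mirrors the restriction that the option is unavailable for corpora without text-type annotation.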