Manage a corpus

Once you have created a corpus, use the tools available from the left-hand side panel. To manage your corpus, go to the corpus selection screen by clicking Home, then select the corpus by clicking on its name.

You will see this screen:

Manage corpus

(1) options at the top

The line at the top gives you 4 options:

  • expand your corpus by uploading files
  • expand your corpus by downloading texts from the internet
  • compile the corpus
  • search the corpus

Expanding your corpus

You can add more texts to your corpus using any of the available methods. You can also combine the methods, i.e. part of the corpus can come from uploaded files, part from the internet and part from a translation memory.

Add new file
will let you upload more files

Add data from web (WebBootCaT)
lets you use WebBootCaT to find and download more relevant texts from the internet

Compile corpus

See (3) Compile corpus below.

Search corpus

This is the equivalent of clicking Search in the left menu. It gives you access to the standard search used to create a concordance.

(2) Show corpus files

Takes you back to the screen shown in the screenshot, which lists the files in the corpus. Each line is one occasion of adding texts to the corpus. For example, each upload (even of multiple files) is one line, and each use of WebBootCaT is one line.

(3) Compile corpus

A corpus needs to be compiled (= processed) each time new texts are added or when you want to use a new sketch grammar.

The settings let you select the XML tags that should be used as structures in your corpus. You also need to specify the structure used for references, which will enclose the data from each file that you uploaded. Its name must be different from any structure name already used in your files. By default this is doc.

  • You can also tick the check box to use the program “onion”, which automatically removes duplicate content from your corpus. If you opt to use onion, you can specify which structure the program considers when removing duplicates (for example, the document, paragraph or sentence level).
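
To illustrate what removing duplicates at a chosen structure level means, here is a minimal Python sketch. It is a simplified stand-in and not how onion itself works: it only drops exact repeats after whitespace and case normalisation. The function name drop_exact_duplicates and the sample paragraphs are hypothetical.

```python
import hashlib


def drop_exact_duplicates(units):
    """Keep the first copy of each unit (e.g. a paragraph) and drop exact repeats.

    Illustrative only: units are compared after collapsing whitespace and
    lower-casing. This is an assumption of the sketch, not onion's method.
    """
    seen = set()
    kept = []
    for unit in units:
        normalised = " ".join(unit.split()).lower()
        digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(unit)
    return kept


# Hypothetical paragraph-level units; the third repeats the first.
paragraphs = [
    "Corpora are collections of texts.",
    "A concordance lists every hit in context.",
    "corpora  are collections of texts.",
]
unique_paragraphs = drop_exact_duplicates(paragraphs)
```

Passing documents or sentences instead of paragraphs corresponds to choosing a different structure: the larger the unit, the less content is removed when only part of a text is repeated.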

(4) Configure corpus

(5) Set sketch grammar

You can select the sketch grammar from a list of preloaded grammars or write your own sketch grammar (see Writing a sketch grammar).

(6) Set subcorpora

Set subcorpus definitions

You can define subcorpora of your corpora (see an example of Subcorpus definition file).

(7) Download corpus

Download the corpus as text or in vertical format. Vertical format is useful if you want to retain any of the structures for uploading back into Sketch Engine.
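
As a rough picture of why vertical format retains structures, the Python sketch below builds a small vertical-style text, assuming the usual layout of one token per line with each structure tag on its own line. The function name to_vertical, the id attribute and the whitespace tokenisation are assumptions made for illustration; a real download comes from a tokeniser and may carry further attributes per token.

```python
from html import escape


def to_vertical(documents, structure="doc"):
    """Render (doc_id, text) pairs as one token per line inside structure tags.

    Illustrative only: tokens are split on whitespace, which is far cruder
    than the tokenisation applied when a corpus is compiled.
    """
    lines = []
    for doc_id, text in documents:
        lines.append(f'<{structure} id="{escape(doc_id, quote=True)}">')
        lines.extend(text.split())
        lines.append(f"</{structure}>")
    return "\n".join(lines) + "\n"


# Hypothetical sample document.
sample = to_vertical([("a1", "Hello world")])
```

Because the doc tags survive in the output, the document boundaries are still available if the file is uploaded again, whereas a plain text download would lose them.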

User corpora

Corpora created by users can be fully accessed and downloaded by the user who created them and by the users with whom they have been shared.

Preloaded corpora

Preloaded corpora can be searched but cannot be downloaded.

(8) Share corpus

Access privileges

Specify access for individual users, groups of users, or everyone in your organisation. You can define groups of users using the User groups function in the left-hand side menu, above the Corpus and Admin options. Access can be granted at the following levels:

  • read only (they can view but not change),
  • upload files (they can view and add new data) or
  • full (they will have full access and can change the configuration or recompile the corpus as well as add data to it)
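
The three levels build on one another, which can be modelled as an ordered scale. The Python sketch below is purely illustrative and is not a Sketch Engine API: the names Access, REQUIRED_LEVEL and is_allowed, and the action labels, are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical ordering of the three sharing levels."""
    READ_ONLY = 1
    UPLOAD = 2
    FULL = 3


# Lowest level assumed to be needed for each action (labels are illustrative).
REQUIRED_LEVEL = {
    "view": Access.READ_ONLY,
    "add_data": Access.UPLOAD,
    "configure": Access.FULL,
    "recompile": Access.FULL,
}


def is_allowed(level, action):
    """Return True if the given access level covers the action."""
    return level >= REQUIRED_LEVEL[action]


can_upload = is_allowed(Access.UPLOAD, "add_data")
can_recompile = is_allowed(Access.UPLOAD, "recompile")
```

In this model a user with upload access can view and add data but cannot change the configuration or recompile, matching the descriptions in the list above.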

(9) View logs

View the results of compiling the corpus or of running WebBootCaT.