6) Annotating your corpus

You can add your own annotations to the texts in your corpora using an ordinary text editor or third-party annotation software (e.g. Brat).

The procedure

  1. On the corpus building page, click ‘Download corpus’ in the left-hand menu and select the ‘vertical’ format to save the corpus with one token per line.
  2. Open the saved file in the annotation editor, add your annotations and save the file.
  3. Add the annotated file to the corpus and remove the old file from the corpus.
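Step 2 is straightforward in a text editor, which can open the vertical file directly. Tools such as Brat instead annotate plain text and store annotations as character offsets, so one option is to rebuild the text from the downloaded file and keep each token's offsets for the way back. The following is a minimal sketch in Python. It assumes the word form is the first tab-separated column. vertical_to_text is a hypothetical helper, not part of Sketch Engine.

```python
def vertical_to_text(lines):
    """Rebuild plain text from vertical lines and record each token's offsets.

    Assumes the word form is the first tab-separated column. Tokens are joined
    with single spaces, so the original spacing is not reproduced.
    """
    words = []
    offsets = []  # (start, end) character offsets of each token in the text
    position = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and structure tags such as <doc>, <s> or </s>
        if not line or line.startswith("<"):
            continue
        word = line.split("\t")[0]
        if words:
            position += 1  # the space separating this token from the previous one
        offsets.append((position, position + len(word)))
        words.append(word)
        position += len(word)
    return " ".join(words), offsets


vertical = ["<s>", "Golden\tgolden\tadjective", "gate\tgate\tnoun", "bridge\tbridge\tnoun", "</s>"]
text, offsets = vertical_to_text(vertical)
print(text)     # Golden gate bridge
print(offsets)  # [(0, 6), (7, 11), (12, 18)]
```

The text can be saved as a .txt file for the annotation tool. The offsets list maps the tool's character spans back onto the original token lines.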

Note that Sketch Engine accepts annotations in two forms:

  • XML structures with attributes (useful for annotating phrases),
  • positional attributes (useful for annotating words).

You may need to convert the output of your annotation tool into one of these forms before the corpus will compile successfully.
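As an illustration of such a conversion, the sketch below turns Brat's standoff entity lines (for example "T1<TAB>geography 0 18<TAB>Golden gate bridge") into XML structures around the matching token lines. It assumes that you already know each token's character offsets in the text you annotated, that entities are contiguous and do not overlap, and that the structure is called named_entity. Both function names are hypothetical, and a real Brat export may contain annotation types this sketch ignores.

```python
from xml.sax.saxutils import quoteattr


def parse_brat_entities(ann_text):
    """Return (type, start, end) for each contiguous text-bound annotation."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # ignore relations, events, attributes and notes
        _id, type_and_span, _text = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous spans are not handled in this sketch
        etype, start, end = type_and_span.split()
        entities.append((etype, int(start), int(end)))
    return entities


def add_structures(token_lines, offsets, entities, name="named_entity"):
    """Enclose the tokens covered by each entity in <name type="..."> ... </name>.

    Assumes entities do not overlap, so the resulting tags nest correctly.
    """
    opening = {}  # token index -> opening tags to emit before the token
    closing = {}  # token index -> number of closing tags to emit after it
    for etype, start, end in entities:
        covered = [i for i, (s, e) in enumerate(offsets) if s >= start and e <= end]
        if not covered:
            continue  # no token lies entirely inside the entity span
        opening.setdefault(covered[0], []).append(f"<{name} type={quoteattr(etype)}>")
        closing[covered[-1]] = closing.get(covered[-1], 0) + 1
    result = []
    for i, line in enumerate(token_lines):
        result.extend(opening.get(i, []))
        result.append(line)
        result.extend([f"</{name}>"] * closing.get(i, 0))
    return result


ann = "T1\tgeography 0 18\tGolden gate bridge"
tokens = ["Golden\tgolden\tadjective", "gate\tgate\tnoun", "bridge\tbridge\tnoun"]
offsets = [(0, 6), (7, 11), (12, 18)]
for line in add_structures(tokens, offsets, parse_brat_entities(ann)):
    print(line)
```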

Example annotation – phrase level

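Use XML structures for attributes shared by the whole phrase. Place the opening tag before the first token of the phrase and the matching closing tag, here </named_entity>, after the last one: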

<named_entity type="geography" subtype="bridge">
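When structures are added by a script rather than by hand, it helps to generate the opening and closing tags together so that neither can be forgotten, and to quote and escape the attribute values. The following is a minimal sketch under those assumptions. wrap_in_structure is a hypothetical helper that uses the standard-library xml.sax.saxutils.quoteattr for the escaping.

```python
from xml.sax.saxutils import quoteattr


def wrap_in_structure(token_lines, name, **attributes):
    """Enclose vertical token lines in an XML structure with quoted attributes."""
    # quoteattr wraps each value in quotes and escapes characters such as & and <
    attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attributes.items())
    return [f"<{name}{attr_text}>", *token_lines, f"</{name}>"]


lines = wrap_in_structure(["Golden", "gate", "bridge"], "named_entity",
                          type="geography", subtype="bridge")
print("\n".join(lines))
```

Keyword arguments keep their order, so the attributes appear in the tag in the order they are passed.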

Example annotation – token level

Use positional attributes for properties of individual words. Each token is on its own line, with its attributes in tab-separated columns (e.g. token ID, word, lowercase lemma, part of speech, dependency ID):

0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
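Because each column has a fixed meaning, a token line can be read into named fields and written back without losing its order. The following is a minimal sketch. It assumes tab-separated columns in the order of the example above. ATTRIBUTES, parse_token_line and format_token_line are hypothetical names, and the "proper_noun" tag is only an illustrative value.

```python
# Assumed column order, matching the token-level example
ATTRIBUTES = ["id", "word", "lemma", "pos", "head"]


def parse_token_line(line, attributes=ATTRIBUTES):
    """Split one tab-separated token line into a dict of positional attributes."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(attributes):
        raise ValueError(f"expected {len(attributes)} columns, got {len(values)}: {line!r}")
    return dict(zip(attributes, values))


def format_token_line(token, attributes=ATTRIBUTES):
    """Join a dict of positional attributes back into a tab-separated line."""
    return "\t".join(token[name] for name in attributes)


token = parse_token_line("0\tGolden\tgolden\tadjective\t1")
print(token["lemma"], token["pos"])  # golden adjective
token["pos"] = "proper_noun"  # hypothetical re-annotation of one attribute
print(format_token_line(token))
```

Raising an error on a wrong column count makes a missing or extra column visible immediately, instead of silently shifting every later attribute.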

Both examples combined

<named_entity type="geography" subtype="bridge">
0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
</named_entity>
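Because both forms end up in the same file, a quick check before uploading can catch two easy-to-make mistakes: a structure that is opened but never closed, and a token line with a missing or extra column. The following is a minimal sketch. It assumes tab-separated columns and the same number of positional attributes on every token line. check_vertical is a hypothetical helper, and Sketch Engine's own compiler may still report other problems.

```python
import re

# Matches opening, closing and self-closing structure tags, e.g. <s>, </s>, <g/>
TAG = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)(?:\s[^>]*?)?(/?)>$")


def check_vertical(lines, n_columns):
    """List unbalanced structures and token lines with the wrong column count."""
    problems = []
    open_tags = []  # stack of (name, line number) for structures not yet closed
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        match = TAG.match(line)
        if match:
            closing, name, self_closing = match.groups()
            if self_closing:
                continue  # opens and closes in one tag
            if not closing:
                open_tags.append((name, number))
            elif open_tags and open_tags[-1][0] == name:
                open_tags.pop()
            else:
                problems.append(f"line {number}: unexpected </{name}>")
            continue
        columns = line.split("\t")
        if len(columns) != n_columns:
            problems.append(f"line {number}: {len(columns)} columns, expected {n_columns}")
    for name, number in open_tags:
        problems.append(f"line {number}: <{name}> is never closed")
    return problems


sample = [
    '<named_entity type="geography" subtype="bridge">',
    "0\tGolden\tgolden\tadjective\t1",
    "1\tgate\tgate\tnoun\t2",
    "2\tbridge\tbridge\tnoun\t-",
    "</named_entity>",
]
print(check_vertical(sample, 5))       # []
print(check_vertical(sample[:-1], 5))  # ['line 1: <named_entity> is never closed']
```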