Creating text annotation

For most languages, Sketch Engine provides tagging and lemmatizing tools which automatically mark each token in your corpus with its part of speech and its lemma (the base form of the token).

If Sketch Engine cannot annotate your language at the token level, you can annotate it manually by adding metadata (also called token attributes) to the tokens. You can also rewrite or fix the token annotation that Sketch Engine created automatically.

Procedure in a nutshell

(If your corpus is in Sketch Engine, first download it in vertical format.)

  • Open the corpus file in a plain text editor or in annotation software, e.g. a tool that supports the FoLiA format.
  • Add metadata to tokens and save the file.
  • Upload the file to Sketch Engine, where the metadata will be processed into token attributes automatically.
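If you prefer a script to an editor for the second step above, the minimal Python sketch below shows the kind of change involved: it appends one new attribute column to every token line of a vertical file and leaves structure lines (lines starting with `<`) and blank lines untouched. It assumes tab-separated columns and exactly one new value per token; the function name `add_token_attribute` is hypothetical and not part of Sketch Engine.

```python
def add_token_attribute(vertical_text, values):
    """Append one tab-separated column to every token line (hypothetical helper).

    Lines starting with "<" are treated as structure tags and left unchanged,
    as are blank lines. `values` must hold exactly one string per token line.
    """
    lines = vertical_text.splitlines()
    # Indices of the lines that hold tokens rather than tags
    token_idx = [i for i, line in enumerate(lines)
                 if line.strip() and not line.startswith("<")]
    if len(token_idx) != len(values):
        raise ValueError(f"expected {len(token_idx)} values, got {len(values)}")
    for i, value in zip(token_idx, values):
        lines[i] = lines[i] + "\t" + value
    return "\n".join(lines)


print(add_token_attribute("<s>\nGolden\ngate\nbridge\n</s>",
                          ["adjective", "noun", "noun"]))
```

Reading and writing the file itself is left out; the function works on the file's text so it can be combined with any way of loading it.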

Example annotation – phrase level

Useful for attributes shared by a whole phrase; the tokens are enclosed between an opening and a closing tag that carry the attributes:

<named_entity type="geography" subtype="bridge">
Golden
gate
bridge
</named_entity>
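A phrase-level structure like the one above could be generated from a list of tokens with a short Python sketch such as this one. The helper `wrap_phrase` is hypothetical; it uses the standard-library function `xml.sax.saxutils.quoteattr` so that attribute values containing `&`, `<`, `>` or quotes are escaped.

```python
from xml.sax.saxutils import quoteattr


def wrap_phrase(token_lines, name, attrs):
    """Return the token lines enclosed in <name ...> and </name> tags."""
    # quoteattr adds the surrounding quotes and escapes &, < and > (and quotes when needed)
    attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attrs.items())
    return [f"<{name}{attr_text}>", *token_lines, f"</{name}>"]


lines = wrap_phrase(["Golden", "gate", "bridge"], "named_entity",
                    {"type": "geography", "subtype": "bridge"})
print("\n".join(lines))
```

Attributes are written in the order they appear in the dictionary, so the printed lines match the example above.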

Example annotation – token level

Used to represent attributes of individual words, one token per line with each attribute in its own column (here: token ID, word, lowercase lemma, part of speech, dependency ID):

0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
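Each token line can be read back as a record by splitting it into columns. The Python sketch below assumes the five columns are separated by tabs, as is usual in vertical files, and uses hypothetical column names. It also assumes the last column holds the ID of the token's head, with `-` meaning the token has no head.

```python
# Hypothetical names for the five columns in the example above
COLUMNS = ["id", "word", "lemma", "pos", "head"]


def parse_token_line(line, columns=COLUMNS):
    """Split one tab-separated token line into a {column: value} dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, got {len(values)}: {line!r}")
    return dict(zip(columns, values))


def head_word(tokens, token):
    """Return the word the token depends on, or None if its head is "-"."""
    if token["head"] == "-":
        return None
    return next(t["word"] for t in tokens if t["id"] == token["head"])


tokens = [parse_token_line(line) for line in [
    "0\tGolden\tgolden\tadjective\t1",
    "1\tgate\tgate\tnoun\t2",
    "2\tbridge\tbridge\tnoun\t-",
]]
print(head_word(tokens, tokens[0]))  # gate
```

Checking the column count on every line catches a common annotation mistake: a token line with a missing or extra value.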

Both examples combined

<named_entity type="geography" subtype="bridge">
0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
</named_entity>
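To work with both levels at once, a reader has to track which structures are open when each token appears. This Python sketch does that with a stack. It handles only simple tags of the form shown above (`<name attr="value">` and `</name>`), skips self-closing tags, and is not a full XML parser; `read_vertical` and the column names are hypothetical, and tab-separated columns are assumed.

```python
import re

OPEN_TAG = re.compile(r'^<(\w+)((?:\s+\w+="[^"]*")*)\s*>$')
CLOSE_TAG = re.compile(r"^</(\w+)>$")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def read_vertical(lines, columns):
    """Yield (token dict, list of enclosing (name, attrs) structures) per token."""
    open_structures = []  # stack of (name, {attribute: value}) for open tags
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("<") and line.endswith("/>"):
            continue  # self-closing tags enclose no tokens
        if m := CLOSE_TAG.match(line):
            if not open_structures or open_structures[-1][0] != m.group(1):
                raise ValueError(f"unexpected closing tag: {line}")
            open_structures.pop()
        elif m := OPEN_TAG.match(line):
            open_structures.append((m.group(1), dict(ATTR.findall(m.group(2)))))
        else:
            token = dict(zip(columns, line.split("\t")))
            # Copy the stack so later pops do not change what was yielded
            yield token, [(name, dict(attrs)) for name, attrs in open_structures]


example = [
    '<named_entity type="geography" subtype="bridge">',
    "0\tGolden\tgolden\tadjective\t1",
    "1\tgate\tgate\tnoun\t2",
    "2\tbridge\tbridge\tnoun\t-",
    "</named_entity>",
]
for token, structures in read_vertical(example, ["id", "word", "lemma", "pos", "head"]):
    print(token["word"], structures)
```

A closing tag that does not match the most recently opened structure raises an error, which helps catch unbalanced phrase annotation before uploading the file.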