Adding structures, structural attributes and values

Adding structures, structural attributes and values makes it possible to annotate a corpus, i.e. to add metadata to it. Document, paragraph and sentence structures are normally added automatically when a corpus is built in Sketch Engine, but other structures must be added manually.

If you are new to this, you might like to read this blog post first.

Procedure in a nutshell

(If your corpus is in Sketch Engine, first download it.)

  • Open the corpus in a plain text editor or annotation software, e.g. Brat.
  • Add structures, attributes and values.
  • Upload it to Sketch Engine, where attributes and values will be processed into text types automatically.

Annotation terminology, format and allowed characters

An example of a sentence annotation:

<s direct_speech="yes" type="question">"Have you had time to think it over?"</s>

s is the structure name
Structure names must be enclosed in angle brackets <> and can only use letters a-z, A-Z, numbers 0-9 and underscore (_).

direct_speech and type are attribute names
Attribute names can only use letters a-z, A-Z, numbers 0-9 and underscore (_). Attributes can be multivalue or even hierarchical.

yes and question are values
Values must be enclosed in plain (straight) double quotes; rounded typographic quotes are not allowed. Values can contain any characters, including accented characters. If a double quote is part of a value, it must be escaped with a backslash: \"
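
The naming and quoting rules above can be checked in a script before annotating a large corpus. The following is a minimal Python sketch, not part of Sketch Engine; the helper names is_valid_name, escape_value and open_tag are hypothetical and chosen only for this illustration. It assumes that the backslash escape for double quotes is the only escaping a value needs.

```python
import re

# Names may use only a-z, A-Z, 0-9 and underscore, per the rules above.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name):
    """Return True if name can be used as a structure or attribute name."""
    return NAME_RE.fullmatch(name) is not None


def escape_value(value):
    """Escape plain double quotes inside a value with a backslash."""
    return value.replace('"', '\\"')


def open_tag(structure, attributes=None):
    """Build an opening tag such as <s type="question"> (hypothetical helper)."""
    attributes = attributes or {}
    for name in [structure, *attributes]:
        if not is_valid_name(name):
            raise ValueError(f"invalid name: {name!r}")
    parts = [structure]
    for name, value in attributes.items():
        # Values are always wrapped in plain double quotes.
        parts.append(f'{name}="{escape_value(value)}"')
    return "<" + " ".join(parts) + ">"


print(open_tag("s", {"direct_speech": "yes", "type": "question"}))
```

A name such as direct-speech is rejected by this check because the hyphen is not among the allowed characters.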

Examples

Annotation with structures but without attributes and values.

<file>
     <p>
          <s>My Bonnie lies over the ocean</s>
          <s>My Bonnie lies over the sea</s>
     </p>
     <p>
          <s>My Bonnie lies over the ocean</s>
          <s>Oh, bring back my Bonnie to me</s>
     </p>
</file>

Indentation is optional and serves only the user’s convenience. White space between structures is ignored. The same data in one line will still be processed correctly:

<file><p><s>My Bonnie lies over the ocean</s><s>My Bonnie lies over the sea</s></p><p><s>My Bonnie lies over the ocean</s><s>Oh, bring back my Bonnie to me</s></p></file>
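
To illustrate why the two layouts are equivalent, the sketch below removes white space that sits only between two tags. It is a hypothetical Python helper (collapse_between_tags is not a Sketch Engine function) and it assumes that every tag is directly followed either by another tag or by text, as in the example above. It would also remove a single space between two adjacent inline tags, so it is meant as a demonstration rather than a general-purpose tool.

```python
import re


def collapse_between_tags(text):
    """Remove white space that sits only between two tags."""
    return re.sub(r">\s+<", "><", text.strip())


indented = """<file>
     <p>
          <s>My Bonnie lies over the ocean</s>
          <s>My Bonnie lies over the sea</s>
     </p>
</file>"""

print(collapse_between_tags(indented))
```

White space inside the sentence text is left untouched, because the pattern only matches white space directly between a closing > and an opening <.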

An example of a corpus consisting of two files, with structures, structural attributes and values.

<file pub="1970" lang="en">
     <p style = "informal">
          <s><pers gender="female">Rebecca</pers> has worked with a full range of clients including <brand sect="automotive">BMW</brand> and <brand sect="air">Airbus</brand>.</s>
          <s> some text </s>
     </p>
     <p style = "formal">
          <s>some text </s>
          <s>some text </s>
     </p>
</file>
<file pub="1977">
     <p style = "informal">
          <s>some text </s>
          <s> some text </s>
     </p>
     <p style = "informal">
          <s>some text </s>
          <s>some text </s>
     </p>
</file>
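
Before uploading, it can be useful to list which attribute values an annotated file actually contains, for example to spot misspelled values. The following Python sketch does this with regular expressions. It is only an illustration and does not reproduce how Sketch Engine processes text types; the function name collect_values and the dotted structure.attribute keys are conventions assumed for this example.

```python
import re
from collections import defaultdict

# An opening tag: a structure name followed by zero or more attribute="value" pairs.
TAG_RE = re.compile(
    r'<([A-Za-z0-9_]+)((?:\s+[A-Za-z0-9_]+\s*=\s*"(?:[^"\\]|\\.)*")*)\s*>'
)
# One attribute="value" pair; \" inside a value is treated as an escaped quote.
ATTR_RE = re.compile(r'([A-Za-z0-9_]+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def collect_values(text):
    """Map each structure.attribute key to the sorted list of its values."""
    found = defaultdict(set)
    for tag in TAG_RE.finditer(text):
        structure, attrs = tag.group(1), tag.group(2)
        for name, value in ATTR_RE.findall(attrs):
            found[f"{structure}.{name}"].add(value)
    return {key: sorted(values) for key, values in sorted(found.items())}


sample = """<file pub="1970" lang="en">
     <p style = "informal">
          <s><pers gender="female">Rebecca</pers> has worked with <brand sect="automotive">BMW</brand> and <brand sect="air">Airbus</brand>.</s>
     </p>
</file>
<file pub="1977">
     <p style = "formal">
          <s>some text</s>
     </p>
</file>"""

for key, values in collect_values(sample).items():
    print(key, values)
```

Closing tags and structures without attributes, such as <s>, contribute nothing to the result, so the listing contains only the attributes that were actually annotated.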