Using JSON

JSON (JavaScript Object Notation, http://www.json.org/) is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. The Sketch Engine can use JSON as its input format, its output format, or both.

JSON input

Input in JSON format is passed to the Sketch Engine through the universal json attribute. All attribute names and values (including numbers and comma-delimited lists) should be encoded as JSON strings; note that quotation marks in CQL queries must be escaped. Attributes that take a list of values (e.g. the q attribute of the view method) should be encoded as JSON arrays. Example of a complete query using JSON:

https://beta.sketchengine.co.uk/bonito/run.cgi/view?json={"corpname":"preloaded/bnc", "q":["q[lemma=\"test\"]", "r250"]}
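
The encoding rules above can be sketched in Python. This is only an illustration: build_json_url is a hypothetical helper, the host is taken from the examples in this document, and percent-encoding the payload is an assumption based on general URL practice rather than something the Sketch Engine is documented to require.

```python
import json
from urllib.parse import quote

# Host and path copied from the examples in this document.
BASE_URL = "https://beta.sketchengine.co.uk/bonito/run.cgi"


def build_json_url(method, params):
    """Return a URL that passes `params` through the json attribute.

    Hypothetical helper: scalars are converted to JSON strings and
    lists to JSON arrays of strings, as the text above asks.
    """
    encoded = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            encoded[key] = [str(v) for v in value]
        else:
            encoded[key] = str(value)
    # json.dumps escapes the quotation marks inside CQL queries.
    payload = json.dumps(encoded)
    # Assumption: percent-encode the payload so it is safe in a query string.
    return f"{BASE_URL}/{method}?json={quote(payload, safe='')}"


url = build_json_url(
    "view",
    {"corpname": "preloaded/bnc", "q": ['q[lemma="test"]', "r250"]},
)
print(url)
```

Letting json.dumps do the escaping avoids hand-writing backslashes in front of the quotation marks of a CQL query.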

JSON output

This section describes the output of the system when the format attribute is set to json. The structure of the resulting JSON object is fairly intuitive, so it is described only briefly. The description is also not complete: some data are used only internally, and describing them might be confusing. For this reason, the examples contain some fields that are not described in the output structure and that may change over time. The output of each method listed earlier is described below. Note that all structure names (JSON objects, arrays) begin with a capital letter, while atom names (strings, numbers) are always lowercase.

wordlist

Structure of the ‘word list’ query result:

  • Items – list of items in the word list. One item contains:
    • str – string expression of the item (e.g. word)
    • freq – frequency of the item

Structure of the ‘keywords’ query result:

  • Keywords – list of selected keyword items. One item contains:
    • arf – the ARF value
    • cfreq – frequency in the reference (sub)corpus
    • score – item score
    • sfreq – frequency in the selected (sub)corpus
    • str – string expression of the item (e.g. word)

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/wordlist?corpname=preloaded/bnc;wlattr=word;wlpat=test.*;wlsort=f;wlmaxitems=2;format=json

{
   "Items": [
      {
         "freq": 11040,
         "str": "test"
      },
      {
         "freq": 4472,
         "str": "tests"
      }
   ]
}

Example (query and result) – keywords:

https://beta.sketchengine.co.uk/bonito/run.cgi/wordlist?corpname=preloaded/bnc;wlattr=word;keywords=1;usesubcorp=wri-to-be-spoken;wlsort=f;wlmaxitems=2;ref_corpname=preloaded/bnc;format=json

{
   "Keywords": [
      {
         "arf": 5.9,
         "cfreq": 402,
         "score": 679.1,
         "sfreq": 402,
         "str": "Video-Tape"
      },
      {
         "arf": 47.2,
         "cfreq": 3765,
         "score": 679.1,
         "sfreq": 3765,
         "str": "Video-Taped"
      }
   ]
}
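
Both result shapes can be read with a few lines of Python. The following is a minimal sketch, assuming the response has already been parsed with json.loads; wordlist_rows is a hypothetical helper, and the sample data are copied from the examples above.

```python
import json


def wordlist_rows(result):
    """Yield (string, frequency) pairs from a parsed wordlist result.

    A plain word list is read from "Items" and "freq"; a keyword list
    is read from "Keywords" and "sfreq" (frequency in the selected
    (sub)corpus).
    """
    if "Keywords" in result:
        for item in result["Keywords"]:
            yield item["str"], item["sfreq"]
    else:
        for item in result.get("Items", []):
            yield item["str"], item["freq"]


# Sample data copied from the examples in this section.
word_list = json.loads(
    '{"Items": [{"freq": 11040, "str": "test"}, {"freq": 4472, "str": "tests"}]}'
)
keywords = {
    "Keywords": [
        {"arf": 5.9, "cfreq": 402, "score": 679.1, "sfreq": 402, "str": "Video-Tape"}
    ]
}

word_rows = list(wordlist_rows(word_list))
keyword_rows = list(wordlist_rows(keywords))
print(word_rows)
print(keyword_rows)
```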

wsketch

Structure:

  • Gramrels – list of grammatical relations including all relevant collocates. Each gramrel contains:
    • count – overall frequency of the gramrel
    • name – name of the gramrel
    • score – overall score of the gramrel
    • seek – pointer to the concordance (can be used in a w-type query in the view method)
    • Words – list of collocates in the gramrel. Each collocate contains:
      • count – frequency of the collocate in gramrel
      • score – collocate score
      • seek – collocate pointer to the concordance (can be used in a w-type query in the view method)
      • word – string expression of the collocate
      If ‘clustered collocations’ are requested, each collocate can also contain information about the collocate cluster:
      • totalcount – overall frequency of the cluster (0 if the cluster is empty)
      • totalseek – cluster pointer to the concordance (can be used in a w-type query in the view method, but must be preceded by a comma (‘,’)); an empty string if the cluster is empty
      • Clust – list of words in the cluster; each word has the attributes count, score, seek and word as described above. If the cluster is empty, this attribute is not included

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/wsketch?corpname=preloaded/bnc;lemma=test;lpos=-n;format=json

{
   "Gramrels": [
      {
         "Words": [
            {
               "Clust": [
                  {
                     "count": 32,
                     "id": 848,
                     "score": 12.63,
                     "seek": 4816731,
                     "word": "run"
                  },

                  ...

               ],
               "count": 294,
               "id": 1029,
               "score": 43.96,
               "seek": 4816743,
               "totalcount": 384,
               "totalseek": "4816743,4816731,4816760,4816700,4816806,4816675",
               "word": "pass"
            },

            ...

         ],
         "count": 3406,
         "name": "object_of",
         "score": 2.1,
         "seek": 79181
      },

    ...
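
The nested structure above can be flattened with a short loop. This is a minimal sketch, assuming an already parsed result; collocates is a hypothetical helper, and the sample is a cut-down version of the example output with the elided parts left out.

```python
def collocates(result):
    """Yield (gramrel, word, count, cluster_words) for every collocate.

    "Clust" is optional, so a missing key is treated as an empty cluster.
    """
    for gramrel in result.get("Gramrels", []):
        for collocate in gramrel.get("Words", []):
            cluster = [member["word"] for member in collocate.get("Clust", [])]
            yield gramrel["name"], collocate["word"], collocate["count"], cluster


# Cut-down sample based on the example above (elided parts omitted).
sample = {
    "Gramrels": [
        {
            "Words": [
                {
                    "Clust": [
                        {"count": 32, "id": 848, "score": 12.63,
                         "seek": 4816731, "word": "run"}
                    ],
                    "count": 294,
                    "id": 1029,
                    "score": 43.96,
                    "seek": 4816743,
                    "totalcount": 384,
                    "word": "pass",
                }
            ],
            "count": 3406,
            "name": "object_of",
            "score": 2.1,
            "seek": 79181,
        }
    ]
}

flat = list(collocates(sample))
print(flat)
```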

thes

Structure:

  • Words – list of similar words. Each word contains:
    • score – word score
    • word – string expression of the word
    If ‘clustered items’ are requested, each word can also contain information about the word cluster:
    • Clust – list of words in the cluster; each word has the attributes score and word as described above. If the cluster is empty, this attribute is not included
  • freq – frequency of the selected lemma in corpus

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/thes?corpname=preloaded/bnc;lemma=test;lpos=-n;maxthesitems=6;clusteritems=1;format=json

{
   "Words": [
      {
         "Clust": [
            {
               "id": 4226,
               "score": 0.223,
               "word": "examination"
            }
         ],
         "id": 941,
         "score": 0.243,
         "totalcount": 0,
         "totalseek": "",
         "word": "assessment"
      },

      ...

   ],
   "commonurl": "corpname=preloaded\/bnc;lemma=test;lpos=-n",
   "freq": 15789,
   "lemma": "test",
   "lpos": "-n"
}
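
A thesaurus result can be read in the same way. The following is a minimal sketch, assuming an already parsed result; similar_words is a hypothetical helper, and the sample is taken from the example above with the elided entries left out.

```python
def similar_words(result):
    """Yield (word, score, cluster) for each entry of a thes result.

    `cluster` is a list of (word, score) pairs; it is empty when the
    optional "Clust" attribute is absent.
    """
    for entry in result.get("Words", []):
        cluster = [(m["word"], m["score"]) for m in entry.get("Clust", [])]
        yield entry["word"], entry["score"], cluster


# Sample taken from the example above (elided entries omitted).
sample = {
    "Words": [
        {
            "Clust": [{"id": 4226, "score": 0.223, "word": "examination"}],
            "id": 941,
            "score": 0.243,
            "totalcount": 0,
            "totalseek": "",
            "word": "assessment",
        }
    ],
    "freq": 15789,
    "lemma": "test",
    "lpos": "-n",
}

entries = list(similar_words(sample))
print(sample["lemma"], sample["freq"], entries)
```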

wsdiff

This method does not currently support JSON output.

view

Structure:

  • Lines – list of concordance lines. Each line contains:
    • Kwic – list of KWIC segments (segment stands for one or more tokens). Each segment contains:
      • class – class name of the segment (e.g. ‘attr’ = attribute, ‘coll’ = collocation etc.)
      • str – string expression of the segment (attributes are preceded by ‘/’, which appears escaped as ‘\/’ in the JSON output, for correct display on the HTML page)
    • Left – list of left context segments (same structure as Kwic)
    • Right – list of right context segments (same structure as Kwic)
    • ref – line reference (‘reference’ field content)
    • toknum – token number (of the first token in KWIC)
  • concsize – number of lines in concordance (or number of hits)
  • numofpages – number of pages in concordance

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/view?corpname=preloaded/bnc;q=q[lemma="drug"][lemma="test"];pagesize=2;ctxattrs=word,tag;format=json

{
   "Lines": [
      {
         "Align": [],
         "Kwic": [
            {
               "class": "col0 coll",
               "str": " drug test"
            }
         ],
         "Left": [
            {
               "class": "attr",
               "str": "\/VM0"
            },
            {
               "class": "",
               "str": " be"
            },

            ...

         ],
         "Right": [
            {
               "class": "",
               "str": " at"
            },

            ...

         ],
         "hitlen": ";hitlen=2",
         "leftspace": "",
         "linegroup": "_",
         "ref": "A0M",
         "toknum": 654026
      },

      ...

   ],
   "concsize": 70,
   "fromp": 1,
   "lastlink": "fromp=35",
   "nextlink": "fromp=2",
   "numofpages": 35
}
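
Concordance lines can be turned back into plain text by joining the segments. This is a minimal sketch, assuming the response is parsed with json.loads; segment_text and kwic_lines are hypothetical helpers, and dropping segments whose class contains ‘attr’ is a choice made for this illustration only.

```python
import json


def segment_text(segments, keep_attrs=False):
    """Join the str values of a segment list into one string.

    Segments whose class contains "attr" are skipped unless
    keep_attrs is true.
    """
    parts = []
    for segment in segments:
        if not keep_attrs and "attr" in segment["class"].split():
            continue
        parts.append(segment["str"])
    return "".join(parts).strip()


def kwic_lines(result):
    """Yield (ref, left, kwic, right) for each concordance line."""
    for line in result.get("Lines", []):
        yield (
            line["ref"],
            segment_text(line["Left"]),
            segment_text(line["Kwic"]),
            segment_text(line["Right"]),
        )


# Cut-down sample based on the example above; json.loads turns the
# escaped "\/" back into a plain "/".
sample = json.loads(
    r'{"Lines": [{"Kwic": [{"class": "col0 coll", "str": " drug test"}],'
    r' "Left": [{"class": "attr", "str": "\/VM0"}, {"class": "", "str": " be"}],'
    r' "Right": [{"class": "", "str": " at"}],'
    r' "ref": "A0M", "toknum": 654026}], "concsize": 70}'
)

lines = list(kwic_lines(sample))
print(lines)
```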

freqs

Structure:

  • Blocks – list of frequency blocks (tables). Each table contains:
    • Head – list of the table headings. Each heading contains:
      • n – string representation of the heading (name of the column)
      • s – ID of the column, can be used as a value of the freq_sort attribute
    • Items – list of lines in the table. Each line contains:
      • Word – list of items in the left part of the table (i.e. all columns except the “Freq” and “Rel[%]” columns). Each item contains:
        • n – string representation of the item
      • freq – frequency (content of the “Freq” column)
      • rel – content of the “Rel[%]” column. If the column is not present, this attribute is not included

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/freqs?q=q[lemma="test"];corpname=preloaded/bnc;fcrit=word/+0+lemma/+0+tag/+0;flimit=3000;ml=1;format=json

{
   "Blocks": [
      {
         "Head": [
            {
               "n": "word",
               "s": 0
            },
            {
               "n": "lemma",
               "s": 1
            },
            {
               "n": "tag",
               "s": 2
            },
            {
               "n": "Freq",
               "s": "freq"
            }
         ],
         "Items": [
            {
               "Word": [
                  {
                     "n": "test"
                  },
                  {
                     "n": "test"
                  },
                  {
                     "n": "NN1"
                  }
               ],
               "fbar": 301,
               "freq": 8609,
               "norel": 1
            },

            ...
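
Each block maps naturally onto a header row and a list of data rows. The following is a minimal sketch, assuming an already parsed result; freq_table is a hypothetical helper, and the sample is a cut-down version of the example above.

```python
def freq_table(block):
    """Return (header, rows) for one entry of the Blocks list.

    Each row holds the Word items followed by freq and, when the
    optional rel attribute is present, rel.
    """
    header = [head["n"] for head in block["Head"]]
    rows = []
    for item in block["Items"]:
        row = [word["n"] for word in item["Word"]]
        row.append(item["freq"])
        if "rel" in item:
            row.append(item["rel"])
        rows.append(row)
    return header, rows


# Cut-down sample based on the example above (elided lines omitted).
sample = {
    "Blocks": [
        {
            "Head": [
                {"n": "word", "s": 0},
                {"n": "lemma", "s": 1},
                {"n": "tag", "s": 2},
                {"n": "Freq", "s": "freq"},
            ],
            "Items": [
                {
                    "Word": [{"n": "test"}, {"n": "test"}, {"n": "NN1"}],
                    "fbar": 301,
                    "freq": 8609,
                    "norel": 1,
                }
            ],
        }
    ]
}

header, rows = freq_table(sample["Blocks"][0])
print(header)
print(rows)
```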

collx

Structure:

  • Head – list of table headings. Each heading contains:
    • n – name of the column. Can be empty.
    • s – column ID. Can be used as a value of the csortfn attribute. If n is empty, this is not included
  • Items – list of table lines. Each line contains:
    • Stats – list of the statistics in the line (in the same order as in the heading). Each statistic contains:
      • n – value itself (content of the column)
    • freq – collocation frequency
    • str – string expression of the collocate

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/collx?q=q[lemma="test"];corpname=preloaded/bnc;csortfn=m;format=json

{
   "Head": [
      {
         "n": ""
      },
      {
         "n": "Freq",
         "s": "f"
      },
      {
         "n": "T-score",
         "s": "t"
      },
      {
         "n": "MI",
         "s": "m"
      }
   ],
   "Items": [
      {
         "Stats": [
            {
               "s": "2.828"
            },
            {
               "s": "12.938"
            }
         ],
         "freq": 8,
         "nfilter": "q=n-5 5 1 [word=\"Belvin\"]",
         "pfilter": "q=p-5 5 1 [word=\"Belvin\"]",
         "str": "Belvin"
      },

      ...
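
The statistics can be paired with their column names as sketched below. Two assumptions are made here and neither is confirmed by the description: the value of a statistic is looked up under ‘n’ (as in the structure list) and then under ‘s’ (as in the example output), and the Stats entries are matched to the trailing headings, after the collocate and ‘Freq’ columns. Both stat_value and collocation_rows are hypothetical helpers.

```python
def stat_value(stat):
    """Return the value of one Stats entry.

    Assumption: the structure list names the key "n" while the example
    output shows "s", so both are accepted.
    """
    return stat.get("n", stat.get("s"))


def collocation_rows(result):
    """Return one dict per collocate, keyed by column name."""
    names = [head["n"] for head in result["Head"]]
    rows = []
    for item in result.get("Items", []):
        values = [float(stat_value(stat)) for stat in item["Stats"]]
        # Assumption: Stats lines up with the last len(values) headings.
        stat_names = names[len(names) - len(values):]
        row = {"str": item["str"], "freq": item["freq"]}
        row.update(zip(stat_names, values))
        rows.append(row)
    return rows


# Cut-down sample based on the example above (elided lines omitted).
sample = {
    "Head": [
        {"n": ""},
        {"n": "Freq", "s": "f"},
        {"n": "T-score", "s": "t"},
        {"n": "MI", "s": "m"},
    ],
    "Items": [
        {
            "Stats": [{"s": "2.828"}, {"s": "12.938"}],
            "freq": 8,
            "str": "Belvin",
        }
    ],
}

table = collocation_rows(sample)
print(table)
```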

save* methods

These methods return the same output as their parent methods (see above); using them for JSON output is deprecated.

subcorp

Structure:

  • SubcorpList – list of available subcorpora. Each subcorpus contains:
    • n – name of the subcorpus

Fields available only if a new subcorpus is created:

  • corpsize – size of the parent corpus (number of tokens)
  • subcsize – size of the created subcorpus (number of tokens)

Example (query and result):

https://beta.sketchengine.co.uk/bonito/run.cgi/subcorp?corpname=preloaded/bnc;format=json

{
   "SubcorpList": [
      {
         "n": "book"
      },
      {
         "n": "wri-to-be-spoken"
      }
   ]
}