This documentation is in the form of Python examples. If you stitch the code snippets from this page together and replace placeholders in them, it should work like a charm. However, this API is still a work in progress. Things will break without warning. You’ve been warned.


If you have your own files, you can create a new corpus using our API in just a few steps:

  1. authenticate yourself,
  2. create a new corpus for a given language,
  3. upload files,
  4. wait for processing and then
  5. compile the corpus.

After these steps, you will be able to access your corpus through the API as usual. Of course, the range of available queries depends on the language of the corpus and on the content and size of the uploaded files. So let’s start. You will need a few Python modules (json, requests and time) and your Sketch Engine API key.

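#!/usr/bin/env python3
import json      # encodes request bodies
import requests  # third-party HTTP library (pip install requests)
import time      # pauses between status checks

# Replace the placeholders with your Sketch Engine username and API key.
auth = ('%username%', '%api_key%')
URL = 'https://the.sketchengine.co.uk/api'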
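
The examples on this page build endpoint URLs by concatenating strings, as in URL + '/corpora/' + str(corpus_id) + '/compilation'. If you prefer, a small helper can do the joining for you. This is only a convenience sketch: api_url is a hypothetical function, not part of the API.

```python
URL = 'https://the.sketchengine.co.uk/api'


def api_url(*parts):
    """Join path segments onto the API base URL (hypothetical helper).

    Each part is converted with str(), so integer corpus IDs work directly,
    and stray slashes around a segment are removed before joining.
    """
    return '/'.join([URL] + [str(part).strip('/') for part in parts])


# The compilation endpoint for a corpus with ID 123
print(api_url('corpora', 123, 'compilation'))
# prints https://the.sketchengine.co.uk/api/corpora/123/compilation
```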

Before creating a corpus, you need to know what language you will be using. Let’s stick with English for now.

# Create an empty English corpus named "api_test".
r = requests.post(URL + '/corpora', auth=auth, data=json.dumps({
    'language_id': 'en',
    'name': 'api_test'
}))

Only two parameters are needed: the language of the corpus and its name. Use ISO 639-1 language codes (for example, en for English). The API also provides a list of all languages supported by Sketch Engine.
We recommend using only ASCII characters (uppercase and lowercase Latin letters) in corpus names.
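
If you build corpus names or language codes programmatically, a quick local check can catch obvious mistakes before the request is sent. The helper below is hypothetical and checks only the shape of the values: it accepts any two lowercase letters as a language code (it does not compare them with the list of supported languages) and any non-empty ASCII string as a name. It needs Python 3.7+ for str.isascii().

```python
import re


def check_corpus_params(language_id, name):
    """Return a list of problems with the corpus parameters (empty if none).

    Hypothetical pre-flight check: the API itself decides what it accepts.
    """
    problems = []
    # ISO 639-1 codes are two letters; we only check that shape here.
    if not re.fullmatch(r'[a-z]{2}', language_id):
        problems.append('language_id should be a two-letter ISO 639-1 code')
    # Follow the recommendation to keep corpus names ASCII-only.
    if not name or not name.isascii():
        problems.append('name should be a non-empty ASCII string')
    return problems


print(check_corpus_params('en', 'api_test'))  # prints []
print(check_corpus_params('english', 'korpus_čeština'))
```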

All responses are in JSON. You will need the corpus ID for subsequent calls; this is how you get it:

corpus_id = r.json()['data']['id']
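
It is worth checking the response before reading the ID, so that an error (for example, a wrong API key) does not surface as a confusing KeyError. This is a minimal sketch that assumes successful responses have the {'data': {'id': ...}} shape shown above; get_corpus_id is a hypothetical helper, and the shape of error responses is not documented here.

```python
def get_corpus_id(response_json):
    """Return the corpus ID from a parsed corpus-creation response.

    Assumes the {'data': {'id': ...}} shape used on this page. Any other
    shape raises ValueError containing the full response for debugging.
    """
    try:
        return response_json['data']['id']
    except (KeyError, TypeError):
        raise ValueError('Unexpected API response: %r' % (response_json,)) from None
```

A typical use is to call r.raise_for_status() first (requests raises an exception for HTTP 4xx and 5xx responses) and then corpus_id = get_corpus_id(r.json()).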

Now let’s upload some files. For each file, you need to provide its name, its content and its MIME type. Here’s an example:

# Upload a plain-text file; the with block closes the file afterwards.
with open('/path/to/your/file/testing.txt', 'rb') as f:
    files = {'file': ('testing.txt', f, 'text/plain')}
    r = requests.post(URL + '/corpora/' + str(corpus_id) + '/documents', auth=auth, files=files, params={'feeling': 'lucky'})
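
To upload several files, you can repeat the request above for each one. The sketch below assumes one file per request, as in the example, and guesses each MIME type from the file extension with Python’s standard mimetypes module. upload_directory and guess_mime_type are hypothetical helper names, not part of the API, and the fallback type for unknown extensions is an assumption.

```python
import mimetypes
import os


def guess_mime_type(path):
    """Guess a MIME type from the file extension.

    Falls back to the generic 'application/octet-stream' when the extension
    is unknown; whether the API accepts that type is not covered here.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or 'application/octet-stream'


def upload_directory(http, documents_url, directory, auth):
    """Upload each regular file in `directory`, one POST request per file.

    `http` is anything with a requests-style post() method, for example the
    requests module itself or a requests.Session. Subdirectories are skipped.
    Returns the list of responses so you can inspect each upload.
    """
    responses = []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        with open(path, 'rb') as fh:
            files = {'file': (filename, fh, guess_mime_type(path))}
            responses.append(http.post(documents_url, auth=auth, files=files,
                                       params={'feeling': 'lucky'}))
    return responses
```

For example: responses = upload_directory(requests, URL + '/corpora/' + str(corpus_id) + '/documents', '/path/to/your/files', auth).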

When you send files to the corpus, they are processed automatically, which takes some time. You need to wait until processing is done before starting corpus compilation. Check the compilation status of the corpus periodically:

# Poll the compilation endpoint every 5 seconds until the files are processed.
r = requests.get(URL + '/corpora/' + str(corpus_id) + '/compilation', auth=auth)
status = r.json()['data']['status']
while status != 'READY':
    time.sleep(5)
    r = requests.get(URL + '/corpora/' + str(corpus_id) + '/compilation', auth=auth)
    status = r.json()['data']['status']
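
The loop above keeps polling for as long as it takes, so if processing never finishes, the script hangs. A polling helper with a timeout avoids that. This is a sketch: wait_for_status is a hypothetical name, and the one-hour default timeout is an arbitrary choice, not a limit documented by the API.

```python
import time


def wait_for_status(fetch_status, target, interval=5, timeout=3600):
    """Poll fetch_status() until it returns `target` (hypothetical helper).

    fetch_status is any zero-argument callable returning the current status
    string. Returns the final status; raises TimeoutError if `target` has
    not appeared after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    status = fetch_status()
    while status != target:
        if time.monotonic() >= deadline:
            raise TimeoutError('status is still %r after %s s' % (status, timeout))
        time.sleep(interval)
        status = fetch_status()
    return status


# Usage with the request from this page (requires URL, auth and corpus_id):
# def processing_status():
#     r = requests.get(URL + '/corpora/' + str(corpus_id) + '/compilation', auth=auth)
#     return r.json()['data']['status']
# wait_for_status(processing_status, 'READY')
```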

Once the files are converted and tagged, the status of the corpus will be READY. Now it’s time to run the compilation so that you can query the corpus later. Compilation also takes some time, so you need to wait again.

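# Start the compilation, then poll every 5 seconds until the corpus is compiled.
# If the compilation fails, the status ends up READY instead of COMPILED,
# so this loop would keep waiting; stop it manually in that case.
r = requests.post(URL + '/corpora/' + str(corpus_id) + '/compilation', auth=auth)
status = r.json()['data']['status']
while status != 'COMPILED':
    time.sleep(5)
    r = requests.get(URL + '/corpora/' + str(corpus_id) + '/compilation', auth=auth)
    status = r.json()['data']['status']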

Here you go! The status should now be COMPILED, and you are free to use the corpus. Use the corpname attribute as the corpus identifier when querying.

If the status is READY after you have run the compilation, the compilation has probably failed.
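
Since a failed compilation shows up as READY rather than COMPILED, a waiting loop can treat a return to READY as a probable failure instead of polling forever. The sketch below rests on an assumption not stated on this page: that while compilation is in progress, the status shows some value other than READY or COMPILED. A READY status seen before any such value is therefore ignored, and the timeout covers the case where the compilation fails before an in-progress status is ever observed. wait_for_compilation is a hypothetical helper.

```python
import time


def wait_for_compilation(fetch_status, interval=5, timeout=3600):
    """Wait for COMPILED, treating a return to READY as a probable failure.

    Hypothetical helper. ASSUMPTION: during compilation the status is neither
    READY nor COMPILED, so READY seen after such a value means the
    compilation ended without success.
    """
    deadline = time.monotonic() + timeout
    left_ready = False  # becomes True once an in-progress status is seen
    while True:
        status = fetch_status()
        if status == 'COMPILED':
            return status
        if status == 'READY':
            if left_ready:
                raise RuntimeError('status went back to READY: compilation probably failed')
        else:
            left_ready = True
        if time.monotonic() >= deadline:
            raise TimeoutError('compilation did not finish within %s s' % timeout)
        time.sleep(interval)
```

Call it after the POST that starts the compilation, passing a function that performs the GET request on the compilation endpoint and returns r.json()['data']['status'].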

If you have any questions or need to report a problem, contact us at support@sketchengine.co.uk.

Happy hacking!