Authentication

Authentication is optional and can be omitted on local installations, in which case plain requests can be sent to Sketch Engine directly. Servers, however, usually require some form of user authentication. Local installations typically use Basic HTTP authentication, while our servers authenticate users via Corpus Architect.

No authentication

A minimal unauthenticated API request against a local installation:

Example in Java:

import java.net.*;
import java.io.*;

public class GetURL {
    public static void main(String[] args) throws Exception {
        // URL with the query
        String url_string = "http://localhost/run.cgi/wordlist?corpname=bnc;wlattr=word;wlminfreq=5;wlmaxitems=100;wlpat=test.*;format=json";

        // connect to the Sketch Engine server
        URL url = new URL(url_string);
        InputStream stream = url.openStream();
        InputStreamReader isr = new InputStreamReader(stream);
        BufferedReader reader = new BufferedReader(isr);
        
        try {
            Thread.sleep(10000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }

        // receive the data
        System.out.println(reader.readLine()); // the JSON data is on the first line
    }
}

Example in Python:

import urllib2
import time

url = "http://localhost/run.cgi/wordlist?corpname=bnc;wlattr=word;wlminfreq=5;wlmaxitems=100;wlpat=test.*;format=json"
request = urllib2.Request(url)

# receive the data
response = urllib2.urlopen(request)
data = response.read()
response.close()
time.sleep(10)

print data
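
The Python example above is written for Python 2 (urllib2 and the print statement). As a minimal sketch only, the same request might look as follows in Python 3. The helper names build_query_url and fetch_json are hypothetical, and the sketch assumes that the server answers with UTF-8 encoded JSON. Parameters are joined with ';' as in the URLs above and are not URL-encoded.

```python
import json
import urllib.request


def build_query_url(base, method, params):
    """Join (key, value) pairs with ';', as in the URLs shown in this section.

    Hypothetical helper: values are inserted as-is, without URL-encoding.
    """
    query = ";".join("%s=%s" % (key, value) for key, value in params)
    return "%s/%s?%s" % (base.rstrip("/"), method, query)


def fetch_json(url, timeout=30):
    """Send a plain GET request and decode the body as JSON.

    Assumption: the response body is UTF-8 encoded JSON.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The same word list query as in the examples above.
WORDLIST_URL = build_query_url(
    "http://localhost/run.cgi",
    "wordlist",
    [("corpname", "bnc"), ("wlattr", "word"), ("wlminfreq", 5),
     ("wlmaxitems", 100), ("wlpat", "test.*"), ("format", "json")],
)
```

Calling fetch_json(WORDLIST_URL) against a running local installation would then return the decoded response as a Python object.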

Basic HTTP authentication

This variant is common on local installations but is not compatible with the beta.sketchengine.co.uk server.

Example in Java:

import java.net.Authenticator;
import java.net.PasswordAuthentication;

...
        final String usr = "<username>";
        final String passwd = "<password>";
        
        // supply the credentials for Basic HTTP authentication
        Authenticator auth = new Authenticator() {
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(usr, passwd.toCharArray());
            }
        };
        Authenticator.setDefault(auth);
...

Example in Python:

import urllib, urllib2, base64
...
usr = '<username>'
passwd = '<password>'
...
request = urllib2.Request(url)

# authentication (b64encode, unlike encodestring, never inserts line breaks)
base64string = base64.b64encode('%s:%s' % (usr, passwd))
request.add_header("Authorization", "Basic %s" % base64string)
...
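
The Python fragment above relies on Python 2 modules. As a rough sketch, the same Authorization header could be built in Python 3 as shown below. The function names basic_auth_header and build_basic_auth_request are hypothetical, and the sketch assumes that the credentials are sent as UTF-8.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of the Authorization header for Basic HTTP auth.

    Assumption: the "username:password" pair is encoded as UTF-8.
    """
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_basic_auth_request(url, username, password):
    """Create a request object that carries the Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The returned request can be passed to urllib.request.urlopen in the same way as the request object in the fragment above.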

Example in R:

library(RCurl)
# build a URL
result <- getURL("URL", userpwd="USERNAME:PASSWORD", httpauth = 1L)
Sys.sleep(10)
# do something with the result

Corpus Architect authentication

This is the authentication method used on our servers (http://the.sketchengine.co.uk and http://beta.sketchengine.co.uk).

Example in Java (the required non-standard libraries can be downloaded from the example page):

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.protocol.*;
import org.apache.commons.httpclient.contrib.ssl.*;
...
class example3_ca {

    static final String root_url = "beta.sketchengine.co.uk";
    static final String ske_username = "<username>";
    static final String ske_password = "<password>";
   
    public static void main(String[] args) {
        
        String corp = "bnc";
        String method = "view";
        String base_url = "/bonito/run.cgi/";

        ...

        // make HTTPS connection
        HttpClient client = new HttpClient();
        try {
          Protocol.registerProtocol("https", new Protocol("https", (ProtocolSocketFactory)new EasySSLProtocolSocketFactory(), 443));
          //client.getHostConfiguration().setHost(root_url, 80, "http");
          client.getHostConfiguration().setHost(root_url, 443, "https");
          client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
        } catch (java.security.GeneralSecurityException e){
          e.printStackTrace();
        } catch (IOException e){
          e.printStackTrace();
        }
        client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
       
        // retrieve session id
        GetMethod authget = new GetMethod("/login/");
        try {
            int code=client.executeMethod(authget);
        } catch (IOException ex) {
            System.err.println("Error: couldn't retrieve session ID from Sketch Engine server.");
            System.exit(1);
        }
        authget.releaseConnection();
       
        // login   
        PostMethod authpost = new PostMethod("/login/");
        NameValuePair submit   = new NameValuePair("submit", "ok");
        NameValuePair username = new NameValuePair("username", ske_username);
        NameValuePair password = new NameValuePair("password", ske_password);
        authpost.setRequestBody(new NameValuePair[] {submit, username, password});
        try {
            int code=client.executeMethod(authpost);
        } catch (IOException ex) {
            System.err.println("Error: couldn't login to Sketch Engine server.");
            System.exit(2);
        }
        authpost.releaseConnection();

        try {
            Thread.sleep(10000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }

        // retrieve data
...

Example in Python:

import urllib, urllib2, cookielib

import time

username = '<username>'
password = '<password>'
corp     = 'bnc'
root_url = 'https://beta.sketchengine.co.uk'

# authentication
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({ 'username' : username,
                                'password' : password,
                                'submit' : 'ok',
                              })
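opener.open('%s/login/' % root_url)              # GET: retrieve the session cookie
opener.open('%s/login/' % root_url, login_data)  # POST: submit the login form
time.sleep(10)

# retrieve data; this query URL is only an illustrative assumption,
# assembled from the other examples in this section
url = '%s/bonito/run.cgi/wordlist?corpname=%s;wlattr=word;format=json' % (root_url, corp)
response = opener.open(url)
data = response.read()
response.close()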
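
The login flow above (fetch the login page to obtain a session cookie, then post the login form with the same cookie jar) could be sketched in Python 3 roughly as follows. The function names are hypothetical, and the form fields are simply the ones used in the examples in this section.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def encode_login_form(username, password):
    """Encode the login form fields used in the examples above as a POST body."""
    fields = {"username": username, "password": password, "submit": "ok"}
    return urllib.parse.urlencode(fields).encode("ascii")


def build_cookie_opener():
    """Create an opener that stores cookies and resends them on later requests."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))
    return opener, cookie_jar


def login(opener, root_url, username, password):
    """Run the two-step login: GET the login page, then POST the form."""
    login_url = "%s/login/" % root_url
    opener.open(login_url).close()  # GET: retrieve the session cookie
    opener.open(login_url, encode_login_form(username, password)).close()  # POST
```

After login(opener, root_url, username, password) has succeeded, further calls to opener.open with a query URL reuse the stored session cookie.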

Example in R:

library(RCurl)
library(rjson)

loginurl = "https://beta.sketchengine.co.uk/login/"
dataurl = "https://beta.sketchengine.co.uk/bonito/run.cgi/view?q=alc,[lemma=\"book\"];corpname=preloaded/bnc2;format=json"

# authentication parameters
pars=list(
    username="USERNAME",
    password="PASSWORD"
)

# setup curl
agent="Mozilla/5.0"
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent=agent, followlocation=TRUE, curl=curl)

# authenticate with login form
postForm(loginurl, .params = pars, curl=curl)

# access the requested URL
html=getURL(dataurl, curl=curl)
Sys.sleep(10)

# parse JSON result
document <- fromJSON(html, method='C')

# work with the object
show(document)

# clean up
rm(curl)
gc()

Example in Bash:

# authenticate with cookies
wget --save-cookies ca_cookies.txt \
     --post-data 'username=USERNAME&password=PASSWORD' \
     https://the.sketchengine.co.uk/login/ \
     -O /dev/null

# call the URL
wget --load-cookies ca_cookies.txt \
     "https://the.sketchengine.co.uk/bonito/run.cgi/wordlist?corpname=preloaded%2Fbnc2&wlattr=word&wlpat=%5Epro%2E%2A&format=json" \
     -O result.json
sleep 10

# work with the result
cat result.json