This corpus was created by Kirsten Ackermann and David Tugwell in a collaboration between Pearson Language Tests (part of Pearson PLC) and Lexical Computing Ltd., using data available on the web and components from existing corpora. The aim was to create a large, balanced corpus of the language that non-native speakers of English need in order to thrive at English-language universities around the globe.

First version: 2009. The corpus was launched at IATEFL 2009. A full report is in preparation. A programme of additions and improvements over a number of years is anticipated.

The corpus was part-of-speech tagged and lemmatized using English TreeTagger. Word sketches were prepared by David Tugwell, in the same way as for other corpora in Sketch Engine.

Access policy

To obtain authorisation from Pearson to access the corpus:

  1. Please contact Veronica Benigno. Provide a brief description of your research and state your academic affiliation.
  2. Then get in touch with Sketch Engine, who will update your account permissions accordingly.