Open Cambridge Learner Corpus (Uncoded)

The Open Cambridge Learner Corpus (Uncoded) is an English text corpus compiled collaboratively by Cambridge University Press and Cambridge English Language Assessment. The word “uncoded” means that error tagging is not included.

The Open CLC is a balanced subset of the Cambridge Learner Corpus and reflects the genre of exam writing by learners of English. It contains 2.9 million words from over 10,000 student responses taken from the Cambridge English Language Assessment suite of exams – FCE, CAE and CPE – and includes data from a range of L1s. The responses were written by students from more than 60 countries speaking 7 different first languages.

Corpus text types store detailed information about each examined student. They make it possible to restrict a search to a specific part of the corpus, e.g. by the students' first language, nationality or age. This means that the corpus can be used to find out how a specific group of students express themselves and construct answers at different levels in English exams.
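The idea of restricting a search by text type can be illustrated with a minimal Python sketch. It does not use Sketch Engine or the actual Open CLC data: the `Response` record, its field names (`first_language`, `nationality`, `age`, `exam`) and the three sample sentences are all assumptions invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record: one exam response plus its metadata."""
    text: str
    first_language: str
    nationality: str
    age: int
    exam: str


def select(responses, **criteria):
    """Return the responses whose metadata match every given criterion."""
    return [
        r for r in responses
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def word_frequencies(responses):
    """Count lower-cased, whitespace-separated tokens across the responses."""
    counts = Counter()
    for r in responses:
        counts.update(r.text.lower().split())
    return counts


# Invented sample data, not taken from the corpus.
SAMPLE = [
    Response("I am agree with this idea", "Spanish", "Spain", 17, "FCE"),
    Response("I agree with this idea", "German", "Germany", 22, "CAE"),
    Response("I am writing to complain", "Spanish", "Mexico", 19, "FCE"),
]

# Restrict the "corpus" to Spanish-speaking FCE candidates, then count words.
spanish_fce = select(SAMPLE, first_language="Spanish", exam="FCE")
freq = word_frequencies(spanish_fce)
```

In this sketch each keyword argument plays the role of one text type, so combining several narrows the selection in the same way as choosing several text types for a search.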

Please note that views expressed in the Open CLC are the views of individual exam candidates and do not represent the views of Cambridge University Press or Cambridge English Language Assessment.

Part-of-speech tagset

The corpus was tagged and lemmatized using the TreeTagger tool. The POS tagset summary is available here.

Availability

The corpus is available to all users with a subscription plan and to site licence members, but not to trial users.

You may publish the results of research that uses the OpenCLC. In any such publication, you may reproduce excerpts of the text from the OpenCLC only as permitted under UK copyright law and “fair dealing”. You must clearly identify any such excerpt as originating in the OpenCLC, as owned by Cambridge University Press and Cambridge English Language Assessment. If you use this corpus, please cite the reference below.

Bibliographic references

OpenCLC (v1). 2017. Distributed by Lexical Computing Limited on behalf of Cambridge University Press and Cambridge English Language Assessment.

Search the Open Cambridge Learner Corpus

Sketch Engine offers a range of tools to work with this English corpus.

Other text corpora in Sketch Engine

Sketch Engine provides access to 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.