Open Cambridge Learner Corpus (Uncoded)
The Open Cambridge Learner Corpus (OpenCLC) has been compiled collaboratively by Cambridge University Press and Cambridge English Language Assessment.
The OpenCLC is a balanced subset of the Cambridge Learner Corpus, which reflects the genre of exam writing by learners of English. The corpus contains 2,700,000 words from over 10,000 student responses taken from the Cambridge English Language Assessment suite of exams – FCE, CAE and CPE – and includes data from a range of L1s. The responses were written by students from more than 60 countries speaking 7 different first languages.
Corpus text types store detailed information about each examined student. They make it possible to search a specific part of the corpus, e.g. by the students' first language, nationality or age. This means that the corpus can be used to find out how a specific group of students express themselves and construct answers at different levels in English exams.
Please note that views expressed in the OpenCLC are the views of individual exam candidates and do not represent the views of Cambridge University Press or Cambridge English Language Assessment.
The corpus was tagged and lemmatized using TreeTagger, with tokenization and other processing carried out by the Sketch Engine English processing pipeline v. 2. See the tagset summary.