The corpus consists of transcripts of informal, conversation-like interviews with one or two speakers and a fieldworker, together with some self-recordings. The transcripts come from two ESRC-funded projects: Linguistic Innovators and Multicultural London English.

The data was prepared for the Sketch Engine using a lemmatiser and was part-of-speech tagged using TreeTagger with the UTF-8 English parameter file trained on Tagset. The sketch grammar used is English Sketch Grammar v.2.5 (TreeTagger tagset).

For more details about the speakers and the research projects from which these transcripts derive, see the bibliography (below).

List of Sub-corpora:

  • Innovators Transcripts
  • Multicultural London English


Cheshire, J., Kerswill, P., Fox, S. and Torgersen, E. (2011). Contact, the feature pool and the speech community: The emergence of Multicultural London English. Journal of Sociolinguistics 15, pp. 151–196.