Guangwai-Lancaster Chinese Learner Corpus

A brief description

The Guangwai-Lancaster Chinese Learner Corpus (CLC) is a 1.2-million-word corpus of learner Mandarin Chinese and a new addition to corpora of L2 Chinese. It is the result of a collaboration between Guangdong University of Foreign Studies and Lancaster University. The corpus has a spoken part (621,900 tokens, 48%) and a written part (672,328 tokens, 52%), and it covers a variety of task types and topics. It is fully error-tagged and can be used to explore theoretical and practical issues in the acquisition of Chinese as a foreign language.

Development team

Prof. Hai Xu (Guangdong University of Foreign Studies)

Dr. Richard Xiao (Lancaster University)

Dr. Vaclav Brezina (Lancaster University)



The funding for the corpus was obtained by Dr. Richard Xiao, to whom the corpus is also dedicated.


The building of the corpus was supported by the British Academy IPM Scheme, Grant No. PM120462.