ALC: Arabic Learner Corpus

The Arabic Learner Corpus (ALC) is a corpus of written and spoken texts produced by learners of Arabic in Saudi Arabia. All texts were collected in 2012–2013 and total 282,732 words from 942 students of 67 nationalities.

See more on the project site: http://www.arabiclearnercorpus.com/about-the-corpus-en

Part-of-speech tagset

The texts were POS-tagged using the Stanford parser; the tags are documented in the POS tagset description.

Tools to work with the Arabic Learner Corpus

A complete set of Sketch Engine tools is available for working with the Arabic Learner Corpus. The tools generate:

  • word sketch – Arabic collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
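To make three of the outputs listed above more concrete, here is a minimal Python sketch of a word list, an n-gram frequency list and a concordance. It is an illustration only and not how Sketch Engine computes them: the function names `ngram_frequencies` and `concordance` are hypothetical, the sample sentence is invented rather than taken from the ALC, and tokenization is a plain whitespace split, which is far cruder than real Arabic tokenization.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits


# Invented toy sentence, NOT taken from the ALC; whitespace tokenization
# is an assumption made purely to keep the example short.
sample = "ذهبت إلى المدرسة ثم ذهبت إلى السوق".split()

# Word list: single tokens ordered by frequency.
word_list = Counter(sample).most_common()

# N-grams: the most frequent two-token unit.
bigrams = ngram_frequencies(sample, 2).most_common(1)

# Concordance: every occurrence of the second token, one word of context each side.
lines = concordance(sample, sample[1], width=1)

for left, kw, right in lines:
    print(" ".join(left), "[", kw, "]", " ".join(right))
```

Word sketches and the thesaurus depend on grammatical relations and distributional statistics over the whole corpus, so they are left out of this sketch.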

Bibliography

TenTen corpora

Alrabiah, M., Al-Salman, A., & Atwell, E. S. (2013). The design and construction of the 50 million words KSUCCA. In Proceedings of WACL’2 Second Workshop on Arabic Corpus Linguistics (pp. 5-8). The University of Leeds.

Search the corpus of classical Arabic

Sketch Engine offers a range of tools to work with the KSUCCA corpus.

Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.