KSUCCA: King Saud University Corpus of Classical Arabic

The King Saud University Corpus of Classical Arabic (KSUCCA) is a language corpus made up of Classical Arabic texts dating from the 7th to the early 11th century. The corpus consists of 46 million words and was created as part of the Ph.D. work of Maha Alrabiah. It contains texts from a wide range of genres, such as Religion, Linguistics, Literature, Science, Sociology, and Biography, which are further divided into subgenres.

Part-of-speech tagset

Texts were lemmatised and POS-tagged by Yonatan Belinkov using the MADA tools from Columbia University. See the POS tagset description.

Tools to work with the Arabic KSUCCA corpus

A complete set of Sketch Engine tools is available to work with this corpus of Classical Arabic to generate:

  • word sketch – Arabic collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
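
For readers unfamiliar with these outputs, the following is a minimal sketch of what a word list, an n-gram frequency list, and a concordance are. It is not Sketch Engine's implementation or API: the function names (word_list, ngrams, concordance) and the toy sentence are hypothetical, the sample is not taken from KSUCCA, and it assumes plain whitespace tokenisation rather than the lemmatised, POS-tagged text the corpus actually provides.

```python
from collections import Counter


def word_list(tokens):
    """Return (token, count) pairs sorted by descending frequency."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy, whitespace-tokenised sample (hypothetical; not taken from KSUCCA).
tokens = "قال الرجل قال الولد قال الرجل".split()

frequencies = word_list(tokens)
bigrams = ngrams(tokens, 2)
hits = concordance(tokens, "الولد", width=1)
```

On real data the same ideas would be applied to lemmas and POS tags instead of raw tokens, which is what makes lists such as "all nouns by frequency" possible.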

Bibliography

Alrabiah, M., Al-Salman, A., & Atwell, E. S. (2013). The design and construction of the 50 million words KSUCCA. In Proceedings of WACL’2 Second Workshop on Arabic Corpus Linguistics (pp. 5-8). The University of Leeds.

Search the corpus of classical Arabic

Sketch Engine offers a range of tools to work with the KSUCCA corpus.

Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.