Arabic corpus of the Quran

The Quran annotated corpus is an Arabic text corpus built from the Quran, the central religious text of Islam. This version of the corpus was prepared by Zainab Alqassem (Alqassem 2013). The data was taken from the Quranic Arabic Corpus (Dukes 2009) and the QurAna anaphoric coreference database (Sharaf and Atwell 2012). The corpus texts are lemmatized and POS-tagged.

Part-of-speech tagset

The morphological annotation in the corpus uses a POS tagset created specifically for the Quranic Arabic Corpus.

Tools to work with the Quran annotated corpus

A complete set of Sketch Engine tools is available to work with this Arabic corpus to generate:

  • word sketch – Arabic collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
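To make two of the outputs above concrete, here is a minimal Python sketch of an n-gram frequency list and a keyword-in-context concordance. It is not Sketch Engine's implementation or API: the function names `ngram_frequencies` and `concordance` are hypothetical, and the tokens are placeholders rather than text from the corpus.

```python
from collections import Counter

def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

# Placeholder tokens standing in for a tokenized text; not real corpus data.
tokens = ["t1", "t2", "t3", "t1", "t2", "t4"]

bigrams = ngram_frequencies(tokens, n=2)
hits = concordance(tokens, "t2", window=1)
```

Calling `bigrams.most_common()` lists the multi-word units by descending frequency, and each tuple in `hits` holds one occurrence of the keyword with its left and right context.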


version 1 (7th May 2013)

  • initial version


The Quran annotated corpus

Alqassem, Z. (2013). Unifying Quranic analyses into a single database. BSc Final Year Project Dissertation, School of Computing, University of Leeds.

The Quranic Arabic Corpus

Dukes, K. (2009). The Quranic Arabic Corpus. Leaman, Oliver.

QurAna: Corpus of the Quran annotated with Pronominal Anaphora

Sharaf, A. B. M., & Atwell, E. (2012). QurAna: Corpus of the Quran annotated with Pronominal Anaphora. In LREC (pp. 130-137).
