The Arabic WebCorpus (arTenTen) is a text corpus built from texts collected from the Internet. It belongs to the TenTen corpus family, a set of web corpora that are all processed in the same way, each with a target size of 10+ billion words. Sketch Engine currently provides access to TenTen corpora in more than 30 languages.
The corpus is part-of-speech (POS) tagged and lemmatized with the MADA tool.
We have also created ‘word sketches’: one-page, automatic, corpus-derived summaries of a
word’s grammatical and collocational behavior. We use examples to demonstrate what the corpus can
show us regarding Arabic words and phrases and how this can support lexicography and inform