The Arabic web corpus was created by Serge Sharoff and tagged with AMIRA-1.2. It contains 174 million tokens and is encoded in UTF.

Tagset info

See the summary of the tagset used.


Mona T. Diab (2007) Towards an optimal POS tag set for Modern Standard Arabic Processing. In Recent Advances in Natural Language Processing (RANLP), August, Borovets, Bulgaria.