The Arabic web corpus was created by Serge Sharoff and tagged with AMIRA-1.2. It contains 174 million tokens and is encoded in UTF.

Mona T. Diab (2007). Towards an optimal POS tag set for Modern Standard Arabic processing. In Recent Advances in Natural Language Processing (RANLP), August, Borovets, Bulgaria.