The Arabic web corpus was created by Serge Sharoff and tagged with AMIRA-1.2. It contains 174 million tokens and is encoded in UTF.

Tagset info

see below


Bibliography

Mona T. Diab (2007) Towards an optimal POS tag set for Modern Standard Arabic Processing. In Recent Advances in Natural Language Processing (RANLP), August, Borovets, Bulgaria.