The corpus was prepared by Adriano Ferraresi. The process is described in Ferraresi et al. (LREC 2008).

All material is taken from the .uk domain. It was part-of-speech tagged and lemmatised using TreeTagger, a leading part-of-speech tagger that has been trained for a number of languages. It uses the Penn Treebank tagset.
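
As an illustration of what tagged and lemmatised text looks like, here is a minimal sketch of reading TreeTagger-style output. It assumes one token per line with three tab-separated fields (word, part-of-speech tag, lemma) and markup lines such as `<s>` in between; the sample lines and the names `Token` and `parse_tagged_lines` are hypothetical and are not taken from the corpus distribution.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str    # Penn Treebank tag, e.g. NNS, VBD
    lemma: str


def parse_tagged_lines(lines: Iterable[str]) -> Iterator[Token]:
    """Yield a Token for each line with exactly three tab-separated fields.

    Lines with any other number of fields (blank lines, markup such as
    <s> or </s>) are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        yield Token(*parts)


# Hypothetical sample in the assumed format, not real corpus data.
sample = ["<s>", "The\tDT\tthe", "cats\tNNS\tcat", "sat\tVBD\tsit", "</s>"]
tokens = list(parse_tagged_lines(sample))
lemmas = [t.lemma for t in tokens]
```

The same function can be passed an open file object, since it only iterates over lines.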

Grammatical relation definitions, as prepared by David Tugwell for other English corpora, were used.

There is also a version of UKWaC tagged with SuperSenseTagger (sst-light), described in Ciaramita and Altun (2006).