Search the ukWaC British English corpus
The corpus was prepared by Adriano Ferraresi. The whole process is described in the paper "Introducing and evaluating ukWaC, a very large web-derived corpus of English", presented at LREC 2008.
All material is taken from the .uk domain, so it is fair to describe it as a corpus of mainly British English, although other varieties are likely to be included wherever they were found on a .uk domain.
The corpus was part-of-speech tagged and lemmatised using TreeTagger, a leading part-of-speech tagger which has been trained for a number of languages. The tagging uses the Penn Treebank tagset.
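To make the tagging and lemmatisation concrete, here is a minimal sketch of reading tagged text, assuming it has been exported in the tab-separated, one-token-per-line layout that TreeTagger prints (word form, tag, lemma). The sample lines and the names Token and parse_vertical are illustrative only; they are not part of ukWaC, TreeTagger or Sketch Engine.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One tagged token: surface form, part-of-speech tag, lemma."""
    word: str
    tag: str
    lemma: str


def parse_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield tokens from tab-separated 'word<TAB>tag<TAB>lemma' lines.

    Assumption: lines that start with '<' and contain no tab are
    structural markup (for example <s> or </s>) and are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and "\t" not in line:
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            # Not a word/tag/lemma triple; ignore rather than guess.
            continue
        yield Token(*parts)


# Hypothetical sample, invented for illustration (not taken from ukWaC).
sample = (
    "<s>\n"
    "The\tDT\tthe\n"
    "corpus\tNN\tcorpus\n"
    "was\tVBD\tbe\n"
    "large\tJJ\tlarge\n"
    "</s>\n"
)

tokens = list(parse_vertical(sample.splitlines()))

# Penn Treebank noun tags all begin with "NN" (NN, NNS, NNP, NNPS).
noun_lemmas = [t.lemma for t in tokens if t.tag.startswith("NN")]
```

Keeping the lemma alongside the tag is what lets a search for "be" as a verb match the form "was" in the sample above.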
The corpus uses the grammatical relation definitions prepared by David Tugwell for other English corpora.
Sketch Engine also has a version of ukWaC tagged with SuperSenseTagger (sst-light), which is described in Ciaramita and Altun (2006).