The corpus was prepared by Steven Bird and colleagues. The preparation process is described in the reference below.

All material is taken from here. It was part-of-speech tagged and lemmatised with TreeTagger, a widely used part-of-speech tagger for which trained models exist for a number of languages.
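
As a rough illustration of what tagged and lemmatised text looks like, the sketch below reads lines in the three-column layout (word, tag, lemma, separated by tabs) that TreeTagger prints when run with its -token and -lemma options. That the corpus was processed in exactly this layout is an assumption; the helper name parse_tagged_lines and the sample lines are made up for illustration and are not taken from the corpus.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


def parse_tagged_lines(lines):
    """Parse tab-separated word/tag/lemma lines into Token tuples.

    Hypothetical helper: assumes one token per line with exactly
    three tab-separated columns. Other lines are skipped.
    """
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            # Not a word/tag/lemma line (for example a markup line).
            continue
        tokens.append(Token(*parts))
    return tokens


# Invented sample data in the assumed layout.
sample = [
    "The\tDT\tthe",
    "parsers\tNNS\tparser",
    "were\tVBD\tbe",
    "evaluated\tVVN\tevaluate",
]

tokens = parse_tagged_lines(sample)
lemmas = [t.lemma for t in tokens]
```

Keeping the lemma next to the surface form is what lets a search for one lemma match all of its inflected forms.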

The grammatical relation definitions are those prepared by David Tugwell for other English corpora.

The word sketches are a first version.


Bird, Steven et al. The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics. 2008.