The Preposition Corpus (TPP) was developed specifically for detailed analysis of how prepositions are used in English. Each preposition in the corpus is annotated with additional information about its sense and context. For example, the user can compare the use of the preposition "in" followed by a place with its use followed by a time expression, or find only examples of "in" followed by a word describing a group of objects or people, e.g. "in the main gallery".

The Preposition Corpus comprises 3 corpora annotated according to the Pattern Dictionary of English Prepositions (PDEP), which describes the behaviour of prepositions.

Using the corpus

A full set of Sketch Engine features is available to search the corpus.

Word Sketch

The easiest way to observe the behaviour of a preposition in English is to generate a Word Sketch for the preposition. The corpus was compiled with a specific word sketch grammar to exploit the unique annotation.

Word Sketch for preposition in


CQL search

All standard searches can be used with this corpus. Complex searches are also available using CQL.

Displaying attributes

Use the view options in the left menu to select which Attributes or References will be displayed (see the screenshot below).

See the attached document to get a full understanding of attributes and values used.

Annotation in detail

Each sentence contains only one annotated preposition, i.e. each sentence serves as an example of the use of one preposition. Other prepositions in the sentence are not annotated. Individual sentences are annotated with the sense of the preposition (linked to the PDEP sense description), the PDEP class and the PDEP subclass, and supersense tags for preposition complements and governors.

governor    preposition    preposition complement
countries   in             Europe

Structures and attributes in the corpus

Apart from standard attributes such as the POS tag or the sentence tag, the corpus has a few special attributes and structures related to prepositions.

  • doc.preposition (document preposition) – structure that allows searching for documents with specific prepositions, e.g.

 <doc preposition="within"> 
  • s.sense_label (sense label) – label restricting the search to sentences with a specific sense tag. This restriction should be combined with a specific preposition, since the sense numbering is generally the same for each preposition. All sense labels include parentheses, which must be preceded by backslashes, e.g.
 [lemma_lc="aboard"] within <s sense_label="1\(1\)" ></s> 
  • s.sense_desc (sense description) – containing a link to the PDEP pattern for the given sense
  • s.class (sentence class) – sentences are divided into 14 classes; the classes pv (phrasal verbs) and x (infelicitous sentences), pulled out of a random sample of the British National Corpus, identify uses of a word or phrase misidentified as a preposition
  • s.subc (sentence subclass) – subclass of the previous s.class
  • compl.sst and gov.sst (supersense tags) – can be used to examine the occurrence of the various WordNet lexicographer file names within the corpora (values of the supersenses are generally examined after a search for complements or governors).
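
The backslash rule for sense labels is easy to get wrong when queries are typed by hand. The following Python sketch builds the query string for a given preposition and sense label. The helper names escape_parentheses and sense_query are hypothetical, and the query shape is simply copied from the "aboard" example above; only parentheses are escaped, since that is the only rule stated here.

```python
def escape_parentheses(value: str) -> str:
    """Put a backslash before each parenthesis, as the sense labels require."""
    return value.replace("(", "\\(").replace(")", "\\)")


def sense_query(preposition: str, sense_label: str) -> str:
    """Build a CQL query for one preposition restricted to one sense label.

    The query shape mirrors the s.sense_label example in the list above;
    it is an illustration, not an official query template.
    """
    escaped = escape_parentheses(sense_label)
    return f'[lemma_lc="{preposition}"] within <s sense_label="{escaped}" ></s>'


# Reproduces the example query for "aboard" with sense label 1(1).
print(sense_query("aboard", "1(1)"))
```

The resulting string can be pasted into the CQL search field. Passing the preposition explicitly follows the advice above that a sense label should always be combined with a specific preposition.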

Corpus processing

Sentences were parsed using the Tratz parser (A Fast, Accurate, Non-Projective, Semantically-Enriched Parser, 2011) with output in the CoNLL-X format. This format has one line for each token in a sentence, and a blank line separates sentences. Words in the corpus are lemmatized and assigned a lempos attribute. The POS tagset is the Penn Treebank tagset with one special tag for proper adjectives:

POS tag    Example
JJPROP     United, North, Royal
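
The CoNLL-X layout described above (one token per line, a blank line between sentences) can be read with a few lines of Python. This is a minimal sketch, assuming the ten tab-separated columns of the standard CoNLL-X format; the function name read_conllx is hypothetical, and the sample rows, including their dependency labels, are made up for illustration and are not taken from the corpus or from actual parser output.

```python
from typing import Dict, Iterable, Iterator, List

# Column names of the standard CoNLL-X format (assumed to apply here).
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(dict(zip(CONLLX_FIELDS, line.split("\t"))))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Invented sample data: two sentences separated by a blank line.
sample = (
    "1\tcountries\tcountry\tNN\tNNS\t_\t0\tROOT\t_\t_\n"
    "2\tin\tin\tIN\tIN\t_\t1\tprep\t_\t_\n"
    "3\tEurope\tEurope\tNN\tNNP\t_\t2\tpobj\t_\t_\n"
    "\n"
    "1\tUnited\tUnited\tJJ\tJJPROP\t_\t0\tROOT\t_\t_\n"
)

sentences = list(read_conllx(sample.splitlines()))
for tokens in sentences:
    print([(t["form"], t["postag"]) for t in tokens])
```

Yielding sentences one at a time keeps memory use low for large files, since only the current sentence is held in memory.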

References

Ken Litkowski. 2013. The Preposition Project Corpora. Technical Report 13-01. Damascus, MD: CL Research.

Ken Litkowski. 2014. Pattern Dictionary of English Prepositions. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland, ACL, 1274-83.