The Europarl parallel corpus

The Europarl corpus is a parallel corpus built from the proceedings of the European Parliament in the official languages of the EU.

The corpus was prepared by Philipp Koehn. The process is described in Philipp Koehn, Europarl: A Parallel Corpus for Statistical Machine Translation (MT Summit 2005).

All material is taken from http://www.statmt.org/europarl/.

Changelog

(spring 2015)

  • Tagged by TreeTagger.

v 7.0 (May 2012)

  • A further expanded and improved version of the corpus was released on 15th May 2012.

v 5.0 (May 2010)

  • A further expanded and improved version of the corpus was released on 20th January 2010.

Reference

Philipp Koehn. Europarl: A Parallel Corpus for Statistical Machine Translation, MT Summit 2005.

Search the Europarl corpus

Sketch Engine offers a range of tools to work with the Europarl corpus.
