The Digital Corpus of the European Parliament (DCEP) is a collection of documents published on the European Parliament’s official website. This parallel corpus contains texts in 23 languages.
For more information, see the following pages:
- File alignment statistics for all language pairs
- Sentence alignment statistics for all language pairs
References – Relevant publications
For a more detailed description of DCEP, and when citing DCEP in scientific publications, please refer to:
- Najeh Hajlaoui, David Kolovratnik, Jaakko Vaeyrynen, Ralf Steinberger, and Dániel Varga (2014). DCEP - Digital Corpus of the European Parliament. In Proc. LREC 2014 (Language Resources and Evaluation Conference), Reykjavik, Iceland, May 26–31, 2014, pp. 3164–3171. URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/943_Paper.pdf
For a comparison of DCEP with other linguistic resources distributed by EU institutions, see:
- Ralf Steinberger, Mohamed Ebrahim, Alexandros Poulis, Manuel Carrasco-Benitez, Patrick Schlüter, Marek Przybyszewski, and Signe Gilbro (2014). An overview of the European Union’s highly multilingual parallel corpora. Language Resources and Evaluation (LRE). DOI: 10.1007/s10579-014-9277-0.
For details on how DCEP was added to Sketch Engine, see the EUR-Lex corpus description.