number of aligned structures

The table shows the number of aligned structures (paragraphs in the case of EUR-Lex) for each pair of languages, in millions.

For the largest languages, the corpus typically contains hundreds of millions of words.


Learn to work with multilingual and parallel corpora in Sketch Engine. Refer to the user guide.

A general-purpose multilingual corpus available in Sketch Engine

The EUR-Lex Corpus is a multilingual corpus in all the official languages of the European Union. The corpus has been built from HTML files available in the EUR-Lex database. Because it covers a vast range of subjects, the corpus is an excellent general-purpose resource for anyone looking for translation examples in many languages.

A substantial part of the documents is translated into all official languages of the European Union (currently 24). Languages of countries that joined the EU later are represented by smaller corpora, proportional to the length of their membership.


Technically speaking, the documents are segmented and aligned at paragraph level. This means that the user can search for a matching paragraph containing the translation. The paragraphs are, however, fine-grained and usually correspond to sentences, so in practice the user is able to search for matching sentences or very short paragraphs.
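As a rough illustration of what paragraph-level alignment makes possible, the following Python sketch looks up a phrase in one language and returns the aligned paragraph in another. Everything in it is an assumption made for illustration: the sample paragraphs, the numeric paragraph IDs, and the `find_translations` helper are invented here and do not reflect the corpus file format or how Sketch Engine works internally.

```python
from typing import Dict, List, Tuple

# Toy data, invented for this example: each language maps a shared
# paragraph ID to the text of that paragraph.
aligned: Dict[str, Dict[int, str]] = {
    "en": {
        1: "This Regulation shall enter into force on the day of its publication.",
        2: "Member States shall inform the Commission.",
    },
    "de": {
        1: "Diese Verordnung tritt am Tag ihrer Veröffentlichung in Kraft.",
        2: "Die Mitgliedstaaten unterrichten die Kommission.",
    },
}


def find_translations(query: str, source: str, target: str) -> List[Tuple[str, str]]:
    """Return (source paragraph, aligned target paragraph) pairs
    for every source paragraph that contains the query string."""
    pairs = []
    for pid, text in aligned[source].items():
        # Case-insensitive substring match; skip paragraphs with no counterpart.
        if query.lower() in text.lower() and pid in aligned[target]:
            pairs.append((text, aligned[target][pid]))
    return pairs


for src, tgt in find_translations("Member States", "en", "de"):
    print(src, "->", tgt)
```

Because the paragraphs are short, a match of this kind usually returns a sentence together with its translation rather than a long block of text.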

Sketch Engine also offers a smaller corpus of judgments of the European Parliament.

How to get the data

Academic and research institutions

The EUR-Lex corpus is released under a CC BY-NC-SA licence. Because of the file size, please email us first and we will set up a temporary download link for you. For the original documents, see the official EUR-Lex website.

For commercial use

Please contact us for a quote.

How to cite

Please consider mentioning Lexical Computing Ltd in the acknowledgements and citing the original paper (below) if you use the EUR-Lex corpus.

Vít Baisa, Jan Michelfeit, Marek Medveď, Miloš Jakubíček: European Union Language Resources in Sketch Engine. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA). Portorož, Slovenia. 2016.

Important copyright notice

© European Union, 1998-2016

Except where otherwise stated, reuse of the EUR-Lex data for commercial or non-commercial purposes is authorised provided the source is acknowledged (see above). The reuse policy of the European Commission is implemented by the Commission Decision of 12 December 2011. Some documents, like the International Accounting Standards, may be subject to special conditions of use, which are mentioned in the respective Official Journal. For all other copyright issues regarding EUR-Lex, please contact