DGT Translation Memory parallel corpus

The DGT-Translation Memory (DGT-TM) is a database of aligned sentences from the European Union’s legislative documents (Acquis Communautaire) in 24 EU languages. Sketch Engine offers this database as searchable parallel corpora. Detailed information, including how to cite the corpora, can be found in the bibliographic references below.

The DGT-Translation Memory covers 24 EU languages:

Bulgarian	German	Polish
Czech	Greek	Portuguese
Danish	Hungarian	Romanian
Dutch	Irish	Croatian
English	Italian	Slovak
Estonian	Latvian	Slovenian
Finnish	Lithuanian	Spanish
French	Maltese	Swedish

The aligned texts come from a large translation memory published by the European Commission’s Directorate-General for Translation (DGT).
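To make the idea of aligned sentences concrete, here is a minimal Python sketch of how one translation unit and the parallel pairs derived from it might be represented. The `TranslationUnit` class, the `parallel_pairs` helper and the sample sentences are hypothetical illustrations; they do not reflect the actual DGT-TM file format or how Sketch Engine stores the corpora.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationUnit:
    """One aligned sentence: the same segment in several languages (hypothetical model)."""
    # Maps a language code (e.g. "en", "de") to the sentence text.
    segments: dict[str, str] = field(default_factory=dict)


def parallel_pairs(units, source_lang, target_lang):
    """Return (source, target) sentence pairs for units that contain both languages."""
    pairs = []
    for unit in units:
        if source_lang in unit.segments and target_lang in unit.segments:
            pairs.append((unit.segments[source_lang], unit.segments[target_lang]))
    return pairs


# Illustrative data only; these sentences are not taken from the DGT-TM.
units = [
    TranslationUnit({"en": "This Regulation shall enter into force.",
                     "de": "Diese Verordnung tritt in Kraft.",
                     "fr": "Le présent règlement entre en vigueur."}),
    TranslationUnit({"en": "Article 1", "fr": "Article premier"}),
]

en_de = parallel_pairs(units, "en", "de")  # only units that have both "en" and "de"
en_fr = parallel_pairs(units, "en", "fr")
```

Not every unit needs to exist in every language, which is why the helper skips units that lack either side of the requested pair.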

The individual corpora have been processed with the latest tools available in Sketch Engine.

Tools to work with the DGT Translation Memory parallel corpus

A complete set of Sketch Engine tools is available to work with this set of parallel corpora to generate:

word sketch – collocations categorized by grammatical relations
thesaurus – synonyms and similar words for every word
keywords – terminology extraction of one-word and multi-word units
word lists – lists of nouns, verbs, adjectives etc. organized by frequency
n-grams – frequency list of multi-word units
concordance – examples in context
text type analysis – statistics of metadata in the corpus
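As a rough illustration of what an n-gram frequency list contains, the following Python sketch counts sequences of consecutive tokens in a few sentences. It assumes naive whitespace tokenization and invented sample sentences, and the `ngram_frequencies` function is a hypothetical helper; it is not how Sketch Engine computes its n-grams.

```python
from collections import Counter


def ngram_frequencies(sentences, n):
    """Count n-grams (sequences of n consecutive tokens) across all sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Illustrative sentences only; not taken from the corpus.
sentences = [
    "the member states shall take the necessary measures",
    "the member states shall inform the commission",
]

bigrams = ngram_frequencies(sentences, 2)
top = bigrams.most_common(3)  # the three most frequent two-word units
```

Sorting the counts by frequency, as `most_common` does, gives the kind of ranked list of multi-word units that the n-grams tool is described as producing.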

Bibliographic references

For a more detailed description of the DGT-TM, including more statistics on the resource, see the following publication. When making reference to DGT-TM in scientific publications, please refer to:

Steinberger, R., Eisele, A., Klocek, S., Pilos, S., & Schlüter, P. (2013). DGT-TM: A freely available translation memory in 22 languages. arXiv preprint arXiv:1309.5226.

For a contrastive overview of DGT-TM and other multilingual EU text resources, see the following journal article:

Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez, M., Schlüter, P., Przybyszewski, M., & Gilbro, S. (2014). An overview of the European Union’s highly multilingual parallel corpora. Language Resources and Evaluation, 48(4), 679-707.

Search the DGT Translation Memory

Sketch Engine offers a range of tools to work with the DGT Translation Memory parallel corpus.

Tip

Learn to work with multilingual and parallel corpora in Sketch Engine. Refer to the user guide.

More parallel corpora

EUR-Lex 2/2016 parallel corpora – texts from the EUR-Lex database containing public EU documents

Eur-Lex judgments 12/2016 parallel corpora – judgments of the Court of Justice of the European Union

Europarl spoken parallel corpora – transcriptions of the European Parliament Proceedings

Open Parallel Corpus (OPUS) – translated texts from various sources, e.g. medical documents, subtitles, technical documentation, etc.

OpenSubtitles 2018 parallel corpora – movie subtitles from the OpenSubtitles database

United Nations Parallel Corpus (UNPC) – official records and other parliamentary documents of the United Nations
