This translation memory consists of 24 collections of texts in Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Serbo-Croatian, Slovak, Slovenian, Spanish and Swedish.
The aligned texts come from DGT-TM, a large translation memory published by the European Commission.
The individual corpora have been processed with the latest tools available in Sketch Engine.
More details / Reference publication
For a more detailed description of the DGT-TM, including more statistics on the resource, see the following publication. When making reference to DGT-TM in scientific publications, please refer to:
- Steinberger Ralf, Andreas Eisele, Szymon Klocek, Spyridon Pilos & Patrick Schlüter (2012). DGT-TM: A freely Available Translation Memory in 22 Languages. In Proceedings of the 8th international conference on Language Resources and Evaluation (LREC’2012), Istanbul, 21–27 May 2012.
For a contrastive overview of DGT-TM and the other multilingual text resources offered for download on this site, you can read the following journal article:
- Steinberger Ralf, Mohamed Ebrahim, Alexandros Poulis, Manuel Carrasco-Benitez, Patrick Schlüter, Marek Przybyszewski & Signe Gilbro (2014). An overview of the European Union’s highly multilingual parallel corpora. In Language Resources and Evaluation Journal (LRE). December 2014, Volume 48, Issue 4, pp. 679–707. DOI: 10.1007/s10579-014-9277-0.
DGT-TM has been registered under the International Standard Language Resource Number (ISLRN) 710-653-952-884-4.