A corpus collection of 40 languages
The OPUS parallel corpus is a set of text corpora whose sentences are aligned, so that each sentence is linked to the corresponding sentences in the other languages. The OPUS project covers 40 languages, so users can look up translated sentence pairs across many language combinations.
The parallel corpora available here have been collected, prepared and aligned by Jörg Tiedemann in the OPUS project (see http://opus.lingfil.uu.se/). We are most grateful to him for his great work and co-operation. The data was prepared for Sketch Engine using a range of lemmatisers, part-of-speech taggers and Sketch Grammars.
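Unlike the first version, the alignment is now m:n, meaning that a group of m sentences in one language can be linked to a group of n sentences in another. This allows for just one corpus per language.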
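To make the idea of m:n alignment concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect how OPUS or Sketch Engine actually store alignments: the `AlignmentLink` class, the `aligned_targets` function and the example sentences are all hypothetical. The point it shows is that each language's sentences are kept in a single list, and the links between languages carry groups of sentence indices on both sides.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AlignmentLink:
    """One m:n link (hypothetical structure, for illustration only)."""
    source_ids: Tuple[int, ...]  # m sentence indices in the source corpus
    target_ids: Tuple[int, ...]  # n sentence indices in the target corpus


def aligned_targets(
    links: Sequence[AlignmentLink],
    source_id: int,
    target_sentences: Sequence[str],
) -> List[str]:
    """Return the target sentences linked to the given source sentence."""
    for link in links:
        if source_id in link.source_ids:
            return [target_sentences[i] for i in link.target_ids]
    return []  # the sentence has no counterpart in the target corpus


# Each language is stored once; only the links differ per language pair.
english = ["It was late.", "We went home and slept.", "Good morning."]
german = [
    "Es war spät.",
    "Wir gingen nach Hause.",
    "Wir schliefen.",
    "Guten Morgen.",
]

# The second English sentence corresponds to two German sentences (1:2).
links = [
    AlignmentLink(source_ids=(0,), target_ids=(0,)),
    AlignmentLink(source_ids=(1,), target_ids=(1, 2)),
    AlignmentLink(source_ids=(2,), target_ids=(3,)),
]

result = aligned_targets(links, 1, german)
print(result)
```

With strictly one-to-one alignment, the second English sentence could not be matched cleanly to its two German counterparts. Allowing groups on both sides of a link avoids that problem.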
OPUS, an open-source parallel corpus, lets users search bilingual and multilingual data in many languages and find concordances, collocations, word lists and more.
The OPUS project in Sketch Engine contains 40 languages: Afrikaans, Albanian, Arabic, Bosnian, Bulgarian, Chinese Simplified, Chinese Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Norwegian, Persian, Polish, Portuguese, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian.