Parallel corpus from tabular data
(basic user)

The simplest way to create a parallel corpus is to upload data in a tabular format, such as an Excel spreadsheet, or in a similar aligned format such as TMX, XML or XLIFF.

Spreadsheet format requirements

A spreadsheet must have a header with the language names in the first row. Each following row holds aligned segments (e.g. sentences) side by side, one column per language.
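
As an illustration, the layout described above can be produced as a TSV file (one of the supported formats) with a short script. This is only a sketch: the function name build_parallel_tsv, the example language names and the example sentences are hypothetical, and it assumes that a tab-separated file follows the same header-plus-aligned-rows layout as a spreadsheet.

```python
import csv
import io


def build_parallel_tsv(languages, rows):
    """Return TSV text: a header row of language names, then one aligned row per segment.

    Hypothetical helper for illustration; it is not part of Sketch Engine.
    """
    # Every row needs exactly one cell per language, otherwise the columns drift apart.
    for number, row in enumerate(rows, start=1):
        if len(row) != len(languages):
            raise ValueError(
                f"row {number} has {len(row)} cells, expected {len(languages)}"
            )
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(languages)  # first row: language names
    writer.writerows(rows)      # following rows: aligned segments
    return buffer.getvalue()


# Example data (assumed for illustration only).
languages = ["English", "French"]
rows = [
    ["Good morning.", "Bonjour."],
    ["Thank you very much.", "Merci beaucoup."],
]
tsv_text = build_parallel_tsv(languages, rows)
```

Writing tsv_text to a file with UTF-8 encoding gives a file that can then be uploaded in the steps below. The length check is there because a single missing cell would shift every later segment in that row into the wrong language column.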

Follow these steps

  • log in to Sketch Engine
  • click Upload TMX or XLS
    other supported formats: XLIFF, XML, TSV, TAB, XLSX
    (if an XLSX file does not upload correctly, try opening it in Excel and saving it as Excel 97-2003 Workbook)
  • type the corpus name and choose the file
  • on the following screen, check that the languages were identified correctly
  • click create

Data in each language will be processed into a separate monolingual corpus aligned with the data in the other language(s) included in the source file.
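
The idea of one monolingual corpus per language, kept aligned with the others, can be pictured with a small conceptual sketch. It is not Sketch Engine's implementation: the function split_by_language and the sample table are hypothetical, and alignment is modelled simply as a shared position in each list.

```python
def split_by_language(table):
    """Split a header-plus-rows table into one list of segments per language.

    Conceptual illustration only; segments at the same index are aligned.
    """
    header, *rows = table
    return {
        language: [row[column] for row in rows]
        for column, language in enumerate(header)
    }


# Sample table (assumed for illustration): first row is the header.
table = [
    ["English", "French"],
    ["Good morning.", "Bonjour."],
    ["Thank you very much.", "Merci beaucoup."],
]
corpora = split_by_language(table)

# The second English segment and the second French segment belong together.
aligned_pair = (corpora["English"][1], corpora["French"][1])
```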

Searching

To search the corpus as a parallel corpus, first select the corpus in the language that should appear on the left and then, when setting the search criteria, select the other language(s). Multiple languages can be selected to display a multilingual concordance.

Other options

In addition to tabular data, a parallel corpus can also be created from other data sources, using either 1:1 or m:n mapping.