Parallel corpus from tabular data
The simplest way to create a parallel corpus is to upload data in a tabular format, such as an Excel spreadsheet, or in a similar format such as TMX, XML or XLIFF.
Spreadsheet format requirements
A spreadsheet must have a header in the first row containing the language names. Each following row holds aligned segments (e.g. sentences) side by side, one column per language.
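As an illustration of this layout, the following minimal Python sketch writes a tab-separated file with a header row of language names followed by aligned segments. The file name, the sample sentences, the helper name write_parallel_tsv and the header spellings "English" and "French" are assumptions made for the example; they are not taken from Sketch Engine.

```python
import csv

# Hypothetical aligned segments: each tuple holds one segment per language,
# in the same order as the header row.
rows = [
    ("Good morning.", "Bonjour."),
    ("Thank you very much.", "Merci beaucoup."),
]


def write_parallel_tsv(path, languages, segments):
    """Write a header row of language names, then one row per aligned segment pair."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(languages)  # first row: language names
        for row in segments:
            # Every row needs exactly one cell per language to stay aligned.
            if len(row) != len(languages):
                raise ValueError(f"expected {len(languages)} columns, got {len(row)}")
            writer.writerow(row)


# Assumed header spellings; check them against the upload screen.
write_parallel_tsv("parallel.tsv", ["English", "French"], rows)
```

The column-count check is there because a row with a missing or extra cell would shift segments into the wrong language column.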
Follow these steps:
- log in to Sketch Engine
- click Upload TMX or XLS
  other supported formats: XLIFF, XML, TSV, TAB, XLSX
  (if an XLSX file does not upload correctly, open it in Excel and save it as an Excel 97-2003 Workbook)
- type the corpus name and choose the file
- on the following screen, check that the languages were identified correctly
- click create
Data in each language will be processed into a separate monolingual corpus aligned with the data in the other language(s) included in the source file.
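To make the result easier to picture, here is a small Python sketch that splits a tab-separated sample into one list of segments per language, keeping the row order so that the lists stay aligned by position. It is only a conceptual illustration and not how Sketch Engine processes the upload; the sample data and the function name split_by_language are hypothetical.

```python
import csv
import io

# Hypothetical sample in the tabular layout: header row, then aligned segments.
sample = "English\tFrench\nGood morning.\tBonjour.\nThank you.\tMerci.\n"


def split_by_language(tsv_text):
    """Return a dict mapping each language name to its list of segments."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # first row holds the language names
    columns = {lang: [] for lang in header}
    for row in reader:
        # Segment i in one language stays aligned with segment i in the others.
        for lang, segment in zip(header, row):
            columns[lang].append(segment)
    return columns


corpora = split_by_language(sample)
```

In this sketch, corpora["English"] and corpora["French"] play the role of the separate monolingual corpora, and the shared list index stands in for the alignment between them.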