Sketch Engine allows users to build corpora from their own documents. Users often have their corpus data in multiple files and want to upload all of them at once. Unfortunately, uploading several separate files in a single step is not supported. There are a few possible workarounds:
- Upload the documents together in a single archive file (.zip, .tar, .tar.gz, or .tar.bz2).
- Upload the documents one by one.
- Convert all documents to plain text, concatenate them into a single file, and upload only the concatenated file. If the file is too large for an HTTP upload, FTP upload can be used. Structural XML-like mark-up is supported in uploaded plain text files. It can be used to mark document boundaries and/or to add metadata about various parts of the text. Example:
<doc id="doc001" author="Lewis Carroll" title="Jabberwocky" text_type="poetry">
Twas bryllyg, and ye slythy toves
Did gyre and gymble in ye wabe:
All mimsy were ye borogoves;
And ye mome raths outgrabe.
</doc>
<doc id="doc002" author="Beatles" title="Yesterday" text_type="lyrics">
Yesterday,
All my troubles seemed so far away,
Now it looks as though they're here to stay,
Oh, I believe in yesterday.
</doc>
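The concatenation approach can be scripted. The following Python sketch is one possible way to do it, not an official Sketch Engine tool: it reads every .txt file in a directory and writes them into a single file, wrapping each one in a <doc> element like those in the example. The function names `wrap_document` and `concatenate`, the `filename` attribute, and the use of the file name as the document `id` are illustrative assumptions. Escaping `&`, `<` and `>` in the text is also an assumption, made so that stray characters are not read as mark-up; check how your corpus is processed before relying on it.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def wrap_document(doc_id, text, **metadata):
    """Wrap one plain-text document in a <doc> element with attributes."""
    attrs = {"id": doc_id, **metadata}
    # quoteattr adds the surrounding quotes and escapes special characters.
    attr_str = " ".join(
        f"{name}={quoteattr(str(value))}" for name, value in attrs.items()
    )
    # Assumption: escape &, < and > so the text is not mistaken for mark-up.
    return f"<doc {attr_str}>\n{escape(text.strip())}\n</doc>\n"


def concatenate(src_dir, out_path):
    """Write each .txt file found directly in src_dir to out_path, one <doc> per file."""
    files = sorted(Path(src_dir).glob("*.txt"))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in files:
            text = path.read_text(encoding="utf-8")
            # Hypothetical metadata: the file stem as id, the file name as an attribute.
            out.write(wrap_document(path.stem, text, filename=path.name))
    return len(files)
```

For example, `concatenate("my_texts", "corpus_upload.txt")` would produce one file ready for upload, with the directory and output names chosen freely. Further attributes such as `author` or `text_type` can be passed to `wrap_document` as keyword arguments if that metadata is available for each file.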