How to set up a vertical file for a Learner Corpus

Errors and their corrections are marked by two consecutive structures, an <err> followed by its <corr>, e.g.:

<err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr>

This means that “cnoference” was corrected to “conference”. In error corpora, the <err> and <corr> structures are mandatory, and so are their closing tags </err> and </corr>. The closing tags are required because the structures can be nested.
The value of the ‘type’ attribute must be identical in an error and its corresponding correction.
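
These rules are easy to break when a vertical file is generated by a script, so a quick check before compiling the corpus can help. The Python function below is a minimal sketch, not part of any corpus tool. `validate` and `TAG_RE` are hypothetical names. It assumes that each <err>/<corr> tag sits alone on its line, that the input contains only token lines and err/corr tags, that every </err> is directly followed by its <corr> (as in all examples on this page), and that an empty side carries a ‘===NONE===’ line rather than nothing (the rule described next).

```python
import re

# Hypothetical checker, not an official tool. Assumptions: one tag per line,
# only token lines and <err>/<corr> tags in the input, and each </err> is
# directly followed by the <corr> that corrects it.
TAG_RE = re.compile(r'^<(/?)(err|corr)(?:\s+type="([^"]*)")?>$')


def validate(lines):
    """Return a list of problems found in vertical-file lines (empty if none)."""
    problems = []
    stack = []            # open structures as [name, type, line_no, item_count]
    awaiting_corr = None  # (type, line_no) of the last </err>, until its <corr> arrives
    for no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        m = TAG_RE.match(line)
        is_open = m is not None and m.group(1) == ""
        name = m.group(2) if m else None
        paired = False
        if awaiting_corr is not None:
            err_type, err_no = awaiting_corr
            awaiting_corr = None
            if is_open and name == "corr":
                paired = True
                if m.group(3) != err_type:
                    problems.append(f"line {no}: corr type {m.group(3)!r} differs from err type {err_type!r}")
            else:
                problems.append(f"line {err_no}: </err> not directly followed by <corr>")
        if m is None:
            if stack:
                stack[-1][3] += 1  # a token line is content of the innermost structure
            continue
        if is_open:
            if name == "corr" and not paired:
                problems.append(f"line {no}: <corr> without a preceding <err>")
            if stack:
                stack[-1][3] += 1  # a nested structure also counts as content
            stack.append([name, m.group(3), no, 0])
        elif not stack or stack[-1][0] != name:
            problems.append(f"line {no}: unexpected </{name}>")
        else:
            _, open_type, open_no, count = stack.pop()
            if count == 0:
                problems.append(f"line {open_no}: empty <{name}>, use a ===NONE=== line")
            if name == "err":
                awaiting_corr = (open_type, no)
    if awaiting_corr is not None:
        problems.append(f"line {awaiting_corr[1]}: </err> not directly followed by <corr>")
    for name, _, open_no, _ in stack:
        problems.append(f"line {open_no}: <{name}> is never closed")
    return problems


SIMPLE_EXAMPLE = [
    '<err type="Typo">',
    "cnoference\tNN\tcnoference-n",
    "</err>",
    '<corr type="Typo">',
    "conference\tNN\tconference-n",
    "</corr>",
]
print(validate(SIMPLE_EXAMPLE))  # []
```

Each problem is reported with the 1-based line number where it was detected. The stack handles nesting, so an <err> whose content is a complete inner <err>/<corr> pair is accepted as non-empty.
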
Either the error or the correction can be empty; in that case, a special ‘===NONE===’ token must be inserted in its place. For example:

<err type="DeletedWord">
cnoference      NN      cnoference-n
</err>
<corr type="DeletedWord">
===NONE===      ===NONE===      ===NONE===
</corr>

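This means that the word “cnoference” was deleted by the corrector. Errors can also be nested, so that an already corrected segment is corrected again: the outer <err> then contains a complete inner <err>/<corr> pair. For example: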

international   JJ      international-j
conference      NN      conference-n
<err type="Repetition">
<err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr>
</err>
<corr type="Repetition">
===NONE===      ===NONE===      ===NONE===
</corr>

This means that the word “cnoference” was first corrected to “conference” and then deleted, because it is actually a repetition.
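
Because every correction is wrapped in its own <err>/<corr> pair, both the learner's text and the fully corrected text can be read off the same file. The original text keeps <err> content and skips every <corr>. The corrected text keeps <corr> content, skips every <err>, and drops ‘===NONE===’ tokens. The Python sketch below illustrates this reading of the nesting rules. `reconstruct` is a hypothetical helper, not part of any corpus tool, and it assumes a well-formed file with the word form in the first column.

```python
import re

# Hypothetical helper, not an official tool. Assumes a well-formed file,
# one tag per line, and the word form in the first column.
TAG_RE = re.compile(r'^<(/?)(err|corr)(?:\s+type="[^"]*")?>$')
NONE = "===NONE==="


def reconstruct(lines, side):
    """Return the word forms of the "original" or the "corrected" text."""
    if side not in ("original", "corrected"):
        raise ValueError("side must be 'original' or 'corrected'")
    skip = "corr" if side == "original" else "err"
    depth = 0  # > 0 while inside a skipped structure (counts nested tags too)
    words = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = TAG_RE.match(line)
        if m:
            is_open = m.group(1) == ""
            if depth > 0:
                depth += 1 if is_open else -1
            elif is_open and m.group(2) == skip:
                depth = 1
            # at depth 0, tags of the kept kind are transparent: only their content matters
            continue
        if depth == 0:
            word = line.split()[0]  # first column; split() accepts tabs or spaces
            if word != NONE:        # ===NONE=== marks an empty side, not a word
                words.append(word)
    return words


NESTED_EXAMPLE = [
    "international\tJJ\tinternational-j",
    "conference\tNN\tconference-n",
    '<err type="Repetition">',
    '<err type="Typo">',
    "cnoference\tNN\tcnoference-n",
    "</err>",
    '<corr type="Typo">',
    "conference\tNN\tconference-n",
    "</corr>",
    "</err>",
    '<corr type="Repetition">',
    "===NONE===\t===NONE===\t===NONE===",
    "</corr>",
]

print(reconstruct(NESTED_EXAMPLE, "original"))   # ['international', 'conference', 'cnoference']
print(reconstruct(NESTED_EXAMPLE, "corrected"))  # ['international', 'conference']
```

Counting nesting depth inside a skipped structure is what makes the nested case work. In the corrected text, the whole Repetition <err>, including its inner Typo pair, is skipped. In the original text, only the two <corr> structures are skipped.
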