Sketch Engine has a dedicated interface for working with learner corpora. The interface allows the user to search by the error itself, by the type of error, by the error correction or by a combination of any of the aforementioned criteria.

In addition, any metadata included in the corpus can be used in the search and analysed to get information about how learner mistakes are distributed across age groups, proficiency levels, mother tongue, types of test tasks etc.

A correctly constructed learner corpus can provide answers to global questions such as:

  • what is the most frequent type of error?
  • which age group makes the most mistakes?

as well as very specific questions:

  • are mistakes related to verb tenses more frequent at B2 or C1 level?

[Figure: Search options available in the learner corpus search interface]

Text types are generated from the metadata included in the corpus.

Setting up a learner corpus for Sketch Engine

It is highly recommended that the corpus be uploaded as a vertical file according to the specifications on this page.

Please contact us if the data are in a different format. Our team will inspect your data and will advise or assist in converting and/or uploading your data.

Setting up a vertical file for a learner corpus

Each error and its correction are marked by two consecutive structures, e.g.

 <err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr> 

means that “cnoference” was corrected to “conference”. The following two structures are mandatory in an error corpus, and each must be properly closed because structures can be nested:

 <err> 

and

 <corr> 

The ‘type’ attribute must have the same value in the error and in its correction.
Either the error or the correction can be empty, but in that case a special ‘===NONE===’ token must be inserted in its place. For example:

 <err type="DeletedWord">
cnoference      NN      cnoference-n
</err>
<corr type="DeletedWord">
===NONE===      ===NONE===      ===NONE===
</corr> 

This means that the word “cnoference” was deleted by the corrector. The nesting works in a natural way. For example:

 international   JJ      international-j
conference      NN      conference-n
<err type="Repetition">
<err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr>
</err>
<corr type="Repetition">
===NONE===      ===NONE===      ===NONE===
</corr> 

This means that the word “cnoference” was first corrected as “conference” and then deleted because it is actually a repetition.
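
Before uploading, it can help to check that a vertical file follows the rules above. The following Python script is a minimal sketch of such a check and is not part of Sketch Engine. The function name check_learner_vertical is hypothetical. The sketch assumes that every </err> is immediately followed by its <corr>, that the type attribute is written with double quotes, and that any line not starting with "<" is a token line.

```python
import re

NONE_TOKEN = "===NONE==="
# Matches <err ...>, </err>, <corr ...> and </corr>; captures the type if present.
TAG_RE = re.compile(r'^<(/?)(err|corr)(?:\s+type="([^"]*)")?\s*>$')


def check_learner_vertical(text):
    """Return a list of problems found in the err/corr markup of a vertical text."""
    problems = []
    stack = []       # currently open structures, innermost last
    awaiting = None  # (type, line number) of an <err> that has just been closed

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        opens_corr = bool(match) and match.group(1) == "" and match.group(2) == "corr"

        # A closed error must be followed directly by its correction.
        if awaiting is not None and not opens_corr:
            problems.append(f"line {awaiting[1]}: </err> is not followed by <corr>")
            awaiting = None

        if match is None:
            # Token line, or an unrelated structure such as <s> (ignored here).
            if not line.startswith("<") and stack:
                stack[-1]["content"] += 1
            continue

        closing, name, err_type = match.group(1) == "/", match.group(2), match.group(3)
        if not closing:
            if name == "corr":
                if awaiting is None:
                    problems.append(f"line {line_no}: <corr> without a preceding <err>")
                elif awaiting[0] != err_type:
                    problems.append(
                        f"line {line_no}: corr type {err_type!r} differs from err type {awaiting[0]!r}"
                    )
                awaiting = None
            if stack:
                stack[-1]["content"] += 1  # a nested structure counts as content
            stack.append({"name": name, "type": err_type, "content": 0, "line": line_no})
        elif not stack or stack[-1]["name"] != name:
            problems.append(f"line {line_no}: unexpected </{name}>")
        else:
            opened = stack.pop()
            if opened["content"] == 0:
                problems.append(
                    f"line {opened['line']}: empty <{name}>, insert a {NONE_TOKEN} token"
                )
            if name == "err":
                awaiting = (opened["type"], line_no)

    if awaiting is not None:
        problems.append(f"line {awaiting[1]}: </err> is not followed by <corr>")
    for opened in stack:
        problems.append(f"line {opened['line']}: <{opened['name']}> is never closed")
    return problems


# The nested example from this page, with tab-separated columns.
sample = "\n".join([
    "conference\tNN\tconference-n",
    '<err type="Repetition">',
    '<err type="Typo">',
    "cnoference\tNN\tcnoference-n",
    "</err>",
    '<corr type="Typo">',
    "conference\tNN\tconference-n",
    "</corr>",
    "</err>",
    '<corr type="Repetition">',
    "\t".join([NONE_TOKEN] * 3),
    "</corr>",
])
print(check_learner_vertical(sample))  # prints []
```

An empty list means no problems were detected. Otherwise each entry names the line and the rule that was broken, for example a correction whose type differs from that of its error.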

Setting a visual style for error and correction structures

When you search a learner corpus, errors are usually rendered in red and corrections in green. You can change this by defining DISPLAYCLASS in the corpus configuration file for the two structure definitions, e.g.

STRUCTURE err {
    DISPLAYCLASS "errclass"
}
STRUCTURE corr {
    DISPLAYCLASS "corrclass"
}

Then, in the CSS file (view.css), define the classes and add the styles you want:

.errclass {
    background-color: red;
    color: white;
    font-weight: bold;
}
.corrclass {
    background-color: green;
    color: white;
    font-weight: bold;
}