Compiling corpus

You need to prepare a vertical file and a registry file before compiling…

Discrepancies between API and interface results

When you query a corpus in the web interface you may notice that…

Common corpus structures

It is generally practical to divide a corpus into smaller parts…

Scripts for adding header fields

Adding attributes is based on mapping existing structure attributes…

Variation in hit counts

It often seems that you have got a different hit count for the…

uaTenTen corpus

The Ukrainian TenTen corpus was crawled by SpiderLing in 2014.…

trTenTen corpus

Turkish TenTen corpus. Crawled by SpiderLing in December 2011…

svTenTen corpus

Swedish TenTen web corpus. The corpus is cleaned by jusText,…

skTenTen corpus

Slovak TenTen corpus. The corpus has been tagged by the Ľ.…