The corpora available here were collected using WebBootCaT with DANTE seeds.

For more details about the approach, see Avinesh PVS, Diana McCarthy, Dominic Glennon & Jan Pomikálek, Domain Specific Corpora from the Web.

The data was prepared for the Sketch Engine using a lemmatiser, part-of-speech tagged using TreeTagger with the UTF-8 English parameter file trained on Tagset, and set up with English Sketch Grammar v.2.5 (TreeTagger tagset).

List of domains and sizes

Domain        Size in words
Commerce      17.0 M
Cook          28.1 M
Employment    13.4 M
Finance       32.1 M
Food          22.1 M
IT            30.2 M
Law           34.0 M
Medical       35.3 M
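
For readers who want to work with these figures programmatically, the following is a minimal Python sketch that totals the sizes listed above and finds the largest and smallest domains. The names SIZES_M, total_size_m, largest_domain and smallest_domain are illustrative and are not part of the Sketch Engine or WebBootCaT; the numbers are copied from the list above and are in millions of words.

```python
# Domain sizes in millions of words, copied from the list above.
# The dictionary and function names are illustrative, not part of any tool.
SIZES_M = {
    "Commerce": 17.0,
    "Cook": 28.1,
    "Employment": 13.4,
    "Finance": 32.1,
    "Food": 22.1,
    "IT": 30.2,
    "Law": 34.0,
    "Medical": 35.3,
}


def total_size_m(sizes):
    """Return the combined size in millions of words, rounded to one decimal."""
    return round(sum(sizes.values()), 1)


def largest_domain(sizes):
    """Return the name of the domain with the most words."""
    return max(sizes, key=sizes.get)


def smallest_domain(sizes):
    """Return the name of the domain with the fewest words."""
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    print(f"Total: {total_size_m(SIZES_M)} M words")
    print(f"Largest: {largest_domain(SIZES_M)}")
    print(f"Smallest: {smallest_domain(SIZES_M)}")
```

The sizes are rounded to one decimal place in the source, so the total is approximate to the same precision.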