The corpora available here were collected using WebBootCat with DANTE seeds.
For more details about the approach, see Avinesh PVS, Diana McCarthy, Dominic Glennon & Jan Pomikálek, Domain Specific Corpora from the Web.
The data was prepared for the Sketch Engine by lemmatising and part-of-speech tagging it with TreeTagger, using the UTF-8 English parameter file trained on Tagset, together with English Sketch Grammar v.2.5 (TreeTagger tagset).
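As a rough illustration of what tagged and lemmatised data of this kind looks like, the sketch below reads tab-separated lines with one token per line. It assumes a three-column word/tag/lemma layout with structure markers such as `<s>` on their own lines, which may not match the exact layout of these corpora. The `Token` type, the `parse_vertical` function and the sample lines are hypothetical and are not taken from the corpora themselves.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


def parse_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per line that has exactly three tab-separated fields.

    Lines without three fields (blank lines, structure markers such as <s>)
    are skipped. The word/tag/lemma column order is an assumption.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        yield Token(*parts)


# Invented sample lines, for illustration only.
sample = [
    "<s>",
    "The\tDT\tthe",
    "corpora\tNNS\tcorpus",
    "were\tVBD\tbe",
    "collected\tVVN\tcollect",
    "</s>",
]

tokens = list(parse_vertical(sample))
lemmas = [t.lemma for t in tokens]
```

Splitting on tabs and checking the field count, rather than testing for a leading `<`, keeps a literal `<` token from being mistaken for a structure marker.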
List of Domains and sizes
|Domain||Size in words|