The English Corpus for SkELL is a text database used in the English SkELL interface, available at http://skell.sketchengine.co.uk/run.cgi/skell. The corpus does not contain whole documents, only sentences sorted according to their text quality. The text quality score was computed by the GDEX system and assigned to each sentence.
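As a rough illustration of what storing sentences sorted by quality means, the sketch below orders a few sentences by a precomputed score. The scores, the `ScoredSentence` type and the `sort_by_quality` helper are made up for this example; the actual GDEX formula is not reproduced here.

```python
from typing import NamedTuple


class ScoredSentence(NamedTuple):
    text: str
    score: float  # hypothetical quality score; higher means better


def sort_by_quality(sentences):
    """Return the sentences ordered from highest to lowest score."""
    return sorted(sentences, key=lambda s: s.score, reverse=True)


# Invented sample data; real scores would come from GDEX.
sample = [
    ScoredSentence("lol idk wat 2 say", 0.12),
    ScoredSentence("The committee approved the proposal on Monday.", 0.91),
    ScoredSentence("Click here to read more", 0.35),
]

ranked = sort_by_quality(sample)
for sentence in ranked:
    print(f"{sentence.score:.2f}  {sentence.text}")
```

With the sentences kept in this order, an interface can show the best examples first simply by reading from the top of the list.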
The corpus is made up of Wikipedia articles, selected parts of the English Web 2013 corpus and the Timestamped web corpus, and English websites collected with the WebBootCat tool. These sources provide a good example of how English is used in everyday, standard, formal and professional contexts. The corpus contains over 1 billion words in more than 57 million sentences.
no. of documents
no. of words
English Web 2013
Timestamped web corpus
British National Corpus
The corpus is accessible to all users with a subscription plan and to site licence members. It is not available to trial users.
first published version
minor changes to GDEX formula
removed the first several sentences with wrong encoding
removed all Project Gutenberg books because of their very old language