• Project Gutenberg: texts downloaded with wget
  • cleaned with jusText (using a slightly modified version of its algorithm)
  • title and author are sometimes retrievable from HTML META tags, but are not used in the SkELL corpus
  • tagged with TreeTagger
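The metadata step above can be sketched with Python's standard `html.parser`. This is an illustrative sketch only, not the procedure used for SkELL (which does not use this metadata). The META names it looks for (`title`/`dc.title` and `author`/`dc.creator`) are assumptions. Real Gutenberg pages may use other names or none, which is why the result can be missing.

```python
from html.parser import HTMLParser

# Assumed META names (hypothetical); real Gutenberg pages may differ.
TITLE_KEYS = {"title", "dc.title"}
AUTHOR_KEYS = {"author", "dc.creator"}


class MetaExtractor(HTMLParser):
    """Collect the first title/author values found in <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.author = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        name = attr.get("name", "").strip().lower()
        content = attr.get("content", "").strip()
        if not content:
            return
        if name in TITLE_KEYS and self.title is None:
            self.title = content
        elif name in AUTHOR_KEYS and self.author is None:
            self.author = content


def extract_title_author(html: str):
    """Return (title, author); either is None when no matching META tag exists."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.author


if __name__ == "__main__":
    sample = (
        '<html><head><meta name="dc.title" content="Emma">'
        '<meta name="dc.creator" content="Jane Austen"></head></html>'
    )
    print(extract_title_author(sample))
```

Pages without matching tags return `(None, None)`. This is consistent with the note that title and author are only *sometimes* retrievable.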