Brexit corpus: database of articles on Brexit
The Brexit corpus is an English-language corpus of web texts relating to Brexit, the United Kingdom's exit from the European Union, and the referendum on it. It comprises news articles (the Guardian, the BBC, the Daily Mail, the Telegraph, etc.), blogs, comments, and forum and Twitter posts.
The Brexit corpus contains rich metadata about individual articles, such as topic, author, and original web domain. In addition, automatic sentiment annotation makes it possible to restrict a search to articles with negative, neutral, or positive words and phrases. Users can also search by the opinion expressed on Brexit (agreement or disagreement with the exit).
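As a rough illustration of how metadata-based filtering of this kind works, the following Python sketch selects articles by sentiment and opinion. It is not the corpus interface itself: the `Article` class, its field names, the label values, and the sample records are all hypothetical and only mirror the metadata categories described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    # Hypothetical record; field names and values are assumptions, not the real schema.
    title: str
    domain: str
    sentiment: str  # assumed labels: "negative", "neutral", "positive"
    opinion: str    # assumed labels: "agree", "disagree"


def filter_articles(
    articles: List[Article],
    sentiment: Optional[str] = None,
    opinion: Optional[str] = None,
) -> List[Article]:
    """Keep articles matching every criterion that is not None."""
    return [
        a
        for a in articles
        if (sentiment is None or a.sentiment == sentiment)
        and (opinion is None or a.opinion == opinion)
    ]


# Invented sample data, for illustration only.
SAMPLE = [
    Article("Sample text A", "example.org", "negative", "disagree"),
    Article("Sample text B", "example.com", "positive", "agree"),
    Article("Sample text C", "example.net", "negative", "agree"),
]

negative_and_agree = filter_articles(SAMPLE, sentiment="negative", opinion="agree")
```

Leaving a criterion as `None` means it is not applied, so the same function covers a search by sentiment alone, by opinion alone, or by both.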
The corpus is POS-tagged by TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
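To show what POS-tagged text of this kind looks like when processed, here is a minimal Python sketch that counts tags in TreeTagger-style output (one token per line: word form, tag, and lemma, separated by tabs). The sample sentence is invented, and the sketch assumes plain Penn Treebank tags; it does not reflect the Sketch Engine modifications to the tagset.

```python
from collections import Counter

# Invented sample in TreeTagger's tab-separated layout: word, POS tag, lemma.
SAMPLE = (
    "The\tDT\tthe\n"
    "referendum\tNN\treferendum\n"
    "was\tVBD\tbe\n"
    "close\tJJ\tclose\n"
)


def count_tags(tagged_text: str) -> Counter:
    """Count how often each POS tag occurs in tab-separated tagger output."""
    counts: Counter = Counter()
    for line in tagged_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


tag_counts = count_tags(SAMPLE)
```

Here `DT`, `NN`, `VBD`, and `JJ` are the Penn Treebank tags for determiner, singular noun, past-tense verb, and adjective.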