The ScienceBlogs corpus is a selection of posts and comments from the ScienceBlogs website. Publication dates range from 2006 to early 2014. Posts and related comments share a common and doc.title attribute. The corpus is part-of-speech tagged using TreeTagger with the Penn Treebank tagset.

The ScienceBlogs corpus was prepared in 2014 by Akshay Minocha (