The ScienceBlogs corpus is a selection of posts and comments from the scienceblogs.com website, published between 2006 and early 2014. Each post and its related comments share common doc.id and doc.title attributes. The corpus is part-of-speech tagged with TreeTagger using the Penn Treebank tagset.
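
Because a post and its comments carry the same doc.id, that attribute can be used to reassemble a discussion thread. The following is a minimal sketch, not the corpus's documented format: it assumes a word-per-line export in which each document is wrapped in a <doc id="..." title="..."> element and each token line holds word, tag and lemma separated by tabs. The sample data and the function name group_by_doc_id are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample; the real export layout is an assumption, not documented here.
SAMPLE_LINES = [
    '<doc id="42" title="Example post">',
    "Science\tNN\tscience",
    "rocks\tVBZ\trock",
    "</doc>",
    '<doc id="42" title="Example post">',  # a comment sharing the post's id
    "Agreed\tVBN\tagree",
    "</doc>",
]

DOC_OPEN = re.compile(r"<doc\s+([^>]*)>")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def group_by_doc_id(lines):
    """Collect (word, tag, lemma) rows under the id of the enclosing <doc>."""
    groups = defaultdict(list)
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        opened = DOC_OPEN.match(line)
        if opened:
            # Read the id attribute from the opening tag.
            current_id = dict(ATTR.findall(opened.group(1))).get("id")
            continue
        if line.startswith("</doc"):
            current_id = None
            continue
        fields = line.split("\t")
        # Keep only well-formed token rows that sit inside a document.
        if current_id is not None and len(fields) == 3:
            groups[current_id].append(tuple(fields))
    return groups


groups = group_by_doc_id(SAMPLE_LINES)
for doc_id, tokens in groups.items():
    print(doc_id, len(tokens))
```

Grouping on doc.id rather than doc.title avoids merging unrelated posts that happen to share a title.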

The ScienceBlogs corpus was prepared in 2014 by Akshay Minocha (akshayminocha5@gmail.com).