The corpus was created by Anil in October 2013. It has almost 27 million words and is encoded in UTF-8.