This corpus was built from an English Wikipedia dump (from the second half of September 2014). The dump's XML was converted using WikiExtractor.py.
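As a rough illustration of how converted output of this kind might be read, the sketch below assumes WikiExtractor.py's usual layout, in which each article is wrapped in a `<doc id="..." url="..." title="...">` ... `</doc>` block. Whether this corpus keeps exactly that layout is an assumption; the helper name `iter_docs` and the sample text are hypothetical and serve only as a demonstration.

```python
import re
from typing import Dict, Iterator

# Assumption: each article is wrapped as
#   <doc id="..." url="..." title="..."> ... </doc>
# which is the usual WikiExtractor.py output layout.
DOC_RE = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">\n'
    r'(?P<text>.*?)\n?</doc>',
    re.DOTALL,
)


def iter_docs(raw: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <doc> block found in the given string."""
    for match in DOC_RE.finditer(raw):
        yield {
            "id": match.group("id"),
            "url": match.group("url"),
            "title": match.group("title"),
            "text": match.group("text").strip(),
        }


# Made-up sample in the assumed layout (not taken from the corpus).
SAMPLE = '''<doc id="12" url="https://en.wikipedia.org/wiki?curid=12" title="Anarchism">
Anarchism

Anarchism is a political philosophy.
</doc>
<doc id="25" url="https://en.wikipedia.org/wiki?curid=25" title="Autism">
Autism

Autism is a neurodevelopmental condition.
</doc>
'''

docs = list(iter_docs(SAMPLE))
for doc in docs:
    print(doc["id"], doc["title"], len(doc["text"]))
```

For real files, the same function could be applied to the contents of each extracted file, read with UTF-8 encoding.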