Corpora, as collections of historical texts, are a good starting point. WebBootCat can help you gather all your data into a single corpus.

Historical corpora:

Sketch Engine is also being used in the ChartEx project, which is applying text-mining methods to medieval Latin charters. As the project proceeds, it will make the corpora it prepares publicly available through Sketch Engine.


Adam Kilgarriff, Miloš Husák and Robyn Woodrow (2012). The Sketch Engine as infrastructure for historical corpora. In Jeremy Jancsary (ed.), Empirical Methods in Natural Language Processing: Proceedings of the Conference on Natural Language Processing 2012, pp. 351–356.

Barbara McGillivray and Adam Kilgarriff (2013). Tools for historical corpus research, and a corpus of Latin (presentation). In New Methods in Historical Corpus Linguistics 3. Germany, pp. 247–255.