Corpora are a good starting point for collecting historical texts. With the help of WebBootCaT, you can gather all your data in one corpus.

Historical corpora:

Sketch Engine is also being used in the ChartEx project, which is applying text mining methods to medieval Latin charters. The project will make the corpora it prepares publicly available through Sketch Engine as it proceeds.


