Latin corpus built by Barbara McGillivray. The texts were collected from the LacusCurtius, IntraText and Musisque Deoque websites. They were lemmatised with Dag Haug’s Latin morphological analyser and Quick Latin, then part-of-speech tagged with TreeTagger, trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL project.

Version 2 October 2014: part-of-speech tagging has been partially corrected by Barbara McGillivray.


Bill Thayer (LacusCurtius), Nicola Mastidoro (IntraText), Linda Spinazzè (Musisque Deoque), Dag Haug (Latin morphological analyser and Latin treebank of the PROIEL project), Marco Passarotti (Index Thomisticus Treebank) and the Perseus Project (Latin Dependency Treebank).


Barbara McGillivray and Adam Kilgarriff (2012). Tools for historical corpus research, and a corpus of Latin. In New Methods in Historical Corpus Linguistics 3, Germany, 2013, pp. 247–255.