LatinISE: corpus of Latin

The LatinISE corpus is a text corpus collected from the LacusCurtius, Intratext and Musisque Deoque websites. Corpus texts have rich metadata, including genre, title, and century or specific date.

This Latin corpus was built by Barbara McGillivray.

Part-of-speech tagset

The texts were lemmatized with Dag Haug’s Latin morphological analyser and Quick Latin, and POS-tagged with TreeTagger, trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL project.

A complete set of Sketch Engine tools is available to work with this LatinISE corpus to generate:

  • word sketch – Latin collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Latin nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
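
As a rough illustration of what two of these outputs contain, the sketch below builds an n-gram frequency list and a simple keyword-in-context concordance in plain Python. It is not Sketch Engine's implementation and does not use its API; the helper names `ngrams` and `concordance` are hypothetical, and the toy sentence (the opening of Caesar's De Bello Gallico, lowercased) stands in for corpus text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each exact match."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy sample, split on whitespace; a real corpus would be tokenized and lemmatized.
text = "gallia est omnis divisa in partes tres quarum unam incolunt belgae"
tokens = text.split()

# Frequency list of two-word units (bigrams).
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))

# Every occurrence of "partes" with two words of context on each side.
print(concordance(tokens, "partes", width=2))
```

On a corpus of this size the same counts would be computed over lemmas and tags rather than raw word forms, which is why the lemmatization and POS tagging described above matter.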

Changelog

version 2 (October 2014)

  • part-of-speech tagging has been partially corrected (by Barbara McGillivray)
  • text cleaning
  • 10.9 million words

version 1 (2011)

  • initial size 11.3 million words

Bibliography

Barbara McGillivray and Adam Kilgarriff (2012). Tools for historical corpus research, and a corpus of Latin. In New Methods in Historical Corpus Linguistics 3, Germany, 2013, pp. 247–255

Acknowledgements

Bill Thayer (LacusCurtius), Nicola Mastidoro (IntraText), Linda Spinazzè (Musisque Deoque), Dag Haug (Latin morphological analyser and Latin treebank of the PROIEL project), Marco Passarotti (Index Thomisticus Treebank) and Perseus Project (Latin Dependency Treebank).
