This Italian corpus (itWaC) was built by Marco Baroni from a web crawl, as described in Baroni and Kilgarriff (2006), presented at EACL 2006.
It was part-of-speech tagged and lemmatised with TreeTagger, a freely available part-of-speech tagger with trained models for many languages.
Italian word sketches were prepared by Marco Baroni and later updated by Valentina Efrati and Francesca Masini (TRIPLE lab, Roma Tre University).
Sketch Engine offers a complete set of tools for working with the Italian itWaC corpus.
BARONI, Marco; KILGARRIFF, Adam. Large linguistically-processed web corpora for multiple languages. In: Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations. Association for Computational Linguistics, 2006, pp. 87–90.
Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn the basics in minutes.