This Italian corpus was compiled by Marco Baroni from a web crawl, as described at EACL 2006 (paper available here).
It was part-of-speech tagged and lemmatised using TreeTagger, an open-source part-of-speech tagger which has been trained for a number of languages.
Italian word sketches were prepared by Marco Baroni and later updated by Valentina Efrati and Francesca Masini (TRIPLE lab, Roma Tre University).
Sketch Engine offers a complete set of tools to work with this Italian itWaC corpus.
Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.