Use Sketch Engine to work with the French frWaC web corpus

The French frWaC corpus belongs to the WaCky family of web-as-corpus corpora. It was built by crawling the .fr domain and was tagged with TreeTagger.

A complete set of Sketch Engine tools is available to work with this French frWaC corpus to generate:

  • word sketch – French collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of French nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
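To make the outputs listed above more concrete, here is a minimal Python sketch of a word list, an n-gram frequency list and a concordance computed over a tiny hand-made sample. It is not how Sketch Engine itself works, since Sketch Engine runs these queries over the full indexed corpus. The sample sentences, the function names (word_list, ngrams, concordance) and the (word, tag, lemma) layout are assumptions made for illustration; the tag names are assumed to follow the TreeTagger French tagset.

```python
from collections import Counter

# Hypothetical toy sample: (word, tag, lemma) triples in a TreeTagger-like layout.
# The tags (DET:ART, NOM, ADJ, VER:pres, SENT) are assumed, not read from frWaC.
TOKENS = [
    ("Le", "DET:ART", "le"),
    ("chat", "NOM", "chat"),
    ("noir", "ADJ", "noir"),
    ("mange", "VER:pres", "manger"),
    ("le", "DET:ART", "le"),
    ("poisson", "NOM", "poisson"),
    (".", "SENT", "."),
    ("Le", "DET:ART", "le"),
    ("chat", "NOM", "chat"),
    ("dort", "VER:pres", "dormir"),
    (".", "SENT", "."),
]


def word_list(tokens, tag_prefix):
    """Count lemmas whose tag starts with tag_prefix, most frequent first."""
    counts = Counter(
        lemma for _, tag, lemma in tokens if tag.startswith(tag_prefix)
    )
    return counts.most_common()


def ngrams(tokens, n):
    """Count lower-cased word n-grams that stay within one sentence."""
    counts = Counter()
    sentence = []

    def flush():
        # Add every n-gram of the current sentence to the counter.
        counts.update(
            tuple(sentence[i:i + n]) for i in range(len(sentence) - n + 1)
        )

    for word, tag, _ in tokens:
        if tag == "SENT":  # sentence-final punctuation closes the sentence
            flush()
            sentence = []
        else:
            sentence.append(word.lower())
    flush()  # handle a trailing sentence with no final punctuation
    return counts


def concordance(tokens, lemma, width=2):
    """Return (left context, keyword, right context) for each hit of lemma."""
    words = [word for word, _, _ in tokens]
    lines = []
    for i, (_, _, token_lemma) in enumerate(tokens):
        if token_lemma == lemma:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, words[i], right))
    return lines


if __name__ == "__main__":
    print(word_list(TOKENS, "NOM"))
    print(ngrams(TOKENS, 2).most_common(3))
    for left, keyword, right in concordance(TOKENS, "chat"):
        print(f"{left:>10} [{keyword}] {right}")
```

Matching on the lemma rather than the surface form means a concordance query for a verb such as "manger" would also find inflected forms like "mange", which is one reason the corpus is lemmatized as well as tagged.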

Search the frWaC corpus in Sketch Engine

Sketch Engine offers a range of tools to work with this French corpus.



version 1.1 (2012/04/13)
  • retagged with UTF-8 TreeTagger models to fix lemmatization
  • improved sentence segmentation
version 1.0

Related paper

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209–226.

Learn to use Sketch Engine in minutes

Generating French collocations, frequency lists, examples in context and n-grams, or extracting French terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.