FrWac is a web-as-corpus resource built by crawling the .fr domain and tagged with TreeTagger. More information is available in the related paper listed below.



version 1.1 (2012/04/13)

  • retagged with UTF-8 TreeTagger models to fix lemmatization
  • improved sentence segmentation

version 1.0

Related paper

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), pp. 209–226.