Portuguese TenTen corpus.

The corpus is processed with Eckhard Bick’s PALAVRAS parser and then post-processed by Pete Whitelock at Oxford University Press to optimise word sketch output. The corpus preparation process is described in


24 March 2011

  • initial version – 0.9 billion tokens

August 2012

  • finished version – 3.2 billion tokens processed with PALAVRAS

19 December 2013