Portuguese TenTen corpus.
The corpus is processed with Eckhard Bick’s PALAVRAS parser and then post-processed by Pete Whitelock at Oxford University Press to optimise word sketch output. The corpus preparation process is described in:
- Setting up for corpus lexicography
- Adam Kilgarriff, Jan Pomikalek, Miloš Jakubíček, Pete Whitelock
- In: Proc. EURALEX, Oslo, August 2012.
24 March 2011
- initial version – 0.9 billion tokens
- finished version – 3.2 billion tokens processed with PALAVRAS
19 December 2013
- tagged by FreeLing
- attribute lempos added