Search the Spanish esTenTen corpus
esTenTen is a Spanish TenTen corpus. The source data was crawled from the web in 2011, so the documents date mostly from 2011 and the preceding years.
The data was cleaned (re-encoded to UTF-8, boilerplate removed, de-duplicated) and tokenised using Corpus tools. Part-of-speech tagging and lemmatisation were performed with FreeLing 3.1, using its Spanish configuration and data and the Spanish FreeLing tagset.
The corpus consists of two subcorpora, European Spanish and American Spanish, downloaded from web domains on the respective continents. Each subcorpus therefore effectively corresponds to one language variety. To limit a query to a single Spanish variety, select the desired subcorpus in the corpus query interface.