Pages

Web corpora (TenTen corpora)

TenTen is a new generation of Web corpora. These corpora are…

zhTenTen corpus

The Simplified Chinese TenTen corpus was created from the Internet…

yoTenTen corpus

Yoruba TenTen web corpus. The corpus is cleaned by jusText,…

uaTenTen corpus

The Ukrainian TenTen corpus was crawled by SpiderLing in 2014.…

trTenTen corpus

Turkish TenTen corpus. Crawled by SpiderLing in December 2011…

svTenTen corpus

Swedish TenTen web corpus. The corpus is cleaned by jusText,…

skTenTen corpus

Slovak TenTen corpus. The corpus has been tagged by the Ľ.…

ruTenTen corpus

Russian TenTen corpus. Russian web corpus crawled by SpiderLing…

ptTenTen corpus

Portuguese TenTen corpus. The corpus is processed with Eckhard…

plTenTen corpus

The Polish TenTen web corpus was crawled by the web spider SpiderLing…

noTenTen corpus

Norwegian TenTen corpus. The corpus is tagged with Oslo-Bergen…

nlTenTen corpus

Dutch TenTen web corpus. The corpus is cleaned by jusText,…

lvTenTen corpus

The Latvian TenTen corpus was crawled by SpiderLing in April 2014.…

ltTenTen corpus

Lithuanian TenTen corpus. The corpus has not been tagged yet. Structural…

koTenTen corpus

Korean TenTen corpus crawled by SpiderLing in August and September…