Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Project Gutenberg Corpus

Downloaded with wget; the Gutenberg texts were cleaned with…

Fryske Akademy Parallel Corpus

Aligned Frisian and Dutch sentences, not POS-tagged. Dutch…

French Web Corpus (WaC)

This corpus (web as corpus) was gathered using a list of URLs…

Estonian Reference Corpus

Estonian Reference Corpus is a morphologically annotated corpus…

ChineseTaiwanWaC corpus

The Chinese Taiwan web-as-corpus has almost 260 million words encoded…