Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Project Gutenberg Corpus

Downloaded with wget: getting Gutenberg cleaned with…

Fryske Akademy Parallel Corpus

Frisian and Dutch; not POS-tagged; aligned sentences; Dutch…

ChineseTaiwanWaC corpus

The Chinese Taiwan Web as Corpus (ChineseTaiwanWaC) has almost 260 million words encoded…