ChineseTaiwanWaC corpus

The Chinese Taiwan web-as-corpus has almost 260 million words encoded…

HindiWaC corpus

This corpus contains almost 60 million words crawled from the…

IgboWaC corpus

The corpus was prepared using the Corpus Factory method and was crawled…

ItWaC corpus

The corpus was prepared by Marco Baroni in a web crawl as described…

JpWaC corpus

The corpus was prepared by Tomaž Erjavec using a list of URLs…

MalaysianWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

NepaliWaC corpus

Nepali web corpus downloaded by LCL on Dec 10, 2014. ~1200…

SamoanWaC corpus

Web corpus of Samoan. Created by Bharat Ram Ambati using corpus…

SetswanaWaC corpus

(version 2) The corpus was prepared using the Corpus Factory method.…

SpanishWaC corpus

This corpus was gathered using a list of URLs provided by Serge…

SwedishWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

TeluguWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

SDeWaC corpus

SDeWaC is a subset of DeWaC. The creation of sDeWaC is described…

WelshWaC corpus

The corpus was prepared by Anil using the Corpus Factory method in October…

ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…