TurkishWaC corpus

The TurkishWaC corpus is a 32-million-word collection of samples…

UKWaCsst corpus

UKWaC tagged with SuperSenseTagger (sst-light) described in…

GujarathiWaC corpus

GujarathiWaC is a web corpus of the Gujarati language (Indo-Aryan…

GeorgianWaC corpus

Original file owner: bharat.

FrWaC corpus

FrWaC is a web corpus crawled from the .fr domain and tagged with…

FinnishWaC corpus

Finnish web as corpus.

FrisianWaC corpus

Frisian web as corpus was crawled in August 2013. It is a corpus…

DanishWaC corpus

The corpus was prepared by the Corpus Factory method. It has 288 million…

Filipino web corpus (FilipinoWaC)

The corpus was created by Anil in October 2013. It has almost…

German Web Corpus (DeWaC)

The corpus was prepared by Marco Baroni in a web crawl as described…

Arabic web corpus (WaC)

The Arabic web corpus was created by Serge Sharoff and was tagged…

Bosnian/Croatian/Serbian WaC

Bosnian, Croatian, and Serbian corpora obtained from the web by Nikola…

BengaliWaC corpus

The Bengali web corpus was created with the Corpus Factory method. The…

Cantonese web corpus (WaC)

This corpus was collected using Cantonese-only seed words and…

Basque Web Corpus (WaC)

The Basque "Web as Corpus" corpus was created by Mr. Igor Leturia…