Command line tools for n-grams

There are a number of utilities available in Finlib/Manatee that…

Text Types, Headers and Subcorpora

Overview: When studying a word, phrase, or grammatical construction,…

Preparing Corpus Text

The input format is "vertical" or "word-per-line (WPL)" text,…
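As an illustration of the word-per-line layout, here is a minimal Python sketch. It assumes the common vertical convention of one token per line, with the token's positional attributes (for example word, tag, lemma) separated by tabs, and with structures such as documents and sentences written as XML-like tags on their own lines. The function name `to_vertical`, the `id` attribute, and the tags in the example are hypothetical. The actual attributes and their order are whatever the corpus configuration defines.

```python
def to_vertical(doc_id, sentences):
    """Render one document as word-per-line (vertical) text.

    Hypothetical helper: `sentences` is a list of sentences, each a list of
    tokens, and each token is a tuple of positional attributes
    (e.g. word, tag, lemma) in the order the corpus configuration expects.
    """
    lines = [f'<doc id="{doc_id}">']  # opening structure tag on its own line
    for sentence in sentences:
        lines.append("<s>")  # sentence structure start
        for token in sentence:
            # One token per line; its attributes are tab-separated.
            lines.append("\t".join(token))
        lines.append("</s>")  # sentence structure end
    lines.append("</doc>")
    return "\n".join(lines)


print(to_vertical("d1", [[("Hello", "NN", "hello"), (".", "SENT", ".")]]))
```

Tokenization and tagging are assumed to have been done beforehand. The sketch only handles the layout and performs no escaping of special characters.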

CZES corpus

CZES is a Czech corpus consisting of newspaper articles and magazine…

Turkic web corpora

The following corpora of Turkic languages are available in Sketch…

TalkBank Persian

The TalkBank Persian corpus contains blog posts to various Farsi…

TED_en corpus

A corpus of transcripts of TED talks. Prepared by Akshay Min…

jpTenTen11 LUW corpus

Japanese TenTen corpus gathered from the web in December 2011.…