Hebrew web corpora

Hebrew General corpus This corpus was crawled from the Internet…

Gaeilge tagset

Parole Common Morphosyntactical Tagset The tables below…

CLAWS tagset

C8 to C7 mapping file. NS 2011-5-14. APPGE -> APPGE: possessive…

Feed Corpus Project

The FCP corpus aims to be a million-word-per-day collection of POS-tagged…

OPUS parallel corpora

The parallel corpora available here have been collected, prepared…

Modified Penn Treebank Tagset

from Infogistics' NLProcessor Open class categories POS…

Symbols of Parts of Speech

Simple POS Abbreviation Corresponded symbols in CKIP Interpretation Adjective A A Non-predicative…

BNC (CLAWS-5) Part-of-speech codes

Extracted from the BNC Manual. AJ0: adjective (general or positive)…

The New Corpus for Ireland | Nua-Chorpas na hÉireann

The New Corpus for Ireland – user’s guide Welcome…

BBC Oxford Children's Stories

(Restricted corpus.) Bibliography Banerji, N., Gupta,…

Oxford Children's Corpus

Journal article Kate Wild, Adam Kilgarriff, and David Tugwell.…

TatarWaC corpus

The Tatar sample corpus is ca. 200 thousand words crawled from the…

Icelandic sample corpus

This is a small corpus of Icelandic texts prepared for the Sketch…

General instructions on corpus data directory structure

The aim of these instructions is to ensure that for every corpus,…