Norwegian TenTen corpus.

The corpus is tagged with the Oslo-Bergen Tagger; see its tagset reference for the full list of tags.

The tags are available in two corpus attributes:

  • tag – the part of speech
  • tag_attrs – morphological details

For instance, where the original tag is “pron ent pers hum nom 1” (pronoun, singular, personal, human, nominative, first person), the two attributes are:

  • tag = “pron”
  • tag_attrs = “ent pers hum nom 1”
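
The split shown above can be sketched in Python. This is only an illustration, assuming (as the example suggests) that the first whitespace-separated field of the original tag is the part of speech and the rest are morphological details; the function name split_tag is hypothetical and not part of the corpus tooling.

```python
def split_tag(original_tag: str) -> tuple[str, str]:
    """Split an original tagger tag into (tag, tag_attrs).

    Assumption: the first whitespace-separated field is the part of
    speech and everything after it is the morphological detail.
    """
    # partition() splits on the first space only, so the remaining
    # fields stay together in their original order.
    pos, _, attrs = original_tag.strip().partition(" ")
    return pos, attrs.strip()


tag, tag_attrs = split_tag("pron ent pers hum nom 1")
print(tag)        # pron
print(tag_attrs)  # ent pers hum nom 1
```

A tag with no morphological details would, under the same assumption, yield an empty tag_attrs string.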

Changelog

v. 1.0 (21 February 2012)

  • initial version – 770 million tokens