The Oslo Corpus of Tagged Norwegian Texts (the Nynorsk part)

Clarino - Textlab

License: CLARIN_ACA-NC-LOC-ND

Updated: 2017-06-20

The corpus consists of the texts that were available at the Text Laboratory in January 1999. It comprises texts from three genres: fiction (2.1 million words), newspapers and magazines (Nynorsk: 1 million words), and factual prose (700,000 words), for a total of 3.8 million words.

All fiction comes from ECI (European Corpus Initiative) and Norsk Tekstarkiv (Norwegian Text Archive). The texts from newspapers and magazines have been collected by the Text Laboratory with kind permission from the various editorial offices. The factual prose consists mainly of NOU reports (Norwegian Official Reports) and Norwegian laws and regulations.

The corpus is not meant to be representative in any sense, although it contains texts from a variety of genres.

The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.

