The Lexicographic Corpus for Norwegian Bokmål

Clarino - Textlab


Updated: 2017-08-14

The corpus consists of texts collected from available literature and prose from 1985 to 2013. It is composed of texts from five genres: non-fiction prose (45 %), fiction (35 %), newspapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), 100 million words in total.
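
To give a rough sense of scale, the genre percentages above can be converted into approximate word counts. The following is only an illustrative sketch: it assumes the stated total of 100 million words is a rounded figure and that the percentages apply to word counts rather than to the number of texts. The names `GENRE_SHARES`, `TOTAL_WORDS`, and `approx_words_per_genre` are made up for this example and are not part of the corpus or its tools.

```python
# Genre shares in percent, as stated in the corpus description.
GENRE_SHARES = {
    "non-fiction prose": 45,
    "fiction": 35,
    "newspapers/magazines": 10,
    "TV subtitles": 5,
    "non-standardized, unpublished texts": 5,
}

# Assumption: the stated total of 100 million words is a rounded figure.
TOTAL_WORDS = 100_000_000


def approx_words_per_genre(shares, total):
    """Return an approximate word count per genre from percentage shares."""
    return {genre: total * pct // 100 for genre, pct in shares.items()}


counts = approx_words_per_genre(GENRE_SHARES, TOTAL_WORDS)
for genre, n in counts.items():
    print(f"{genre}: ~{n:,} words")
```

Under these assumptions, non-fiction prose accounts for roughly 45 million words and fiction for roughly 35 million; the actual per-genre counts may differ.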

The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.

