The corpus consists of texts collected from available literature/prose from 1985 to 2013. The corpus is composed of texts from five genres: non-fiction prose (45 %) fiction (35 %) newpapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), all in all 100 mill words.
The corpus is grammatically tagged with the original version of The Oslo-Bergen tagger.
The corpus consists of texts collected from available literature/prose from 1985 to 2013. The corpus is composed of texts from five genres: non-fiction prose (45 %) fiction (35 %) newpapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), all in all 100 mill words.
The corpus is grammatically tagged with the original version of The Oslo-Bergen tagger.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: Leksikografisk bokmålskorpus
resource Name: The Lexicographic Corpus for Norwegian Bokmål
description: The corpus consists of texts collected from available literature/prose from 1985 to 2013. The corpus is composed of texts from five genres: non-fiction prose (45 %) fiction (35 %) newpapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), all in all 100 mill words.
The corpus is grammatically tagged with the original version of The Oslo-Bergen tagger.
description: Korpuset består av tekster hentet fra tilgjengelig litteratur/prosa fra 1985 til 2013. Korpuset har tekster fra fem sjangere: sakprosa (45%) skjønnlitteratur (35%) aviser og periodika (10%), TV-teksting( 5%), og upublisert materiale, småtrykk (5%), alt i alt 100 mill ord.
Korpuset er grammatisk merket med den opprinnelige versjonen av Oslo-Bergen taggeren.
non Standard Conditions Of Use: Due to agreements with the third party copyright holders, the corpus is only available through Glossa, a search and post-processing tool developed by the Text Laboratory.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
work Description: The corpus consists of texts collected from available literature/prose from 1985 to 2013. The corpus is composed of texts from five genres: non-fiction prose (45 %) fiction (35 %) newpapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), all in all 100 mill words.
target Resource Name U R I: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
classification Info:
genre Info:
genre Type: textGenre
genre: factual prose
genre Info:
genre Type: textGenre
genre: fiction and drama
genre Info:
genre Type: textGenre
genre: newspaper and magazines
genre Info:
genre Type: textGenre
genre: unstandardised
time Coverage Info:
time Coverage: 1985 – 2013
dc:type
corpus
dc:title
The Lexicographic Corpus for Norwegian Bokmål
dc:identifier
oai:tekstlab.uio.no:LBK2013
dc:description
The corpus consists of texts collected from available literature/prose from 1985 to 2013. The corpus is composed of texts from five genres: non-fiction prose (45 %) fiction (35 %) newpapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), all in all 100 mill words.
The corpus is grammatically tagged with the original version of The Oslo-Bergen tagger.