N-gram – Norwegian Bokmål News Text

This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank.

For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
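As a rough illustration of what "sequences of one to six words ... ordered by frequency" means, the following Python sketch counts n-grams over a list of tokens and keeps the most frequent ones. It is not the procedure used to build this corpus: the function names (`ngrams`, `ngram_frequencies`, `top_k`) are hypothetical, and the whitespace tokenization in the usage note is an assumption, since the page does not describe how the source text was tokenized or how the downloadable files are laid out.

```python
from collections import Counter
from itertools import islice


def ngrams(tokens, n):
    """Return an iterator over every contiguous run of n tokens, as tuples.

    `tokens` must be a list (or other re-iterable sequence), not a one-shot iterator.
    """
    return zip(*(islice(tokens, i, None) for i in range(n)))


def ngram_frequencies(tokens, max_n=6):
    """Map each n in 1..max_n to a list of (ngram, count), most frequent first."""
    tables = {}
    for n in range(1, max_n + 1):
        counts = Counter(ngrams(tokens, n))
        tables[n] = counts.most_common()
    return tables


def top_k(tables, k=1000):
    """Keep only the k most frequent entries of each n-gram table."""
    return {n: rows[:k] for n, rows in tables.items()}
```

A possible use, assuming plain whitespace tokenization: `tables = ngram_frequencies("det er en fin dag og det er en god dag".split())`, after which `tables[2]` holds the bigrams sorted by descending count and `top_k(tables, k=1000)` mirrors the idea of the separate "1000 most frequent" download.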

Download resources

Extended metadata

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-29&metadataPrefix=cmdi

dc:type	corpus
dc:title	N-gram – Norwegian Bokmål News Text
dc:identifier	oai:nb.no:sbr-29
dc:description	This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank. For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
dc:publisher
dc:format	downloadable
dc:date	2011-01-03
dc:date	2011-12-22
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	Knut Hofland
dc:lang	Norwegian Bokmål
