This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank
For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank
For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: N-gram – Norwegian Bokmål News Text
resource Name: N-gram – nyhetstekst på bokmål
description: This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank
For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
description: Dette korpuset inneholder n-grammer på bokmål, hentet ut fra Norsk aviskorpus. Tekstgrunnlaget for korpuset er 665 millioner ord med løpende tekst høstet fra forskjellige norske nettaviser. Sekvenser av ett til seks ord er generert (unigrammer, bigrammer, trigrammer, 4-grammer, 5-grammer og 6-grammer) og ordnet etter frekvens. Dette arbeidet ble gjort av Uni Research på vegne av Nasjonalbiblioteket og Språkbanken.
For enkelhets skyld ble det også laget en samling med de 1000 mest frekvente n-grammene av alle typer nevnt ovenfor for nedlasting separat..
This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank
For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.