Text  28.10.2021

N-grams from NBdigital 2021

This resource contains n-grams (i.e. unigrams, bigrams and trigrams) from all books and newspapers that had been digitized at the National Library of Norway up to July 2021. The n-grams have been …

  • Language: Norwegian Bokmål, Norwegian Nynorsk, Northern Sami, Southern Sami, Lule Sami, Kven
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Lexicon  28.09.2021

ONOMASTICA Pronunciation Lexicon 2

ONOMASTICA Version 2 is an updated version of the original ONOMASTICA Pronunciation Lexicon. To make the lexicon more accessible, Språkbanken has parsed the original .on-files, and generated a …

  • Language: Norwegian
  • Origin: Language Bank
  • Licence: Creative_Commons-BY (CC-BY)
Text  09.09.2021

Translation memories from Nynorsk News Press Agency

These translation memories contain translations of news text from Norwegian Bokmål to Norwegian Nynorsk. The texts are produced by the Norwegian News Agency (https://www.ntb.no/about-ntb), and …

  • Language: Norwegian Bokmål, Norwegian Nynorsk
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Speech  09.09.2021

Norwegian Parliamentary Speech Corpus

This is the stable version (version 1.0) of The Norwegian Parliamentary Speech Corpus (NPSC). The corpus was developed by the Norwegian Language Bank at the National Library of Norway from 2019 to 2021. …

  • Language: Norwegian
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Text  18.08.2021

Translation Memories from EFTA

These translation memories have been made by the EEA Coordination Division at the European Free Trade Association (EFTA) secretariat in Brussels, where they are used on a daily basis as a work tool in …

  • Language: English, Norwegian Bokmål, Norwegian Nynorsk
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Text  29.06.2021

The LIA Treebank

The LIA Treebank includes 5250 speech segments and 55 410 tokens from the speech corpus LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually …

  • Language: Norwegian, Norwegian Nynorsk
  • Origin: CLARINO Text Laboratory Centre
  • Licence: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Speech  14.06.2021

NST Danish ASR Database (16 kHz) – reorganized

This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Danish. In this updated version, the organization of the data has been …

  • Language: Danish
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Speech, Text  04.05.2021

TAUS – The spoken language investigation in Oslo

The material from TAUS (The spoken language investigation in Oslo) is based on informal interviews with people from Oslo. The interviews were conducted in 1971-73. The informants are mainly from two …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: CLARIN_ACA-NC-LOC-PRIV-ND-*
Text  04.05.2021

TAUS – downloadable transcriptions

TAUS (The spoken language investigation in Oslo) v.3 is a speech corpus with 86 speakers and 387 551 tokens. The downloadable version of the corpus contains the transcriptions, approx. 387 500 tokens, …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Text  30.04.2021

Målfrid 2021 – Freely Available Documents from Norwegian State Institutions

This corpus consists of documents from 339 domains of Norwegian state institutions and comprises approx. 4 billion tokens in total, which makes it one of the largest freely available resources for …

  • Language: Norwegian Bokmål, Norwegian Nynorsk, Northern Sami, Southern Sami, Lule Sami, English
  • Origin: Language Bank
  • Licence: Norwegian Licence for Open Government Data (NLOD)