Norwegian Parliamentary Speech Corpus 2.0

This is version 2.0 of The Norwegian Parliamentary Speech Corpus (NPSC). In version 2.0, a number of changes have been made to the transcriptions, and some identified errors in the corpus have been corrected. The changes are described in detail in the documentation. (Version 1.1 is still available; to find it, type “sbr-58” in the search box.)

The corpus was developed by the Norwegian Language Bank at the National Library of Norway between 2019 and 2021. The NPSC consists of audio recordings of meetings in Stortinget (the Norwegian parliament), with corresponding orthographic transcriptions in either Norwegian Bokmål or Norwegian Nynorsk, as well as various metadata about the speakers. The official proceedings from the meetings are also included in the corpus for reference. The recordings add up to 140 hours of running speech (including pauses) from 267 unique speakers, and contain 65,000 sentences and 1.2 million words in total.

Transcription was first done automatically; subsequently, the output of the automatic process was manually checked and corrected by trained linguists and philologists. Finally, all transcriptions were proofread to ensure consistency and accuracy.

NPSC is primarily intended as an open-source dataset for developing automatic speech recognition (ASR) systems.

Each full-length audio file in the corpus contains the speech from an entire day of plenary meetings held in 2017 and 2018 (or, if a meeting lasted more than six hours, the first six hours of the meeting). Because these files are quite large, the corpus also includes a separate audio file for each sentence.

We greatly appreciate any feedback and suggestions for improvement. Please send them to sprakbanken@nb.no.

Extended metadata

dc:type	corpus
dc:title	Norwegian Parliamentary Speech Corpus 2.0
dc:identifier	oai:nb.no:sbr-84
dc:description	This is version 2.0 of The Norwegian Parliamentary Speech Corpus (NPSC). In version 2.0, a number of changes have been made to the transcriptions, and some identified errors in the corpus have been corrected. The changes are described in detail in the documentation. (Version 1.1 is still available, type "sbr-58" in the search box.) The corpus has been developed by the Norwegian Language Bank at the National Library of Norway from 2019-2021. The NPSC consists of audio recordings of meetings in Stortinget (the Norwegian parliament), with corresponding orthographic transcriptions in either Norwegian Bokmål or Norwegian Nynorsk, as well as various metadata about the speakers. The official proceedings from the meetings are also included in the corpus for reference. The recordings add up to 140 hours of running speech (including pauses) from 267 unique speakers, and contain 65,000 sentences and 1.2 million words in total. Transcription was first done automatically; subsequently, the output of the automatic process was manually checked and corrected by trained linguists and philologists. Finally, all transcriptions were proofread to ensure consistency and accuracy. NPSC is primarily intended as an open-source dataset for ASR development. The individual audio files in the corpus contain the speech of entire days of plenary meetings from 2017 and 2018 (or, if a meeting lasts more than six hours, the first six hours of the meeting). Since the audio files are quite large, individual audio files for each sentence are also included. We greatly appreciate any feedback and suggestions for improvement. Please use our e-mail address, sprakbanken@nb.no.
dc:publisher
dc:format	downloadable
dc:date	2019-08-01
dc:date	2023-07-13
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	National Library of Norway
dc:lang	Norwegian

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-84&metadataPrefix=cmdi
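
The metadata link above is an OAI-PMH `GetRecord` request with three query parameters: `verb`, `identifier` (here `oai:nb.no:sbr-84`) and `metadataPrefix` (here `cmdi`). The following is a minimal sketch, using only the Python standard library, of how the same URL might be built and fetched programmatically. The helper names `cmdi_record_url` and `fetch_cmdi_record` are hypothetical. The sketch also assumes, without confirmation from this page, that other resources such as version 1.1 (`sbr-58`) are served from the same endpoint with the same `oai:nb.no:` identifier pattern and the `cmdi` metadata prefix.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the CMDI metadata link on this page.
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"


def cmdi_record_url(resource_id: str) -> str:
    """Build an OAI-PMH GetRecord URL for a resource id such as "sbr-84".

    Assumption: every resource uses the "oai:nb.no:<id>" identifier pattern.
    """
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:nb.no:{resource_id}",
        "metadataPrefix": "cmdi",
    }
    # safe=":" leaves the colons in the identifier unescaped, matching the link above.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':')}"


def fetch_cmdi_record(resource_id: str, timeout: float = 30.0) -> bytes:
    """Download the raw OAI-PMH XML response (requires network access)."""
    with urlopen(cmdi_record_url(resource_id), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Prints the same URL as the metadata link for this corpus.
    print(cmdi_record_url("sbr-84"))
```

Building the query with `urlencode` rather than string concatenation keeps the parameters properly escaped if an identifier ever contains reserved characters. The response is plain OAI-PMH XML and can be parsed with any XML library.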