NST Norwegian ASR Database (16 kHz) – Reorganized

This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Norwegian. In this version (from 2022), the organization of the data has been altered to make the database more user-friendly.

The data set was reviewed again in December 2023: duplicates were removed, metadata was cleaned up, and file names were standardized. See the documentation file for details.

In the original version of the material, the files were organized in a specific folder structure where the folder names were meaningful. However, the file names were not meaningful, and there were also cases of files with identical names in different folders. This proved to be impractical, since users had to keep the original folder structure in order to use the data.
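To illustrate the problem with identical file names in different folders, the following minimal Python sketch lists every file name that occurs in more than one folder of a directory tree. It is not part of the database or its tooling; `find_duplicate_names` is a hypothetical helper written for this page. On the original folder structure it would be expected to report collisions, while on the reorganized version it should return an empty dictionary if every file name is unique.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(root):
    """Return {file name: [relative paths]} for every file name that
    occurs more than once anywhere under `root` (hypothetical helper)."""
    root = Path(root)
    seen = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            # Group files by bare name only, ignoring the folder they live in
            seen[path.name].append(path.relative_to(root).as_posix())
    # Keep only names that appear in more than one location
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

For example, `find_duplicate_names("path/to/audio")` (a placeholder path) returns a mapping such as `{"u0001.wav": ["a/u0001.wav", "b/u0001.wav"]}` when two folders contain a file with the same name.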

The files have been renamed so that each file name is unique and meaningful regardless of the folder structure. The original metadata files were in spl format; these have been converted to JSON format. The metadata files are anonymized, and the text encoding has been converted from ANSI to UTF-8. Metadata and transcriptions are also available as CSV files.
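The description does not say which "ANSI" code page the original text used. The sketch below assumes Windows code page 1252 (`cp1252`), the usual "ANSI" code page for Norwegian and other Western European text, and shows what an ANSI to UTF-8 conversion of a single text file amounts to. `ansi_to_utf8` is a hypothetical helper, not the procedure actually used to prepare this release; it could be useful when comparing against text files from the original distribution.

```python
def ansi_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file from a Windows "ANSI" code page to UTF-8.

    source_encoding is an assumption (cp1252 by default). Decoding is
    strict, so bytes that are undefined in that code page raise
    UnicodeDecodeError instead of being silently replaced.
    """
    # newline="" leaves the original line endings (e.g. CRLF) untouched
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Characters such as "æ", "ø" and "å" are single bytes in cp1252 (for example `0xE5` for "å") but two bytes in UTF-8 (`0xC3 0xA5`), which is why files must be decoded with the right code page before they are re-encoded.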

See the documentation file for a full description of the data and the changes that have been made to the database.

Extended metadata

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-54&metadataPrefix=cmdi

Dublin Core (DC)

dc:type	corpus
dc:title	NST Norwegian ASR Database (16 kHz) – Reorganized
dc:identifier	oai:nb.no:sbr-54
dc:description	This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Norwegian. In this version (from 2022), the organization of the data has been altered to make the database more user friendly. The data set was reviewed again in December 2023; duplicates were removed, metadata was cleaned up and filenames were standardized. See the documentation file for details. In the original version of the material, the files were organized in a specific folder structure where the folder names were meaningful. However, the file names were not meaningful, and there were also cases of files with identical names in different folders. This proved to be impractical, since users had to keep the original folder structure in order to use the data. The files have been renamed, such that each file name is unique and meaningful regardless of the folder structure. The original metadata files were in spl format; these have been converted to JSON format. The metadata files are anonymized, and the text encoding has been converted from ANSI to UTF-8. Metadata and transcriptions are also available as CSV files. See the documentation file for a full description of the data and the changes that have been made to the database.
dc:publisher
dc:format	downloadable
dc:date	1998-01-05
dc:date	2023-12-19
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	Nordic Language Technology AS
dc:creator	Lars Johnsen
dc:creator	Per Erik Solberg
dc:creator	Lars Magne Tungland
dc:lang	Norwegian
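The CMDI metadata link on this page is an OAI-PMH `GetRecord` request, built from the parameters `verb`, `identifier` (here `oai:nb.no:sbr-54`) and `metadataPrefix` (here `cmdi`). A minimal Python sketch of building and fetching such a request with the standard library follows; the helper names are hypothetical, and only the endpoint and parameters shown in the link above are taken from this page. The OAI-PMH protocol requires repositories to support the `oai_dc` format, so passing `metadata_prefix="oai_dc"` should return a Dublin Core version of the record.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the CMDI metadata link on this page
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"


def get_record_url(identifier, metadata_prefix="cmdi", endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord URL (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # safe=":" keeps the colons in the OAI identifier unescaped
    return f"{endpoint}?{urlencode(params, safe=':')}"


def fetch_record(identifier, metadata_prefix="cmdi"):
    """Download a record and return the parsed XML root element.
    Requires network access; errors are left to the caller."""
    with urlopen(get_record_url(identifier, metadata_prefix), timeout=30) as resp:
        return ET.fromstring(resp.read())
```

For this resource, `get_record_url("oai:nb.no:sbr-54")` reproduces the CMDI link above, and `fetch_record("oai:nb.no:sbr-54")` would download and parse that record.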
