NST Norwegian ASR Database (16 kHz) – Reorganized
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: NST norsk ATG-database (16 kHz) – reorganisert
- resource Name: NST Norwegian ASR Database (16 kHz) – Reorganized
- description: This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Norwegian. In this version, the data have been reorganized to make the database easier to use. In the original version, the files were organized in a fixed folder structure in which the folder names were meaningful. However, the file names were not meaningful, and some files in different folders had identical names. This proved impractical, since users had to keep the original folder structure in order to use the data. The files have therefore been renamed so that the file names are unique and meaningful regardless of the folder structure. The original metadata files were in spl format. These have been converted to JSON format. The converted metadata files are also anonymized, and the text encoding has been converted from ANSI to UTF-8. Metadata and transcriptions are also available as CSV files. See the documentation file for a full description of the data and the changes made to the database.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-54/
- PID: hdl:21.11146/54
- identifier: sbr-54
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-54/
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info
- actor Type: organization
- role: Licensor
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- role: Distribution Rights Holder
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- contact
- actor Info
- actor Type: organization
- role: Contact
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- metadata Info
- metadata Creation Date: 21.10.2020
- metadata Language Name: English
- metadata Language Id: en
- metadata Last Date Updated: 13.04.2023
- metadata Creator
- actor Info
- actor Type: person
- role: Metadata Creator
- person Info
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- version: 2
- revision: https://www.nb.no/sbfil/talegjenkjenning/16kHz_2020/no_2020/no-16khz_reorganized_english.pdf
- last Date Updated: 11.03.2022
- validated: true
- validation Type: content
- validation Mode: mixed
- validation Mode Details: Change of file formats, change of text encoding.
- validation Extent: partial
- validator:
- actor Info
- actor Type: person
- role: Resource Validator
- person Info
- surname: Solberg
- given Name: Per Erik
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: https://www.nb.no/sbfil/talegjenkjenning/16kHz_2020/no_2020/no-16khz_reorganisert_norsk.pdf
- documentation Unstructured
- role: documentation
- document Unstructured: https://www.nb.no/sbfil/talegjenkjenning/16kHz_2020/no_2020/no-16khz_reorganized_english.pdf
- creation Start Date: 05.01.1998
- creation End Date: 11.03.2022
- resource Creator
- actor Info
- actor Type: organization
- role: Resource Creator
- organization Info
- organization Name: Nordic Language Technology AS
- organization Name: Nordisk språkteknologi AS
- organization Short Name: NST
- actor Info
- actor Type: person
- role: Resource Creator
- person Info
- surname: Johnsen
- given Name: Lars
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info
- actor Type: person
- role: Resource Creator
- person Info
- surname: Solberg
- given Name: Per Erik
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: 171.8
- size Unit: gb
- size Info
- size: 12
- size Unit: files
- audio Format Info
- mime Type: audio/wav
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: application/json
- size Per Text Format
- size Info
- size: 6.3
- size Unit: mb
- size Info
- size: 3
- size Unit: files
- character Encoding Info
- character Encoding: UTF-8
- corpus Part General Info
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: no
- language Name: Norwegian
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Norwegian dialects
- modality Info
- modality Type: spokenLanguage
- modality Type Details: manuscript read
- size Info
- size: 171.8
- size Unit: gb
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- segmentation Level: word
- annotation Mode: manual
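The audio part of this record is listed as audio/wav, and the resource title names a 16 kHz sampling rate. The following is a minimal sketch for checking the rate of a single file, assuming the extracted files are PCM WAV files that Python's standard `wave` module can open (the record itself does not state the sample encoding). The function names `sample_rate_hz` and `is_16khz` are hypothetical helpers, not part of the database.

```python
import wave


def sample_rate_hz(path):
    """Return the sample rate (frames per second) stored in a WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path):
    """True if the WAV header reports the 16 kHz rate named in the title."""
    return sample_rate_hz(path) == 16000
```

Only the header is read here, so the check stays cheap even when run over a large number of files.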
dc:type | corpus |
dc:title | NST Norwegian ASR Database (16 kHz) – Reorganized |
dc:identifier | oai:nb.no:sbr-54 |
dc:description | This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Norwegian. In this version, the data have been reorganized to make the database easier to use. In the original version, the files were organized in a fixed folder structure in which the folder names were meaningful. However, the file names were not meaningful, and some files in different folders had identical names. This proved impractical, since users had to keep the original folder structure in order to use the data. The files have therefore been renamed so that the file names are unique and meaningful regardless of the folder structure. The original metadata files were in spl format. These have been converted to JSON format. The converted metadata files are also anonymized, and the text encoding has been converted from ANSI to UTF-8. Metadata and transcriptions are also available as CSV files. See the documentation file for a full description of the data and the changes made to the database. |
dc:publisher | |
dc:format | downloadable |
dc:date | 1998-01-05 |
dc:date | 2022-03-11 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Nordic Language Technology AS |
dc:creator | Lars Johnsen |
dc:creator | Per Erik Solberg |
dc:lang | Norwegian |
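The description states that the metadata was converted to JSON and that the text encoding changed from ANSI to UTF-8. The sketch below illustrates that kind of conversion in general terms; it is not the procedure used for this database. It assumes that "ANSI" refers to Windows-1252 (`cp1252`), which the record does not confirm, and the function names and the `"text"` field are hypothetical. The actual spl layout and JSON schema are covered by the documentation file and are not reproduced here.

```python
import json


def ansi_bytes_to_text(raw, source_encoding="cp1252"):
    """Decode legacy bytes; cp1252 is an assumed reading of "ANSI"."""
    return raw.decode(source_encoding)


def to_utf8_json(record):
    """Serialize a record as UTF-8 JSON, keeping letters such as å and æ readable."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")
```

For example, `to_utf8_json({"text": ansi_bytes_to_text(b"bl\xe5b\xe6r")})` decodes the legacy bytes to the word "blåbær" and writes it out as UTF-8 rather than as `\u` escapes.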
Download resources
- ADB_NOR_0463.tar.gz
- ADB_NOR_0464.tar.gz
- ADB_OD_Nor.NOR.tar.gz
- lydfiler_16_1_a.tar.gz
- lydfiler_16_1_b.tar.gz
- lydfiler_16_1_c.tar.gz
- lydfiler_16_1_d.tar.gz
- lydfiler_16_2_a.tar.gz
- lydfiler_16_2_b.tar.gz
- lydfiler_16_2_c.tar.gz
- lydfiler_16_2_d.tar.gz
- lydfiler_16_begge_a.tar.gz
- lydfiler_16_begge_b.tar.gz
- lydfiler_16_begge_c.tar.gz
- lydfiler_16_begge_d.tar.gz
- no-16khz_reorganisert_norsk.pdf
- no-16khz_reorganized_english.pdf
- metadata_no_csv.zip
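Most of the downloads are gzip-compressed tar archives. As a minimal sketch, one way to inspect such an archive before extracting it is to list its audio entries with Python's standard `tarfile` module. The internal layout of the archives is not described in this record, so the assumption that they contain files ending in `.wav` is only that, and `list_wav_members` is a hypothetical helper name.

```python
import tarfile


def list_wav_members(archive_path):
    """Return the sorted names of .wav entries in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".wav")
        )
```

Listing members reads the archive index without writing anything to disk, which is useful given that the audio part totals well over a hundred gigabytes.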