Tuva Speech Database

Tuva Speech Database was created by Max Manus AS for testing and evaluation of the speech recognition solution “Tuva” for Norwegian.

The corpus consists of approximately 24 hours of recorded speech from 40 speakers of Norwegian, 36 of whom speak a dialect close to the Bokmål written standard, while four speak a dialect closer to the Nynorsk written standard. About 70% of the material is read from manuscripts; the remaining 30% is spontaneous speech. The manuscripts for the read part of the corpus consist mostly of short news articles. Of these manuscripts, 25% are read by all speakers, while the remaining 75% are unique to each speaker.
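The percentages above translate into rough hour counts. The sketch below does that arithmetic. The class name `CorpusComposition` is hypothetical, and because the source gives only approximate figures, the results are estimates rather than measured durations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusComposition:
    """Hypothetical helper: approximate corpus figures as stated on the page."""
    total_hours: float      # "approximately 24 hours"
    read_share: float       # fraction of material read from manuscripts
    speakers_bokmal: int    # speakers with a Bokmål-near dialect
    speakers_nynorsk: int   # speakers with a Nynorsk-near dialect

    @property
    def read_hours(self) -> float:
        return self.total_hours * self.read_share

    @property
    def spontaneous_hours(self) -> float:
        return self.total_hours * (1.0 - self.read_share)

    @property
    def speakers(self) -> int:
        return self.speakers_bokmal + self.speakers_nynorsk


tuva = CorpusComposition(total_hours=24.0, read_share=0.70,
                         speakers_bokmal=36, speakers_nynorsk=4)
print(f"{tuva.read_hours:.1f} h read, {tuva.spontaneous_hours:.1f} h spontaneous, "
      f"{tuva.speakers} speakers")
# prints: 16.8 h read, 7.2 h spontaneous, 40 speakers
```

For Nynorsk only read speech is available, so the roughly 7.2 hours of spontaneous speech should come from the 36 Bokmål-near speakers.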

All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all recordings are orthographically transcribed in two different formats.

For Nynorsk, only speech read from manuscripts is available. The speakers were selected to represent a cross-section of the Norwegian working population, balanced for age, gender and dialect.

All recordings were made at a 48 kHz sampling frequency with 32-bit resolution, using one microphone on a single channel (mono).
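These recording parameters imply a certain amount of storage. The uncompressed sample data can be estimated as duration × sampling rate × bytes per sample × channels. The sketch below assumes plain linear PCM and ignores file headers. The page does not state the container format, and the distributed archive is compressed, so the actual download size will differ. The function name `raw_pcm_bytes` is hypothetical.

```python
def raw_pcm_bytes(seconds: float, sample_rate_hz: int,
                  bits_per_sample: int, channels: int) -> int:
    """Estimate the size of the raw sample data only (assumes linear PCM,
    no container headers, no compression)."""
    bytes_per_sample = bits_per_sample // 8
    return round(seconds * sample_rate_hz * bytes_per_sample * channels)


# Roughly 24 hours of 48 kHz, 32-bit, mono audio
total = raw_pcm_bytes(24 * 3600, 48_000, 32, 1)
print(f"{total:,} bytes ≈ {total / 1e9:.1f} GB ≈ {total / 2**30:.1f} GiB")
# prints: 16,588,800,000 bytes ≈ 16.6 GB ≈ 15.4 GiB
```

Equivalently, each second of mono audio at these settings is 48,000 × 4 = 192,000 bytes.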

The recordings were conducted in a recording studio in Oslo.

Extended metadata

resource Common Info:
resource Type: corpus
identification Info:
resource Name: Tuva Speech Database
resource Name: Tuva Taledatabase
description: Tuva Speech Database was created by Max Manus AS for testing and evaluation of the speech recognition solution "Tuva" for Norwegian. The corpus consists of approximately 24 hours of recorded speech from 40 speakers of Norwegian, 36 of whom speak a dialect close to the Bokmål written standard, while four speak a dialect closer to the Nynorsk written standard. About 70% of the material is read from manuscripts; the remaining 30% is spontaneous speech. The manuscripts for the read part of the corpus consist mostly of short news articles. Of these manuscripts, 25% are read by all speakers, while the remaining 75% are unique to each speaker. All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all recordings are orthographically transcribed in two different formats. For Nynorsk, only speech read from manuscripts is available. The speakers were selected to represent a cross-section of the Norwegian working population, balanced for age, gender and dialect. All recordings were made at a 48 kHz sampling frequency with 32-bit resolution, using one microphone on a single channel (mono). The recordings were conducted in a recording studio in Oslo.
description: Tuva Taledatabase was prepared by Max Manus AS for testing and evaluation of the dictation solution «Tuva». The database contains approximately 24 hours of recorded speech from 40 speakers. Of these, 36 speak a dialect close to Bokmål and four a dialect close to Nynorsk. About 70% of the material is speech read from manuscripts and 30% is spontaneous speech. The manuscripts in the read part of the corpus are, as a rule, short newspaper articles. Of these manuscripts, 25% are read by all speakers, while the remaining 75% are unique to each speaker. All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all recordings are orthographically transcribed in two different formats. For Nynorsk, the corpus contains only speech read from manuscripts. The speakers in Tuva Taledatabase were selected to represent a cross-section of the Norwegian working population, balanced for age, gender and dialect. All recordings were made with a 48 kHz sampling frequency and 32-bit resolution, using one microphone on one channel (mono). The recordings were carried out in a recording studio in Oslo.
url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-44/
PID: hdl:21.11146/44
identifier: sbr-44
distribution Info:
licence Info:
user Category: Public
distribution Access Medium: downloadable
download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-44/
licence:
licence Family: Creative Commons (CC)
licence Name: Creative_Commons-BY (CC-BY)
licence Url: https://creativecommons.org/licenses/by/4.0/
conditions Of Use: BY
licensor:
actor Info:
actor Type: organization
role: Licensor
organization Info:
organization Name: National Library of Norway
organization Name: Nasjonalbiblioteket
organization Short Name: NLN
organization Short Name: NB
department Name: The Language Bank
department Name: Språkbanken
communication Info:
email: sprakbanken@nb.no
url: https://www.nb.no/sprakbanken/
address: P.O. Box 2674 Solli
zip Code: 0203
city: Oslo
region: Oslo
country: Norway
distribution Rights Holder
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
contact
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
metadata Info:
metadata Creation Date: 14.03.2018
metadata Language Name: English
metadata Language Id: en
metadata Last Date Updated: 07.08.2023
metadata Creator
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Johnsen
- given Name: Lars
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
version Info:
version: 1.0
last Date Updated: 01.06.2017
validation Info:
validated: false
resource Documentation Info:
documentation Unstructured:
role: documentation
document Unstructured: See the documentation file. Documentation includes an overview of the structure of the speech database and brief descriptions of the text material, readers, recording procedure as well as information about how the resource is annotated.
resource Creation Info:
creation Start Date: 01.01.2016
creation End Date: 01.06.2017
resource Creator
- actor Info:
- actor Type: organization
- role: Resource Creator
- organization Info:
- organization Name: Max Manus AS
- organization Name: Max Manus AS
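The PID field uses the Handle System (the `hdl:` prefix). A Handle can generally be resolved through the public proxy at `https://hdl.handle.net/`, and the helper below shows that mapping. The function name `handle_to_url` is hypothetical. Where the resolved link finally lands is decided by the Handle record itself. Presumably it is the catalogue page listed under `url`, but the metadata does not state this.

```python
from urllib.parse import quote

# Public Handle System proxy; resolves "prefix/suffix" handles over HTTPS.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(pid: str) -> str:
    """Hypothetical helper: turn an 'hdl:prefix/suffix' PID into a proxy URL."""
    if not pid.startswith("hdl:"):
        raise ValueError(f"not a Handle PID: {pid!r}")
    handle = pid[len("hdl:"):]
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"malformed Handle: {handle!r}")
    # Keep '/' literal; percent-encode anything unusual in the suffix.
    return HANDLE_PROXY + quote(handle, safe="/")


print(handle_to_url("hdl:21.11146/44"))
# prints: https://hdl.handle.net/21.11146/44
```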

Download resources

tuva.tar.gz
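This page does not describe the archive's internal layout; the documentation file covers it. A reasonable first step is to inspect `tuva.tar.gz` without extracting it. The sketch below uses Python's standard `tarfile` module to count regular files by extension. The function name is hypothetical, and no particular audio or transcription file types are assumed.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path: str) -> Counter:
    """Hypothetical helper: count regular files in a .tar.gz by extension."""
    counts: Counter = Counter()
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                # Archive member names use '/' separators.
                suffix = PurePosixPath(member.name).suffix.lower() or "(none)"
                counts[suffix] += 1
    return counts


# Usage (after downloading): summarize_archive("tuva.tar.gz").most_common()
```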

Download metadata

https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-44&metadataPrefix=cmdi

dc:type	corpus
dc:title	Tuva Speech Database
dc:identifier	oai:nb.no:sbr-44
dc:description	Tuva Speech Database was created by Max Manus AS for testing and evaluation of the speech recognition solution "Tuva" for Norwegian. The corpus consists of approximately 24 hours of recorded speech from 40 speakers of Norwegian, 36 of whom speak a dialect close to the Bokmål written standard, while four speak a dialect closer to the Nynorsk written standard. About 70% of the material is read from manuscripts; the remaining 30% is spontaneous speech. The manuscripts for the read part of the corpus consist mostly of short news articles. Of these manuscripts, 25% are read by all speakers, while the remaining 75% are unique to each speaker. All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all recordings are orthographically transcribed in two different formats. For Nynorsk, only speech read from manuscripts is available. The speakers were selected to represent a cross-section of the Norwegian working population, balanced for age, gender and dialect. All recordings were made at a 48 kHz sampling frequency with 32-bit resolution, using one microphone on a single channel (mono). The recordings were conducted in a recording studio in Oslo.
dc:publisher
dc:format	downloadable
dc:date	2016-01-01
dc:date	2017-06-01
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY (CC-BY)
dc:rights	https://creativecommons.org/licenses/by/4.0/
dc:creator	Max Manus AS
dc:lang	Norwegian
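The metadata link is an OAI-PMH `GetRecord` request against the endpoint `https://www.nb.no/sprakbanken/oai` with `metadataPrefix=cmdi`. The OAI-PMH protocol requires repositories to offer unqualified Dublin Core under the prefix `oai_dc`. Changing the prefix to `oai_dc` should therefore return Dublin Core fields like the `dc:` entries listed here. That this endpoint serves `oai_dc` is an assumption, not something the page states. The sketch builds the request URL and parses the Dublin Core elements from a response. The function names are hypothetical, and the network call is left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the metadata link on this page.
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(identifier: str, metadata_prefix: str = "cmdi") -> str:
    """Hypothetical helper: build an OAI-PMH GetRecord URL (colons kept literal)."""
    params = {"verb": "GetRecord", "identifier": identifier,
              "metadataPrefix": metadata_prefix}
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':')}"


def parse_oai_dc(xml_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect dc:* elements from an oai_dc record.
    Repeated elements (e.g. two dc:date values) are kept in order."""
    root = ET.fromstring(xml_text)
    dc_root = root.find(".//oai_dc:dc", NS)
    if dc_root is None:
        raise ValueError("no oai_dc:dc element in response")
    dc_prefix = "{" + NS["dc"] + "}"
    fields: dict[str, list[str]] = {}
    for el in dc_root:
        if el.tag.startswith(dc_prefix):
            name = el.tag[len(dc_prefix):]
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields
```

For example, the text of `urllib.request.urlopen(get_record_url("oai:nb.no:sbr-44", "oai_dc"))` could be decoded and passed to `parse_oai_dc`.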