NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants were carefully selected with respect to sociolinguistic variables and are therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approx. 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is searchable in a specially designed search interface, and the transcriptions are linked to audio and video files.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: Norsk talespråkskorpus – Oslodelen
description: NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants were carefully selected with respect to sociolinguistic variables and are therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approx. 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is searchable in a specially designed search interface, and the transcriptions are linked to audio and video files.
description: NoTa-Oslo is a speech corpus consisting of interviews and conversations with 166 informants born and raised in Oslo and the Oslo area. The informants are representative with regard to age, gender, place of residence and education. NoTa-Oslo consists of just over 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is available for research and searchable through the Glossa search interface, and the transcriptions are linked to audio and video files.
The transcriptions can also be downloaded.
NoTa-Oslo was created by the Text Laboratory (Tekstlaboratoriet) between 2004 and 2006.
non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts returned by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
type Of Video Content: Interviews and conversations from 166 informants born and raised in Oslo and the Oslo area.
text Included In Video: none
dynamic Element Info:
body Parts: arms
body Parts: face
setting Info:
naturality: spontaneous
conversational Type: dialogue
audience: few
interactivity: overlapping
interaction: Two scenarios: a semiformal interview between a research assistant and an informant, and a free conversation between two informants. Research assistants were often passively present in the room during the conversations to prevent conversations about sensitive matters.
video Format Info:
mime Type: videos in mpeg4 streaming format available through Glossa
frame Rate: 25
resolution Info:
size Width: 400
size Height: 300
resolution Standard: HD.720
compression Info:
compression: true
compression Name: mpg
corpus Part Info:
media Type: audio
corpus Audio Info:
audio Size Info:
size Info:
size: Approx 40 GB
size Unit: gb
setting Info:
naturality: spontaneous
conversational Type: dialogue
audience: few
interactivity: overlapping
interaction: Two scenarios: a semiformal interview between a research assistant and an informant, and a free conversation between two informants. Research assistants were often passively present in the room during the conversations to prevent conversations about sensitive matters.
audio Format Info:
mime Type: wav and mpeg4
signal Encoding: linearPCM
sampling Rate: 32
quantization: 64
number Of Tracks: 1
recording Quality: medium
compression Info:
compression: true
compression Name: mpg
corpus Part General Info:
person Source Set Info:
number Of Persons: 166
age Of Persons: teenager
age Of Persons: adult
age Of Persons: elderly
age Range Start: 16
age Range End: 90
sex Of Persons: mixed
origin Of Persons: native
dialect Accent Of Persons: Oslo dialect: half of the informants come from East Oslo, the other half from West Oslo
geographic Distribution Of Persons: Oslo and close Oslo area