LIA Norwegian – Corpus of historical dialect recordings
LIA Norwegian is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were mainly made for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and life at the summer farm. Other topics include weaving, knitting, baking and dialects. The recordings are semi-formal or informal and often take place in an informant’s home.
The first version of the corpus has 3.5 million tokens and 1374 speakers from 222 places in Norway.
The corpus is morphologically tagged with a statistical speech tagger for Nynorsk.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: LIA norsk – korpus av eldre dialektopptak
resource Name: LIA Norwegian – Corpus of historical dialect recordings
description: LIA Norwegian is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were mainly made for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and life at the summer farm. Other topics include weaving, knitting, baking and dialects. The recordings are semi-formal or informal and often take place in an informant’s home.
The first version of the corpus has 3.5 million tokens and 1374 speakers from 222 places in Norway.
The corpus is morphologically tagged with a statistical speech tagger for Nynorsk.
description: LIA norsk is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UiB, UiO and UiT. The recordings were made for dialect research and onomastic research, and often deal with agriculture, forestry, fishing, life at the summer farm and old craft traditions. The recordings were usually made in private homes, and the interviews and conversations are fairly informal.
The first version of the corpus contains 3.5 million tokens and 1374 speakers from 222 places in Norway. The corpus is morphologically tagged with a newly developed statistical speech tagger for Nynorsk.
non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The audio excerpts returned by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
document Unstructured: User guide for LIA norsk – korpus av eldre dialektopptak: http://tekstlab.uio.no/brukerveiledninger/LIA%20norsk/index.html
documentation Structured:
role: documentation
document Info:
document Type: other
title: Home page of the LIA corpus for Norwegian dialects
interaction: Semi-formal or informal interviews with one or more interviewers. Often the recordings are more like conversations. Most recordings were made in people's homes.
audio Format Info:
mime Type: wav and mp3
recording Quality: medium
compression Info:
compression: true
compression Name: mp3
corpus Part General Info:
person Source Set Info:
number Of Persons: 1374
age Of Persons: teenager
age Of Persons: adult
age Of Persons: elderly
age Range Start: 10
age Range End: 99
sex Of Persons: mixed
origin Of Persons: native
dialect Accent Of Persons: Dialects from 222 places in Norway
geographic Distribution Of Persons: All over Norway
linguality Info:
linguality Type: monolingual
language Info:
language Id: No
language Name: Norwegian
language Info:
language Id: Nn
language Name: Norwegian Nynorsk
modality Info:
modality Type: spokenLanguage
modality Type Details: Norwegian dialects, transcribed in two modes: one phonetic (using the Norwegian alphabet) and one orthographic.
unstandardised Genre: conversations and informal interviews
classification Info:
genre Info:
genre Type: speechGenre
genre: semi formal
unstandardised Genre: interviews
time Coverage Info:
time Coverage: 1951 – 1995
geographic Coverage Info:
geographic Coverage: All over Norway
recording Info:
recording Device Type: other
recording Environment: other
dc:type
corpus
dc:title
LIA Norwegian – Corpus of historical dialect recordings
dc:identifier
oai:tekstlab.uio.no:lia-norsk
dc:description
LIA Norwegian is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were mainly made for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and life at the summer farm. Other topics include weaving, knitting, baking and dialects. The recordings are semi-formal or informal and often take place in an informant’s home.
The first version of the corpus has 3.5 million tokens and 1374 speakers from 222 places in Norway.
The corpus is morphologically tagged with a statistical speech tagger for Nynorsk.