Nordic Dialect Corpus – downloadable transcriptions
Nordic Dialect Corpus v. 4.0 is a corpus of spoken Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepage – Data Collection) and was recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers.
The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. The other transcriptions are orthographic only.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: Nordic Dialect Corpus – downloadable transcriptions
description: Nordic Dialect Corpus v. 4.0 is a corpus of spoken Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepage – Data Collection) and was recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers.
The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. The other transcriptions are orthographic only.
resource Short Name: NDC – downloadable transcriptions
non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the video and audio files are accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
validation Mode Details: The transcriptions have been proofread against the audio files. The national projects NorDiaSyn, DanDiaSyn and SweDiaSyn have proofread their own transcriptions (see homepage – Transcription).
project Name: For the funding of the national projects in Norway, Sweden, Denmark, Iceland and the Faroe Islands, see under National Projects: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
document Unstructured: All languages are orthographically transcribed, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
annotation Tool:
target Resource Name U R I: Transcriber (http://trans.sourceforge.net/en/presentation.php)
ELAN (https://tla.mpi.nl/tools/tla-tools/elan/)
annotation Tool:
target Resource Name U R I: For Norwegian and Övdalian: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
classification Info:
genre Info:
genre Type: speechGenre
genre: informal
unstandardised Genre: conversations
classification Info:
genre Info:
genre Type: speechGenre
genre: semi formal
unstandardised Genre: interviews
time Coverage Info:
time Coverage: 1998 – 2015
geographic Coverage Info:
geographic Coverage: 183 places in Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
recording Info:
recording Environment: office
recording Environment: closedPublicPlace
recording Environment: conferenceRoom
recording Environment: lectureRoom
recording Environment: other
dc:type
corpus
dc:title
Nordic Dialect Corpus – downloadable transcriptions
Nordic Dialect Corpus v. 4.0 is a corpus of spoken Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepage – Data Collection) and was recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers.
The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. The other transcriptions are orthographic only.
dc:publisher
dc:format
downloadable
dc:date
2005-01-01
dc:date
2019-09-31
dc:rights
Public
dc:rights
Creative Commons (CC)
dc:rights
Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
dc:rights
http://creativecommons.org/licenses/by-nc-sa/4.0/
dc:lang
Norwegian Bokmål (the orthographic transcriptions)