Nordic Dialect Corpus – downloadable transcriptions

Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see the homepage, Data Collection) and was recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations with and interviews of dialect speakers.

The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. All other transcriptions are orthographic only.
