Speech  14.06.2021

NST Danish ASR Database (16 kHz) – reorganized

This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Danish. In this updated version, the organization of the data has been …

  • Language: Danish
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Speech  21.05.2021

Norwegian Parliamentary Speech Corpus

This is a beta release (version 0.2) of the Norwegian Parliamentary Speech Corpus (NPSC). The corpus is developed by the Norwegian Language Bank at the National Library of Norway. The project was …

  • Language: Norwegian
  • Origin: Language Bank
  • Licence: Creative_Commons-ZERO (CC-ZERO)
Speech, Text  04.05.2021

TAUS – The spoken language investigation in Oslo

The material from TAUS (The spoken language investigation in Oslo) is based on informal interviews with people from Oslo. The interviews were conducted in 1971-73. The informants are mainly from two …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: CLARIN_ACA-NC-LOC-PRIV-ND-*
Text  04.05.2021

TAUS – downloadable transcriptions

TAUS (The spoken language investigation in Oslo) v.3 is a speech corpus with 86 speakers and 387 551 tokens. The downloadable version of the corpus contains the transcriptions, approx. 387 500 tokens, …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Text  21.04.2021

The Abkhaz National Corpus

The Abkhaz National Corpus is a comprehensive and open, grammatically annotated text corpus. It makes the Abkhaz language accessible to scientific investigations from various perspectives …

  • Language: Abkhaz
  • Origin: CLARINO Bergen Centre
  • Licence: CLARIN_PUB-BY-NC-ND
Speech, Text, Video  16.04.2021

Norsk talespråkskorpus – Oslodelen

NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants are carefully selected with respect to sociolinguistic variables and …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: CLARIN_ACA-NC-LOC-PRIV-ND-*
Speech, Text, Video  16.04.2021

Nordic Dialect Corpus v. 4.0

Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic …

  • Language: Norwegian Bokmål (the orthographic transcriptions), Swedish (Övdalian included), Danish, Icelandic, Faroese
  • Origin: CLARINO Text Laboratory Centre
  • Licence: CLARIN_ACA-NC-LOC-PRIV-ND-*
Text  16.04.2021

Text material from Forskning.no (1998 – 2017)

Data set containing texts from the popular science website forskning.no from the period 1998 – 2017. The text material consists of articles published by Forskning.no belonging to the following …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Bergen Centre
  • Licence: CLARIN_RES-DEP
Text  16.04.2021

Nordic Dialect Corpus – downloadable transcriptions

Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic …

  • Language: Norwegian Bokmål (the orthographic transcriptions), Swedish (Övdalian included), Danish, Icelandic, Faroese
  • Origin: CLARINO Text Laboratory Centre
  • Licence: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Text  16.04.2021

NoTa-Oslo – downloadable transcriptions

NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants are carefully selected with respect to sociolinguistic variables and …

  • Language: Norwegian, Norwegian Bokmål
  • Origin: CLARINO Text Laboratory Centre
  • Licence: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)