The NORINT Corpus consists of speech from 51 adult learners of Norwegian as a second language and written texts from 116 such learners, all of whom were taking advanced Norwegian courses (≈ CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.
The NORINT Corpus is divided into three sub-parts:
– NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.
– NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
– NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons participants cannot be identified across the sub-parts in the corpus.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
The corpus is searchable in the search interface Glossa, and the transcriptions are linked to audio and video files.
Extended metadata
resource Common Info:
resource Type: corpus
identification Info:
resource Name: NORINT-korpuset
resource Name: The NORINT Corpus
description: The NORINT Corpus consists of speech from 51 adult learners of Norwegian as a second language and written texts from 116 such learners, all of whom were taking advanced Norwegian courses (≈ CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.
The NORINT Corpus is divided into three sub-parts:
– NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.
– NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
– NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons participants cannot be identified across the sub-parts in the corpus.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
The corpus is searchable in the search interface Glossa, and the transcriptions are linked to audio and video files.
description: The NORINT Corpus contains spoken material from 51 and written material from 116 adult international students who attended higher-level Norwegian courses (≈ CEFR level B2) at the University of Oslo in the summers of 2014 and 2015.
The NORINT Corpus consists of three parts:
– NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. The students were interviewed about their background, studies, work, and future plans. In addition, there are video and audio recordings in which the informants converse in pairs about topics such as culture, leisure, travel, or life in Norway. There are 30 to 40 minutes of recordings of each student.
The recordings are transcribed orthographically with the transcription tool Elan.
– NORINT Recited: 57 informants, 51 of whom also contributed to NORINT Speech, read aloud 60 selected sentences and a short story. Only audio recordings exist of the readings.
– NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers. The informants are partly the same as in the spoken part of the material. For privacy reasons, however, there are no visible links between them in the corpus.
The texts in NORINT Text are available in three formats: a handwritten original version in PDF format, a typed exact copy of the original version, and a version in which all orthographic errors are corrected. The text versions and the corrected versions are linked together.
The corpus is searchable in the search tool Glossa, where the transcriptions are also linked to audio and video files.
non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts returned by the search interface may not be shown in public without an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
funder: Department of Linguistics and Scandinavian Studies, University of Oslo
corpus Info:
corpus Type: Written Corpus
corpus Type: Multimodal Corpus
corpus Part Info:
media Type: text
corpus Text Info:
text Format Info:
mime Type: txt
character Encoding Info:
character Encoding: utf-8
corpus Part Info:
media Type: audio
corpus Audio Info:
audio Size Info:
size Info:
size: 57 participants × 3 audio files each for NORINT Recited (NORINT opplest)
size Unit: files
setting Info:
naturality: readSpeech
conversational Type: monologue
scenario Type: other
audience: no
interactivity: nonInteractive
audio Format Info:
mime Type: mp3 and wav
corpus Part Info:
media Type: video
corpus Video Info:
video Content Info:
type Of Video Content: Adult foreign students learning Norwegian as their second language
setting Info:
naturality: spontaneous
conversational Type: dialogue
interactivity: overlapping
interaction: Each informant participates in one conversation with another informant and an interview with a teacher.
video Format Info:
mime Type: mp4
corpus Part General Info:
source Work Info:
work Description: The NORINT Corpus is divided into three sub-parts:
– NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.
– NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
– NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons participants cannot be identified across the sub-parts in the corpus.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
person Source Set Info:
number Of Persons: 57
age Of Persons: adult
sex Of Persons: mixed
origin Of Persons: nonNative
dialect Accent Of Persons: Foreign students learning Norwegian.
unstandardised Genre: Exam papers written by students
The texts are available in three different versions: one scanned original in pdf format and two transcribed versions in txt format: one original transcription with errors and one version where the errors are corrected.
All versions are linked and it is possible to search in both transcribed versions.
genre Info:
genre Type: speechGenre
genre: informal
genre Info:
genre Type: speechGenre
genre: recited
time Coverage Info:
time Coverage: 2014
dc:type
corpus
dc:title
The NORINT Corpus
dc:identifier
oai:tekstlab.uio.no:norint
dc:description
The NORINT Corpus consists of speech from 51 adult learners of Norwegian as a second language and written texts from 116 such learners, all of whom were taking advanced Norwegian courses (≈ CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.
The NORINT Corpus is divided into three sub-parts:
– NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.
– NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
– NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons participants cannot be identified across the sub-parts in the corpus.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
The corpus is searchable in the search interface Glossa, and the transcriptions are linked to audio and video files.
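The dc:identifier above (oai:tekstlab.uio.no:norint) has the form of an OAI-PMH record identifier. As a minimal sketch, assuming the record is exposed through an OAI-PMH endpoint, the following builds a standard GetRecord request URL for it. The base URL is a hypothetical placeholder, since this record does not state the repository's endpoint, and the function name is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this record does not state the repository's
# OAI-PMH base URL, so replace this placeholder with the real one.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str,
                   metadata_prefix: str = "oai_dc",
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",                # standard OAI-PMH verb
        "identifier": identifier,           # record identifier from dc:identifier
        "metadataPrefix": metadata_prefix,  # oai_dc = unqualified Dublin Core
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(get_record_url("oai:tekstlab.uio.no:norint"))
```

The request only retrieves the metadata record. The recordings and transcriptions themselves remain accessible solely through Glossa, as stated in the conditions of use.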