COLA (Corpus Oral de Lenguaje Adolescente Resource) is a corpus of recorded spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the study of teenage language in Spanish.
The sound files are paired with anonymized orthographic transcriptions (text files), which make the corpus searchable as text through a web interface where users can read each transcription while listening to the corresponding recording.
The full COLA corpus has three subparts:
1) COLAm: teenage language from Madrid
2) COLAba: teenage language from Buenos Aires
3) COLAs: teenage language from Santiago de Chile
The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle.
As of August 2015, the Madrid subpart (COLAm) of the corpus is available for search in Corpuscle.
For enquiries about access to the other parts of COLA, please contact Annette Myre Jørgensen (see the contact details in the metadata).
About the making of the corpus: The corpus is the result of the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla.
The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem.
The third subpart, COLAs, was compiled by Eli Marie Drange within the same project.
Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages.
In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange.
Access to the corpus requires a short research plan approved by Annette Myre Jørgensen.
Extended metadata
resource Common Info
resource Type: corpus
identification Info
resource Name: COLA – Corpus Oral de Lenguaje Adolescente
attribution Text: The COLA corpus is distributed by Corpuscle (http://hdl.handle.net/11495/D98E-D689-6A14-5) and was created in the COLA project at the University of Bergen. Jørgensen, Annette Myre. 2008. “COLA: Un corpus Oral de Lenguaje Adolescente”, Anejos a Oralia 3.1.
non Standard Conditions Of Use: Time-limited access: since the End-User's access to the Resource is valid only for a specified task or project, the research plan must specify a time span for the project. The End-User's access to the Resource will thus be limited to the End-User's expected needs.
project Name: COLA (Corpus Oral de Lenguaje Adolescente)
project Short Name: COLA
funding Type: nationalFunds
funder: University of Bergen, Faculty of Arts
funder: Meltzer fund
funder: Research Council of Norway
funding Country: Norway
project Start Date: 2002
corpus Info
corpus Type: Multimodal Corpus
corpus Part Info
media Type: audio
corpus Audio Info
audio Size Info
size Info
size: 500000
size Unit: words
duration Of Audio Info
size: 50
duration Unit: hours
audio Content Info
textual Description: The method used for recording the data follows the same pattern as the COLT Corpus of English adolescents and the UNO Corpus of Norwegian adolescents, which in turn is patterned on the Longman model used for collecting the British National Corpus (BNC). The recruits were selected from schools in areas of different social status in order to create a corpus balanced with regard to gender, type of school and social status. The recruits were between 13 and 18 years old. Each recruit was equipped with a MiniDisc recorder and a microphone and asked to record his or her conversations with friends and at school over a few days. Some of the conversations were recorded at school, during breaks or group work, and some at home or in places where adolescents usually meet, such as parks. The recruits filled in a questionnaire with personal information such as place of birth and language spoken at home, and they were also asked to write down some information about the other participants in their conversations.
The Madrid subpart consists of 78 recordings (individual conversations), which corresponds to roughly 50 hours of audio. Based on the transcriptions, the material comprises ca. 750,000 tokens; since some 'tokens' form multiword units, this amounts to ca. 500,000 lexemes.
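As a rough orientation, the figures above can be turned into per-recording and per-hour averages. The short Python sketch below does only that arithmetic on the rounded values stated in this record (78 recordings, ca. 50 hours, ca. 750,000 tokens, ca. 500,000 lexemes); the function name averages is hypothetical, and the results inherit the imprecision of the inputs.

```python
# Rough averages derived from the rounded figures stated for the Madrid
# subpart (COLAm). Inputs are approximate, so outputs are orientation only.
RECORDINGS = 78
HOURS = 50
TOKENS = 750_000
LEXEMES = 500_000


def averages(recordings: int, hours: float, tokens: int, lexemes: int) -> dict:
    """Return approximate per-recording and per-hour averages (hypothetical helper)."""
    return {
        "minutes_per_recording": hours * 60 / recordings,
        "tokens_per_hour": tokens / hours,
        "tokens_per_recording": tokens / recordings,
        "lexemes_per_token": lexemes / tokens,
    }


if __name__ == "__main__":
    for name, value in averages(RECORDINGS, HOURS, TOKENS, LEXEMES).items():
        print(f"{name}: {value:.2f}")
```

With these inputs, a recording lasts on average a little over 38 minutes, and about two lexemes correspond to every three tokens.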
setting Info
naturality: spontaneous
conversational Type: multilogue
corpus Text Info
text Format Info
mime Type: text/plain
character Encoding Info
character Encoding: UTF-8
corpus Part General Info
linguality Info
linguality Type: monolingual
language Info
language Id: es
language Name: Spanish
language Variety Info
language Variety Type: jargon
language Variety Name: teenage language
language Variety Info
language Variety Type: dialect
language Variety Name: Corpus part COLAm: teenage language (spoken) in Madrid
size Per Language Variety
size Info
size: 500000
size Unit: words
modality Info
modality Type: writtenLanguage
modality Type Details: Transcriptions of the recorded speech
modality Info
modality Type: spokenLanguage
modality Type Details: Spontaneous speech among teenagers
annotation Mode Details: COLA has been transcribed so that it can be searched as text. The recordings were transcribed orthographically using the program Transcriber.
Apart from the orthographic words, there are specific annotations for imitation and quoting, incomplete words (%), unclear words (XXX), and rising vs. falling intonation in questions.
The user is meant to listen to the sound file while reading the transcription; there is therefore no annotation of non-linguistic sounds such as coughing or a dog barking. In Corpuscle, the user can click on the sound file to listen while reading the transcription.
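If transcription text is available locally, a minimal sketch of how the incomplete-word and unclear-word markers described above might be tallied is given below. It assumes plain UTF-8 text (as stated under text Format Info), whitespace-separated tokens, incomplete words marked by a % sign inside the token, and unclear words written as the standalone token XXX. These are assumptions, not the documented COLA file format, and the functions count_markers and count_markers_in_file are hypothetical. Intonation and imitation/quoting annotations are not handled because their exact notation is not given here.

```python
from collections import Counter

# Punctuation stripped from token edges before checking for markers
# (an assumption about how markers sit next to punctuation).
EDGE_PUNCTUATION = ".,;:¿?¡!\"'()"


def count_markers(text: str) -> Counter:
    """Tally tokens assumed to be incomplete (%), unclear (XXX) or other."""
    counts = Counter()
    for token in text.split():  # whitespace tokenization is a simplification
        core = token.strip(EDGE_PUNCTUATION)
        if "%" in core:
            counts["incomplete"] += 1
        elif core == "XXX":
            counts["unclear"] += 1
        else:
            counts["other"] += 1
    return counts


def count_markers_in_file(path: str) -> Counter:
    """Read a UTF-8 plain-text transcription and tally its markers."""
    with open(path, encoding="utf-8") as handle:
        return count_markers(handle.read())


if __name__ == "__main__":
    sample = "tía pa% el XXX ¿vale?"  # invented sample line, not from COLA
    print(count_markers(sample))
```

Splitting on whitespace keeps the sketch short; a real analysis would more likely rely on Corpuscle's own search or on the Transcriber output directly.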
annotation Tool
target Resource Name U R I: Transcriber
annotator:
actor Info
actor Type: person
person Info
surname: Padilla
given Name: Esperanza Eguía
sex: female
classification Info
genre Info
genre Type: audioGenre
genre: informal
unstandardised Genre: teenage language
time Coverage Info
time Coverage: Recordings made between 2002 and 2004, and in 2007 (Madrid corpus subpart)
dc:type
corpus
dc:title
COLA – Corpus Oral de Lenguaje Adolescente
dc:identifier
oai:clarino.uib.no:cola