COLA – Corpus Oral de Lenguaje Adolescente

COLA (Corpus Oral de Lenguaje Adolescente Resource) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the study of teenage language in Spanish.
The sound files are coupled with anonymized orthographic transcriptions (text files), which makes the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording.

The full COLA corpus has three subparts:
1) COLAm: teenage language from Madrid
2) COLAba: teenage language from Buenos Aires
3) COLAs: teenage language from Santiago de Chile

The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle.
As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle.
For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see contact information details in metadata).

About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla.
The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem.
The third subpart COLAs was compiled by Eli Marie Drange in the same project.
Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages.
In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange.

Access to the corpus requires a (short) research plan approved by Annette Myre Jørgensen.
