COLT – The Bergen Corpus of London Teenage Language (with audio recordings)

COLT is a corpus of London Teenage Language with audio recordings.
It is now distributed via the Corpuscle search engine, which lets you query the corpus and retrieve concordances, collocations and distribution.

The corpus is the result of the COLT project, which aimed to create a corpus of spontaneous British English teenage talk and make it available for research: first on the internet, next as an orthographically and prosodically transcribed CD-ROM version, and finally as a CD-ROM version with both text and sound. The recordings were made by 31 volunteer ‘recruits’, boys and girls aged 13 to 17 from five socially different school boroughs, each equipped with a Sony Walkman, a lapel microphone and a log book.

The entire material, roughly half a million words, was orthographically transcribed by trained transcribers employed by the Longman Group to transcribe the British National Corpus (BNC). A copy of this version of COLT was incorporated into the BNC. In Bergen, the orthographically transcribed material was subsequently edited carefully, which involved correcting misinterpreted talk, reducing the number of <unclear> passages and adding previously untranscribed talk. The edited version was then tagged for word classes in the same way as the BNC by a research team at Lancaster University.

dc:type	corpus
dc:title	COLT – The Bergen Corpus of London Teenage Language (with audio recordings)
dc:identifier	oai:clarino.uib.no:colt
dc:description	COLT is a corpus of London Teenage Language with audio recordings. It is now distributed via the search engine Corpuscle. Corpuscle allows you to pass queries to the corpus, and you may ask for concordances, collocations and distribution. The corpus results from the project COLT. The aim of the project was to create a corpus of British English spontaneous teenage talk and make it available for research, first on the internet, next as an orthographically and prosodically transcribed CD-ROM version, and finally as a CD-ROM version with both text and sound. The recordings were made by 31 volunteering 13- to 17-year-old boys and girls from five socially different school boroughs, so-called ‘recruits’ equipped with a Sony Walkman, a lapel microphone and a log book. The entire material of roughly half a million words was orthographically transcribed by trained transcribers employed by the Longman Group for transcribing The British National Corpus (BNC). A copy of this version of COLT was incorporated in the BNC. At the Bergen end, the orthographically transcribed material was subsequently submitted to careful editing, which involved correcting misinterpreted talk, reducing the number of <unclear> passages and adding untranscribed talk. The edited version was then tagged for word classes in the same way as the BNC by a research team at Lancaster University.
dc:publisher
dc:format
dc:date
dc:rights	Academic
dc:rights	CLARIN
dc:rights	CLARIN_ACA-NC-LOC-PRIV-ND-*
dc:rights	https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
dc:lang	English
