NoWaC v 1.0 (Norwegian Web as Corpus)
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: NoWaC v 1.0 (Norwegian Web as Corpus)
- description: Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
- resource Short Name: NoWaC
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- P I D: http://hdl.handle.net/11538/0000-0005-E7C0-D
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/2.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- actor Info:
- actor Type: person
- person Info:
- surname: Guevara
- given Name: Emiliano
- affiliation:
- organization Info:
- organization Name: The Text Laboratory
- department Name: Department of Linguistics and Scandinavian Studies
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info:
- actor Type: person
- person Info:
- surname: Hagen
- given Name: Kristin
- sex: female
- actor Info:
- actor Type: person
- person Info:
- surname: Guevara
- given Name: Emiliano
- affiliation:
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: txt
- character Encoding Info:
- character Encoding: utf-8
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: Nb
- language Name: Norwegian Bokmål
- size Info:
- size: 700 000 000
- size Unit: tokens
- annotation Info:
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: lemmatization
- segmentation Level: word
- tagset: The Oslo-Bergen Tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html
- tagset Language Id: NB
- tagset Language Name: Norwegian Bokmål
- annotation Mode: automatic
- annotation Manual Unstructured:
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
- annotation Tool:
- target Resource Name U R I: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
- classification Info:
- conformance To Classification Scheme: other
- genre Info:
- genre Type: textGenre
- genre: unstandardised
- unstandardised Genre: scrambled web corpus/searchable web corpus
- time Coverage Info:
- time Coverage: November 2009 – January 2010
dc:type | corpus |
dc:title | NoWaC v 1.0 (Norwegian Web as Corpus) |
dc:identifier | oai:tekstlab.uio.no:nowac |
dc:description | Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2009-08-01 |
dc:date | 2010-12-31 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/2.0/ |
dc:creator | Emiliano Guevara |
dc:lang | bokmål |
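
The Dublin Core rows above follow a simple `key | value |` layout in which a key such as dc:date or dc:rights may occur more than once. As a minimal sketch, assuming the rows are available as plain text in exactly this layout, they could be read into a dictionary of lists as follows. The function name `parse_dc_rows` and the `sample` string are hypothetical and not part of the record or of any NoWaC tooling.

```python
from collections import defaultdict


def parse_dc_rows(text):
    """Collect 'dc:key | value |' rows into a dict of lists (keys may repeat)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("dc:"):
            continue  # skip anything that is not a Dublin Core row
        # Split on the first pipe only, so the value is kept in one piece.
        key, _, value = line.partition("|")
        value = value.strip()
        # Drop the trailing pipe that closes each row.
        if value.endswith("|"):
            value = value[:-1].strip()
        fields[key.strip()].append(value)
    return dict(fields)


# A few rows copied from the record above, used here as sample input.
sample = """\
dc:type | corpus |
dc:publisher | |
dc:date | 2009-08-01 |
dc:date | 2010-12-31 |
dc:lang | bokmål |
"""

record = parse_dc_rows(sample)
print(record["dc:date"])
```

Storing each value in a list keeps repeated keys intact, and an empty field such as dc:publisher is preserved as an empty string rather than dropped.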