NST N-gram – Danish News Text
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: NST N-gram – Danish News Text
- resource Name: NST N-gram – dansk nyhendetekst
- description: This corpus contains n-grams derived from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The time period covered is 1995-1999. The corpus was originally developed by Nordic Language Technology (NST) between 1997 and 2003. The n-grams were generated by Uni Research for the National Library of Norway. Sequences of one to six words were generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered both by frequency and alphabetically. For convenience, a collection of the 1000 most frequent n-grams of each type listed above is also available as a separate download.
- description: This corpus contains Danish n-grams drawn from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The newspapers date from the period 1995-1999. The corpus was originally developed by Nordisk Språkteknologi (NST) between 1997 and 2003. The n-grams were produced by Uni Research for Nasjonalbiblioteket. Sequences of one to six words were generated (unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and then sorted alphabetically and by frequency. A simplified version containing the 1000 most frequent n-grams of each type listed above is also available for download.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-28/
- PID: hdl:21.11146/28
- identifier: sbr-28
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-28/
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info
- actor Type: organization
- role: Licensor
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- role: Distribution Rights Holder
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- role: Contact
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 11.06.2012
- metadata Language Name: English
- metadata Language Id: en
- metadata Last Date Updated: 02.07.2021
- metadata Creator
- actor Info
- actor Type: person
- role: Metadata Creator
- person Info
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info
- actor Info
- actor Type: person
- role: Metadata Creator
- person Info
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- version: 1.0
- last Date Updated: 11.06.2012
- validated: false
- documentation Unstructured
- role: documentation
- document Unstructured: Documentation file
- creation Start Date: 02.01.2012
- creation End Date: 11.06.2012
- resource Creator
- actor Info
- actor Type: person
- role: Resource Creator
- person Info
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UiB
- actor Info
- corpus Info
- corpus Type: Ngram Corpus
- corpus Part Info
- media Type: textNgram
- corpus Text Ngram Info
- ngram Info
- base Item: word
- order: 6
- text Format Info
- mime Type: text/plain
- size Per Text Format
- size Info
- size: 290000000
- size Unit: words
- size Info
- character Encoding Info
- character Encoding: Windows
- ngram Info
- corpus Part General Info
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: da
- language Name: Danish
- language Variety Info
- language Variety Type: other
- language Variety Name: news text
- modality Info
- modality Type: writtenLanguage
- modality Type Details: news text
- size Info
- size: 290000000
- size Unit: words
- time Coverage Info
- time Coverage: 1995-1999
- linguality Info
dc:type | corpus |
dc:title | NST N-gram – Danish News Text |
dc:identifier | oai:nb.no:sbr-28 |
dc:description | This corpus contains Danish n-grams drawn from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The newspapers date from the period 1995-1999. The corpus was originally developed by Nordisk Språkteknologi (NST) between 1997 and 2003. The n-grams were produced by Uni Research for Nasjonalbiblioteket. Sequences of one to six words were generated (unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and then sorted alphabetically and by frequency. A simplified version containing the 1000 most frequent n-grams of each type listed above is also available for download. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2012-01-02 |
dc:date | 2012-06-11 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Knut Hofland |
dc:lang | Danish |
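
The description above says that word sequences of order one to six were extracted and then ordered both by frequency and alphabetically, with a separate top-1000 list per order. The record does not document how that was actually done, so the following is only a minimal sketch of the idea. The whitespace tokenization, the alphabetical tie-breaking, the function names and the sample sentence are all assumptions made for illustration and do not reflect the tooling or file layout of the distributed corpus.

```python
from collections import Counter


def ngram_counts(tokens, max_order=6):
    """Count every n-gram of order 1..max_order in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def by_frequency(counter, top=None):
    """Sort by descending count; ties are broken alphabetically (an assumption)."""
    items = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return items if top is None else items[:top]


def alphabetically(counter):
    """Sort n-grams alphabetically by their word tuples."""
    return sorted(counter.items())


# Made-up sample text; whitespace splitting is a simplification.
tokens = "det er en god dag og det er en ny dag".split()
counts = ngram_counts(tokens)

for ngram, freq in by_frequency(counts[2], top=3):
    print(" ".join(ngram), freq)
```

Calling `by_frequency(counts[n], top=1000)` for each order `n` would give a list analogous to the "1000 most frequent" download mentioned in the description, again under the assumptions stated above.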