NorGramBank Non-fiction text in Norwegian Nynorsk from Forskning.no.
The «NorGram Non-fiction text in Norwegian Nynorsk from Forskning.no» treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata).
As of October 2015, the treebank comprises 21723 sentences, 371744 words and 582 documents.
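The counts above also give a rough picture of sentence and document length. The short Python sketch below derives these averages from the October 2015 figures; the function name `corpus_averages` is illustrative only and not part of any INESS tool.

```python
# Corpus size as stated in the description (October 2015).
SENTENCES = 21723
WORDS = 371744
DOCUMENTS = 582  # listed as "articles" in the size metadata


def corpus_averages(sentences: int, words: int, documents: int) -> dict:
    """Return per-sentence and per-document averages, rounded to 2 decimals."""
    return {
        "words_per_sentence": round(words / sentences, 2),
        "sentences_per_document": round(sentences / documents, 2),
        "words_per_document": round(words / documents, 2),
    }


if __name__ == "__main__":
    for name, value in corpus_averages(SENTENCES, WORDS, DOCUMENTS).items():
        print(f"{name}: {value}")
```

This gives roughly 17.1 words per sentence, 37.3 sentences per document and 638.7 words per document.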
Extended metadata
resource Common Info
resource Type: corpus
identification Info
resource Name: NorGramBank Non-fiction text in Norwegian Nynorsk from Forskning.no.
description: The "NorGram Non-fiction text in Norwegian Nynorsk from Forskning.no" treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata).
As of October 2015, the treebank comprises 21723 sentences, 371744 words and 582 documents.
resource Short Name: Forskning.no in Norwegian Nynorsk
funder: The Research Council of Norway under the Infrastruktur program
funder: University of Bergen
funding Country: Norway
corpus Info
corpus Type: Treebank
corpus Part Info
media Type: text
corpus Text Info
character Encoding Info
character Encoding: UTF-8
corpus Part General Info
source Work Info
work Description: The text material consists of articles published by Forskning.no. CLARINO's agreement also permits the use of articles that Forskning.no will publish in the future. The articles belong to the following three categories:
1) Articles written by journalists employed at Forskning.no
2) Articles from member institutions of Forskning.no (76 universities, colleges, research centers, research departments in government agencies and more). These articles are written by staff journalists, information officers and other non-academic staff. Each article has been edited by Forskning.no.
3) Articles from the newsdesk NRK Viten, with which Forskning.no cooperates. These articles are written by NRK journalists.
A full list of partner and cooperating institutions is available on request.
linguality Info
linguality Type: monolingual
language Info
language Id: no
language Name: Norwegian
language Info
language Id: nn
language Name: Norwegian Nynorsk
modality Info
modality Type: writtenLanguage
size Info
size: 21723
size Unit: sentences
size Info
size: 371744
size Unit: words
size Info
size: 582
size Unit: articles
annotation Info
annotation Type: syntacticAnnotation-treebanks
annotation Standoff: false
segmentation Level: sentence
annotation Format: XLE (Packed c- and f-structures in Prolog)
department Name: Department of Linguistic, Literary and Aesthetic Studies
actor Info
actor Type: person
person Info
surname: Lyse
given Name: Gunn Inger
sex: female
position: Researcher (Ph.D.)
affiliation:
organization Info
organization Name: University of Bergen
organization Name: Universitetet i Bergen
organization Short Name: UiB
organization Short Name: UoB
department Name: Department of Linguistic, Literary and Aesthetic Studies
actor Info
actor Type: person
person Info
surname: Thunes
given Name: Martha
sex: female
position: Postdoc in INESS
affiliation:
organization Info
organization Name: University of Bergen
organization Name: Universitetet i Bergen
organization Short Name: UiB
organization Short Name: UoB
department Name: Department of Linguistic, Literary and Aesthetic Studies
actor Info
actor Type: person
person Info
surname: Haugereid
given Name: Petter
sex: male
position: Researcher (Ph.D.)
affiliation:
organization Info
organization Name: University of Bergen
organization Name: Universitetet i Bergen
organization Short Name: UiB
organization Short Name: UoB
department Name: Department of Linguistic, Literary and Aesthetic Studies
actor Info
actor Type: person
person Info
surname: Fatnes
given Name: Ingeborg
sex: female
position: Scientific assistant in INESS (text preprocessing)
actor Info
actor Type: person
person Info
surname: Dale
given Name: Ingerid
sex: female
position: Scientific assistant in INESS (text preprocessing)
actor Info
actor Type: person
person Info
surname: Bergmann
given Name: Julie
sex: female
position: Scientific assistant in INESS (text preprocessing)
annotation Info
annotation Type: other
segmentation Level: word
annotation Mode: interactive
annotation Mode Details: Text Preprocessing:
When a corpus is parsed, there will always be words that are unknown to the morphological analyzer and/or the lexicon.
The documents must therefore be preprocessed before syntactic parsing.
For this purpose, INESS has developed an intelligent browser-based preprocessing interface that facilitates efficient text cleanup and the treatment of unknown word forms.
For more details, see Rosén et al. (2012), 'An integrated web-based treebank annotation system', http://clarino.uib.no/iness/page?page-id=Publications.
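The core task of this preprocessing step, finding word forms that the analyzer and lexicon do not know, can be illustrated with a minimal Python sketch. It assumes that a plain set of known lowercase word forms stands in for the morphological analyzer and lexicon; the function `unknown_word_forms`, the toy lexicon and the most-frequent-first ordering are hypothetical and do not describe the actual INESS interface.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Letters only: [^\W\d_] matches any Unicode letter, so Nynorsk æ, ø and å
# stay inside tokens while digits and punctuation are skipped.
# Hyphenated compounds such as "NRK-journalistar" are kept as one token.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unknown_word_forms(text: str, lexicon: Iterable[str]) -> List[Tuple[str, int]]:
    """List word forms missing from the lexicon, most frequent first.

    Hypothetical stand-in for the analyzer/lexicon lookup: a plain set of
    known lowercase forms.
    """
    known = {form.lower() for form in lexicon}
    counts = Counter(
        token.lower() for token in WORD_RE.findall(text) if token.lower() not in known
    )
    return counts.most_common()


if __name__ == "__main__":
    # Toy lexicon and sentence, invented for illustration.
    lexicon = {"forskarane", "har", "funne", "ein", "ny", "art", "på",
               "arten", "lever", "berre"}
    text = "Forskarane har funne ein ny art på Bjørnøya. Arten lever berre på Bjørnøya."
    print(unknown_word_forms(text, lexicon))  # [('bjørnøya', 2)]
```

Sorting by frequency is one possible design choice: an annotator who fixes the most frequent unknown forms first resolves the largest number of tokens per decision.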
classification Info
genre Info
genre Type: textGenre
genre: newspapers and magazines
time Coverage Info
time Coverage: 1998-05-01 – 2012-10-20
creation Info
creation Mode: mixed
creation Mode Details: The annotation was created through parsebanking. Sentences were parsed with the Norwegian LFG grammar NorGram on the XLE platform, the resulting analyses were manually disambiguated using discriminants, and the texts were reparsed after grammar and lexicon updates.
creation Tool
target Resource Name URI: XLE
creation Tool
target Resource Name URI: NorGram (online demonstrator: http://clarino.uib.no/iness/xle-web)
creation Tool
target Resource Name URI: LFG Parsebanker
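The discriminant-based disambiguation named under creation Mode Details can be illustrated with a toy Python sketch. It assumes that each candidate analysis can be reduced to a set of discriminant labels (local properties that distinguish some analyses from others); the labels such as `subj=NP1` and `pp-attach=verb`, and the functions `open_discriminants` and `apply_choices`, are hypothetical and do not reproduce the LFG Parsebanker's data model. Under the same assumption, it also shows how stored choices could be reapplied after a sentence is reparsed.

```python
from typing import Dict, FrozenSet, List

# Toy model (hypothetical): an analysis is reduced to the set of
# discriminants, i.e. local properties, that it contains.
Analysis = FrozenSet[str]


def open_discriminants(analyses: List[Analysis]) -> List[str]:
    """Properties found in some, but not all, remaining analyses."""
    if not analyses:
        return []
    in_some = frozenset().union(*analyses)
    in_all = frozenset.intersection(*analyses)
    return sorted(in_some - in_all)


def apply_choices(analyses: List[Analysis], choices: Dict[str, bool]) -> List[Analysis]:
    """Keep only analyses consistent with every annotator choice.

    True means the discriminant must be present, False that it must be absent.
    """
    return [a for a in analyses
            if all((d in a) == wanted for d, wanted in choices.items())]


if __name__ == "__main__":
    parses = [
        frozenset({"subj=NP1", "pp-attach=verb"}),
        frozenset({"subj=NP1", "pp-attach=noun"}),
        frozenset({"subj=NP2", "pp-attach=verb"}),
    ]
    print(open_discriminants(parses))
    choices = {"pp-attach=verb": True, "subj=NP1": True}
    print(apply_choices(parses, choices))

    # After a grammar update the sentence is reparsed; the stored choices are
    # reapplied, and only analyses they cannot separate are left for review.
    reparsed = parses + [frozenset({"subj=NP1", "pp-attach=verb", "tense=past"})]
    print(apply_choices(reparsed, choices))
```

The appeal of this approach is that an annotator answers a few local questions instead of inspecting every full analysis, and the recorded answers remain reusable when the grammar changes.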
dc:type
corpus
dc:title
NorGramBank Non-fiction text in Norwegian Nynorsk from Forskning.no.
dc:identifier
oai:clarino.uib.no:nno-fn
dc:description
The "NorGram Non-fiction text in Norwegian Nynorsk from Forskning.no" treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata).
As of October 2015, the treebank comprises 21723 sentences, 371744 words and 582 documents.
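The dc:identifier has the form of an OAI-PMH identifier, which suggests the record can be harvested with the OAI-PMH `GetRecord` verb and the `oai_dc` metadata prefix that OAI-PMH repositories are required to support. A minimal Python sketch follows; the base URL is a placeholder (the repository endpoint is not stated here), `get_record_url` and `dc_fields` are illustrative names, and the XML fragment was rebuilt by hand from the dc fields above rather than captured from a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"

# Placeholder endpoint: the record states the OAI identifier but not the
# repository's base URL.
BASE_URL = "https://example.org/oai"  # hypothetical


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL (query values are percent-encoded)."""
    query = urlencode({"verb": "GetRecord",
                       "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dc_fields(xml_text: str) -> dict:
    """Collect Dublin Core elements (dc:*) from an XML fragment, keyed by local name."""
    fields = {}
    prefix = "{" + DC_NS + "}"
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Fragment rebuilt by hand from the dc fields of this record, not a server response.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:type>corpus</dc:type>
  <dc:title>NorGramBank Non-fiction text in Norwegian Nynorsk from Forskning.no.</dc:title>
  <dc:identifier>oai:clarino.uib.no:nno-fn</dc:identifier>
</oai_dc:dc>"""

if __name__ == "__main__":
    print(get_record_url(BASE_URL, "oai:clarino.uib.no:nno-fn"))
    print(dc_fields(SAMPLE))
```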