On behalf of The National Library of Norway, Kaldera Språkteknologi AS is developing wordnets for both written standards of Norwegian.
- Version 1.0.0 of Norwegian Wordnet is now available for download: go to the Lexical resources page.
- An online version of the wordnets is available on Kaldera Språkteknologi's web page.
The National Library is funding the development of a separate wordnet for each of the two written standards of Norwegian. Each will contain at least 50,000 so-called synsets, i.e. sets of one or more words with the same meaning. Words (or rather, the meanings of the words) in the same synset have the same denotation, that is, core meaning. An example is the synset containing “horse”, “nag”, and “steed”. The synsets are related to each other semantically by, e.g., hyponymy (a “horse” is an “animal”), meronymy (a “finger” is a part of a “hand”), place (an “oasis” is in a “desert”) and so on.
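The idea of synsets linked by typed relations can be illustrated with a minimal sketch. The class name `Synset`, the relation labels (`hyponym_of`, `part_of`, `located_in`) and the helper `hypernyms` are hypothetical names chosen for this illustration; they are not taken from the Norwegian wordnets or from DanNet.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare synsets by identity, not by content
class Synset:
    """A set of words sharing one core meaning, plus typed links to other synsets."""
    words: frozenset
    relations: dict = field(default_factory=dict)  # relation label -> list of Synset

    def relate(self, relation, target):
        """Add a link of the given type from this synset to the target synset."""
        self.relations.setdefault(relation, []).append(target)


# The examples from the text, using hypothetical relation labels.
animal = Synset(frozenset({"animal"}))
horse = Synset(frozenset({"horse", "nag", "steed"}))
hand = Synset(frozenset({"hand"}))
finger = Synset(frozenset({"finger"}))
desert = Synset(frozenset({"desert"}))
oasis = Synset(frozenset({"oasis"}))

horse.relate("hyponym_of", animal)   # a horse is an animal
finger.relate("part_of", hand)       # a finger is a part of a hand
oasis.relate("located_in", desert)   # an oasis is in a desert


def hypernyms(synset):
    """Follow hyponym_of links upward and collect every more general synset."""
    found = []
    stack = list(synset.relations.get("hyponym_of", []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(current.relations.get("hyponym_of", []))
    return found
```

In this sketch, "nag" and "steed" inherit every relation of "horse" because the relations are attached to the synset rather than to the individual words.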
A wordnet is an important basic component in the development of language technology solutions such as machine translation, information retrieval and grammar checking, and also in language research, especially computational linguistics.
The work on the Norwegian wordnets is based on a Scandinavian collaboration. The main structure of the wordnets takes the Danish wordnet DanNet as its starting point. DanNet is in turn based on Den Danske Ordbog (The Danish Dictionary), and the project benefits from the fact that Danish and Norwegian are closely related languages, both historically and geographically. Although there are differences in morphology, syntax and sound systems (especially with regard to Norwegian Nynorsk and the dialects), the difference in meaning between the languages is often a question of different nuances of the words, or of different usage. This type of variation is not reflected in the wordnet, which does not encode secondary meanings (connotations) or valency (constraints on argument structure).
Nevertheless, many words from DanNet will be left out because they are mainly relevant for Danish usage, while many Norwegian words that are missing from DanNet will be included (e.g., words that have to do with skiing). Extensive editing work will ensure that the wordnets reflect a Norwegian, and not a Danish, semantic structure.
An important challenge in the development of a wordnet is to keep the coding system consistent. For instance, “race walking” can be a hyponym of “sports”, but if there is a class called “athletics” which contains, for instance, “javelin”, then “race walking” ought to be categorised under “athletics”. One of the aims of the collaboration with the DanNet project is to achieve a high degree of consistency by working from the same material.
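The kind of inconsistency described above can be surfaced for manual review with a simple heuristic. The following is only a sketch under stated assumptions: the toy hierarchy, the dictionary `hypernym_of` and the function `review_candidates` are hypothetical and do not describe the project's actual tooling. The heuristic lists, for every word without hyponyms of its own, any sibling that is itself a class with hyponyms, since such a sibling may be the more specific category the word belongs under. Whether it really does is a decision for a human editor.

```python
# Hypothetical toy hierarchy: each word maps to its direct hypernym.
hypernym_of = {
    "athletics": "sports",
    "javelin": "athletics",
    "race walking": "sports",
}


def review_candidates(hypernym_of):
    """For each leaf word, list sibling classes that might be a more specific parent."""
    # Invert the mapping: parent -> list of direct hyponyms.
    children = {}
    for word, parent in hypernym_of.items():
        children.setdefault(parent, []).append(word)

    candidates = {}
    for word, parent in hypernym_of.items():
        if word in children:
            continue  # the word is itself a class; this sketch only checks leaves
        sibling_classes = sorted(
            s for s in children[parent] if s != word and s in children
        )
        if sibling_classes:
            candidates[word] = sibling_classes
    return candidates


print(review_candidates(hypernym_of))
```

Here “race walking” is flagged with “athletics” as a candidate parent. Once it is moved under “athletics”, the same check returns an empty result.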
Wordnets exist for many languages, but a new feature of the Norwegian ones is that they will include more parts of speech than usual, first and foremost prepositions and adverbs. In addition, the system of relations will be more extensive than in earlier projects.
The project is being carried out by Kaldera Språkteknologi AS.