Translation Memories from Semantix AS

This corpus contains translation memories provided to the National Library of Norway by Semantix AS. The translations have been carried out on behalf of various public agencies and institutions.

The corpus consists of texts originally written in English or Norwegian Bokmål, aligned with their translations into the other language. A very small number of translations are into Norwegian Nynorsk; for simplicity, these have been classified as Norwegian Bokmål.

All translations from English to Norwegian Bokmål are collected in one file, and all translations from Norwegian Bokmål to English in another. The files are in TMX 1.4 format (an XML-based format). Each translation unit (TU) is marked with the institution for which the translation was carried out. A TU corresponds (more or less) to a meaningful linguistic unit, typically a sentence or a heading, but it may also consist of a single word or several clauses.
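As a rough illustration of how TUs in a TMX 1.4 file could be read, here is a minimal Python sketch using only the standard library. It relies on the generic TMX structure (`<tu>` elements containing `<prop>` and `<tuv xml:lang="..."><seg>` children). How the institution is actually encoded in these files is not stated here, so the `x-institution` property name, the language codes, and the sample content below are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tus(source):
    """Yield (props, segments) for each <tu> in a TMX file or file-like object.

    props maps each <prop type="..."> to its text; segments maps each
    language code to the text of the corresponding <seg>.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        props = {p.get("type"): (p.text or "") for p in elem.findall("prop")}
        segments = {}
        for tuv in elem.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() also collects text inside inline markup elements.
                segments[lang] = "".join(seg.itertext())
        yield props, segments
        elem.clear()  # release the processed TU to keep memory use low


# Hypothetical sample; the property name and language codes are assumptions.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="x-institution">Example Agency</prop>
      <tuv xml:lang="en"><seg>Annual report</seg></tuv>
      <tuv xml:lang="nb"><seg>Årsrapport</seg></tuv>
    </tu>
  </body>
</tmx>
"""

tus = list(iter_tus(io.BytesIO(SAMPLE.encode("utf-8"))))
for props, segments in tus:
    print(props, segments)
```

Streaming with `iterparse` rather than loading the whole tree is a deliberate choice here, since the larger of the two files holds over a million TUs. Inspect a few TUs from the real files first to see which element or attribute carries the institution.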

The corpus contains a total of 1,325,013 TUs, distributed as follows:

– English > Norwegian Bokmål: 250,053 TUs
– Norwegian Bokmål > English: 1,074,960 TUs

The documentation file contains an overview of the agencies and institutions, and the number of TUs belonging to each institution.
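The figures above can be used as a sanity check after extraction. The sketch below, in Python, verifies that the two published per-direction counts add up to the stated total and shows one way to tally TUs per institution for comparison with the documentation file. The institution labels are toy placeholders, not names from the corpus, and `count_by_institution` is a hypothetical helper.

```python
from collections import Counter

# Figures published in the corpus description.
PUBLISHED = {"en>nb": 250_053, "nb>en": 1_074_960}
PUBLISHED_TOTAL = 1_325_013


def count_by_institution(labels):
    """Count TUs per institution from an iterable of institution labels."""
    return Counter(labels)


# The two translation directions should add up to the stated total.
total = sum(PUBLISHED.values())
print(total == PUBLISHED_TOTAL)

# Toy labels standing in for the values read from each TU.
counts = count_by_institution(["Agency A", "Agency B", "Agency A"])
print(counts.most_common())
```

With real data, the per-institution counts obtained this way can be compared against the overview in the documentation file.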
