Public Domain Texts from NBdigital

This corpus consists of public domain texts from the National Library’s online collection. It contains 26,344 books and other written material by 10,756 different authors (a count that includes, e.g., public institutions credited with publicly available material).

The material can be downloaded as compressed tar files containing the texts in two formats: HTML and plain text without any markup. Both formats use UTF-8 character encoding.
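As a rough illustration of working with the downloads, the sketch below reads the texts directly from a compressed tar file with Python's standard `tarfile` module, without unpacking the archive to disk. The archive name and the `.txt` and `.html` file suffixes are assumptions made for this example; the actual names and layout inside the archives may differ.

```python
import tarfile
from typing import Iterator, Tuple


def iter_texts(archive_path: str, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (member name, decoded text) for each matching file in the archive.

    The default suffix ".txt" is an assumption; adjust it to the file names
    actually found in the downloaded archive.
    """
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:
            # Skip directories, links, and files in the other format.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Both formats are UTF-8; errors="replace" keeps going if a
            # file contains a stray invalid byte sequence.
            yield member.name, handle.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "nbdigital_texts.tar.gz" is a hypothetical file name.
    for name, text in iter_texts("nbdigital_texts.tar.gz"):
        print(name, len(text))
```

Passing `suffix=".html"` would select the HTML versions instead, again assuming that is how the files are named.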

The quality of the texts varies depending on the quality of the OCR. In addition to texts in Norwegian (Bokmål and Nynorsk), the collection contains texts in several other languages.

Extended metadata

Download resources

Download metadata