N-grams from NBdigital

CLARINO NB – Språkbanken

License: Creative Commons Zero (CC0)

Updated: 2016-02-24

This resource contains n-grams (unigrams, bigrams and trigrams) from all books and newspapers digitized at the National Library of Norway up to September 2013. The n-grams were extracted from a collection of approximately 220,000 books and 540,000 newspapers.
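To illustrate what unigrams, bigrams and trigrams are, the following is a minimal sketch of counting them in a piece of text. It assumes a naive lowercase whitespace tokenizer; the tokenization actually used to build these n-grams is not described on this page, and the function names `ngrams` and `count_ngrams` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, max_n=3):
    """Count unigrams (n=1), bigrams (n=2) and trigrams (n=3) in `text`.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace, which is only an illustration, not the NB tokenizer.
    """
    tokens = text.lower().split()
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


counts = count_ngrams("Det var en gang en konge")
print(counts[1][("en",)])         # 2
print(counts[2][("en", "gang")])  # 1
```

Each n-gram is a tuple of tokens, so unigrams are one-element tuples; a corpus-scale extraction would stream documents and merge the per-document counters rather than hold all text in memory.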

The n-grams are available in two formats, CSV and SQLite. CSV is likely the more convenient format for most developers, since the files import easily into standard applications. The SQLite files contain the indexed databases used by the NB N-gram service. Users who want to contribute to the development of NB N-gram can download the source code from GitHub and the SQLite databases from this page.
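The sketch below shows how n-gram data in CSV form could be read and loaded into an indexed SQLite table for fast lookups, using only Python's standard `csv` and `sqlite3` modules. The column layout (`ngram,year,freq`), the table name `ngrams` and the index name `idx_ngram` are all hypothetical: the actual schemas of the published CSV files and SQLite databases are not described here, so check the downloaded files before adapting this. A small in-memory sample stands in for a real file.

```python
import csv
import io
import sqlite3

# HYPOTHETICAL sample data: the real CSV column layout is not documented
# on this page; "ngram,year,freq" is assumed purely for illustration.
SAMPLE_CSV = """ngram,year,freq
konge,1900,12
konge,1901,7
dronning,1900,5
"""


def load_csv(fileobj):
    """Parse rows of the (hypothetical) CSV layout into (ngram, year, freq) tuples."""
    reader = csv.DictReader(fileobj)
    return [(row["ngram"], int(row["year"]), int(row["freq"])) for row in reader]


def build_db(rows):
    """Load rows into an in-memory SQLite table indexed on the n-gram column.

    Table and index names are hypothetical and do not reflect the schema
    of the published SQLite databases.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ngrams (ngram TEXT, year INTEGER, freq INTEGER)")
    conn.execute("CREATE INDEX idx_ngram ON ngrams (ngram)")
    conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def total_freq(conn, ngram):
    """Sum the frequency of one n-gram over all years (0 if absent)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(freq), 0) FROM ngrams WHERE ngram = ?", (ngram,)
    ).fetchone()
    return total


rows = load_csv(io.StringIO(SAMPLE_CSV))
conn = build_db(rows)
print(total_freq(conn, "konge"))  # 19
```

The index on the n-gram column is what lets a lookup like `total_freq` avoid scanning the whole table, which matters at the scale of the full data set; for a real file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`.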

