The digitizing process

The National Library of Norway has several production lines which handle different types of material. Each day several terabytes of data flow through these lines.

The material that is digitized goes through the following production processes:

  • Scanning/digitization
  • Structure analysis
  • Post-processing

Scanning/digitization:
The National Library of Norway has several types of scanners and digitizing tools, including document scanners, leaf scanners and automatic leaf scanners. These are used to digitize paper materials such as books, newspapers and photographs. We also have equipment for digitizing films, videos and sound.

Structure analysis:
Once material that contains text has been digitized, it is often sent for Optical Character Recognition (OCR), in which computers “translate” images of pages into text. This allows users to search the entire text, not just the metadata. Analyses are also carried out to identify which parts of the text are headers, sections and similar elements.
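
The difference between searching metadata and searching the full text can be illustrated with a small sketch. It is not a description of the library's search system: the function names, the sample titles and the sample OCR text are all hypothetical, and a simple inverted index stands in for whatever search technology is actually used.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: titles (metadata) and OCR output for two items.
metadata = {"book-1": "Collected poems", "book-2": "Travel letters"}
ocr_text = {
    "book-1": "The fjord lay silent under the winter sky",
    "book-2": "We sailed north along the coast to the fjord",
}

metadata_index = build_index(metadata)
fulltext_index = build_index(ocr_text)

# The word "fjord" appears in neither title, so a metadata search misses it.
print(search(metadata_index, "fjord"))
# The OCR text of both items contains it, so a full-text search finds both.
print(search(fulltext_index, "fjord"))
```

With metadata alone, a reader would have to know the title to find either item; the OCR text makes words inside the pages searchable as well.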

Post-processing:
This involves automatic validation and checking of files, generation of metadata, entering into databases, making files available for distribution and archiving in our digital archives. Printed material is stored as JPEG 2000 or TIFF. Radio material is stored as WAV. For moving pictures we use several storage formats, including MPEG-4 H.264.
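
As an illustration of what automatic validation and checking of files can involve, the sketch below confirms that a file starts with the signature expected for its format and that its SHA-256 checksum matches a recorded value. This is an assumption-laden example, not the library's actual tooling: the function names are hypothetical, and only the well-known leading bytes of TIFF, JP2 (JPEG 2000) and WAV files are checked.

```python
import hashlib

# Leading bytes of a JP2 (JPEG 2000) file: the 12-byte signature box.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")
# TIFF files start with a byte-order mark followed by the number 42.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def detect_format(header):
    """Guess the format from the first 12 bytes; return None if unknown."""
    if header.startswith(TIFF_SIGNATURES):
        return "tiff"
    if header.startswith(JP2_SIGNATURE):
        return "jp2"
    # WAV is a RIFF container whose form type, at offset 8, is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return None


def sha256_of(data):
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate(data, expected_format, expected_sha256):
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    found = detect_format(data[:12])
    if found != expected_format:
        problems.append(f"expected {expected_format}, found {found}")
    if sha256_of(data) != expected_sha256:
        problems.append("checksum mismatch")
    return problems


# Hypothetical example: the first bytes of a WAV file, built by hand.
sample = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE" + b"fmt "
recorded_checksum = sha256_of(sample)  # in practice stored when the file was created

print(validate(sample, "wav", recorded_checksum))   # no problems reported
print(validate(sample, "tiff", recorded_checksum))  # reports a format mismatch
```

A signature check only catches files that are mislabeled or badly truncated; full validation of a format would need a dedicated tool that parses the whole file.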

Digitizing lines

  • Newspapers
  • Pictures
  • Books
  • Sound
  • Manuscripts, handwritten documents, notes
  • Magazines
  • Videos/films
  • Reports to the Storting (Norwegian Parliament)

In addition to digitizing the physical collection, several other lines, the so-called “reception lines”, handle material that is “born digital”. For example, we receive digital editions of newspapers and process these for archiving and distribution. The reception runs involve downloading, structure analysis and post-processing.

Reception runs for digital material

  • Newspapers
  • E-books
  • Public surveys
  • Audiobooks
  • Radio
  • TV
  • Web harvesting