<OAI-PMH xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/          http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-07-03T12:36:29.987Z</responseDate>
  <request verb="GetRecord">https://www.nb.no/sprakbanken/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:nb.no:sbr-60</identifier>
        <datestamp/>
      </header>
      <metadata>
        <cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" xmlns="http://www.clarin.eu/cmd/" xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1407745711925" CMDVersion="1.2" xsi:schemaLocation="http://www.clarin.eu/cmd/1 https://infra.clarin.eu/CMDI/1.x/xsd/cmd-envelop.xsd http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1407745711925 https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1407745711925/1.2/xsd">
          <cmd:Header>
            <cmd:MdCreator>nb:Nasjonalbiblioteket¦nn:Nasjonalbiblioteket¦en:National Library of Norway</cmd:MdCreator>
            <cmd:MdCreationDate>2021-06-24</cmd:MdCreationDate>
            <cmd:MdSelfLink>https://www.nb.no/sprakbanken/oai?verb=GetRecord&amp;identifier=oai:nb.no:sbr-60&amp;metadataPrefix=cmdi</cmd:MdSelfLink>
            <cmd:MdProfile>clarin.eu:cr1:p_1407745711925</cmd:MdProfile>
            <cmd:MdCollectionDisplayName>Språkbanken NB</cmd:MdCollectionDisplayName>
          </cmd:Header>
          <cmd:Resources>
            <cmd:ResourceProxyList>
              <cmd:ResourceProxy id="nno_sak_1">
                <cmd:ResourceType mimetype="application/x-gtar">Resource</cmd:ResourceType>
                <cmd:ResourceRef>https://www.nb.no/sbfil/tekst/sakspapir_nno/sakspapir_nno_01.tar.gz</cmd:ResourceRef>
              </cmd:ResourceProxy>
            </cmd:ResourceProxyList>
            <cmd:JournalFileProxyList/>
            <cmd:ResourceRelationList/>
          </cmd:Resources>
          <cmd:IsPartOfList/>
          <cmd:Components>
            <cmdp:corpusProfile>
              <cmdp:resourceCommonInfo>
                <cmdp:resourceType>corpus</cmdp:resourceType>
                <cmdp:identificationInfo>
                  <cmdp:resourceName xml:lang="en">Legal Documents from Norwegian Nynorsk Municipalities</cmdp:resourceName>
                  <cmdp:resourceName xml:lang="nn">Sakspapir frå nynorskkommunar</cmdp:resourceName>
                  <cmdp:description xml:lang="en">The texts in this corpus have been collected with the web crawler Veidemann in collaboration with the National Library's Web Archive, based on a revised list of municipalities from the National Association of Nynorsk Municipalities (see lnk.no).

The web crawler was set to download documents in PDF format. The resulting collection of documents was then scanned using Google's OCR API. Although the OCR is generally of high quality, some errors will remain in the material.

The resulting corpus is made up of 50,000 documents (legal documents, minutes of meetings, etc.) and contains about 127 million words in total. About 88.5 million of these are in Norwegian Nynorsk; most of the rest are in Norwegian Bokmål. All the texts in the corpus are classified by language.

The corpus is currently published as a JSON object, where each key is an identifier (URN) for the Veidemann download, and each value is a list of lists of the pages in the document with associated page numbers and written standard (målform, i.e. Nynorsk or Bokmål). A text file is also provided, containing a list of the URNs in the corpus. These URNs refer to the web pages (URLs) from which the individual documents were downloaded.

The original PDF files and the OCR format are available upon request to Språkbanken. Please contact us at our e-mail address, sprakbanken@nb.no.</cmdp:description>
                  <cmdp:description xml:lang="nn">Tekstene i dette korpuset er samla inn med crawleren Veidemann i samarbeid med Nettarkivet på Nasjonalbiblioteket, basert på ei omarbeidd liste over kommunar frå Landssamanslutninga av nynorskkommunar (lnk.no).

Ein crawler er ein robot som følgjer hyperlenkjer på nettet og lastar ned nye nettsider han finn. For dette korpuset vart Veidemann satt til å laste ned dokument i publiseringsformat som pdf. Lista Veidemann har teke som utgangspunkt, har leidd han til sakspapir på websidene til dei ulike kommunane.

Den resulterande samlinga med dokument er så skanna ved hjelp av Googles optiske teiknattkjennings-api. Sjølv om OCR-lesinga gjennomgåande er god, vil det finnast feillesingar. Det endelege korpuset er sett saman av 50.000 dokument, og inneheld totalt omlag 127 millionar ord. Ca. 88,5 millionar av desse er på nynorsk, resten er stort sett på bokmål. Alle tekstene i korpuset er klassifiserte etter språk.

Korpuset er i denne omgangen publisert som eit json-objekt, der nøkkelen er ein identifikator (URN) for Veidemann-nedlastinga og verdien er ei liste av lister over sidene i dokumentet med tilhøyrande sidetal og målform. Det ligg òg ved ei liste over URN-ane i korpuset. Desse URN-ane syner vidare til nettsida (URL-en) som dokumentet vart lasta ned frå.

Dei originale pdf-filene og ocr-formatet er tilgjengelege på førespurnad til Språkbanken. Kontakt oss på e-post til sprakbanken@nb.no.</cmdp:description>
                  <cmdp:url cmd:description="resource homepage">https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-60/</cmdp:url>
                  <cmdp:PID cmd:description="hdl">hdl:21.11146/60</cmdp:PID>
                  <cmdp:identifier>sbr-60</cmdp:identifier>
                </cmdp:identificationInfo>
                <cmdp:distributionInfo>
                  <cmdp:licenceInfo>
                    <cmdp:userCategory>Public</cmdp:userCategory>
                    <cmdp:distributionAccessMedium>downloadable</cmdp:distributionAccessMedium>
                    <cmdp:downloadLocation cmd:description="resource homepage">https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-60/</cmdp:downloadLocation>
                    <cmdp:licence>
                      <cmdp:licenceFamily>Creative Commons (CC)</cmdp:licenceFamily>
                      <cmdp:licenceName>Creative_Commons-ZERO (CC-ZERO)</cmdp:licenceName>
                      <cmdp:licenceURL>https://creativecommons.org/publicdomain/zero/1.0/</cmdp:licenceURL>
                    </cmdp:licence>
                    <cmdp:licensor>
                      <cmdp:actorInfo>
                        <cmdp:actorType>organization</cmdp:actorType>
                        <cmdp:role xml:lang="en">Licensor</cmdp:role>
                        <cmdp:organizationInfo>
                          <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                          <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                          <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                          <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                          <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                          <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                        </cmdp:organizationInfo>
                        <cmdp:communicationInfo>
                          <cmdp:email>sprakbanken@nb.no</cmdp:email>
                          <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                          <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                          <cmdp:zipCode>0203</cmdp:zipCode>
                          <cmdp:city>Oslo</cmdp:city>
                          <cmdp:region>Oslo</cmdp:region>
                          <cmdp:country>Norway</cmdp:country>
                        </cmdp:communicationInfo>
                      </cmdp:actorInfo>
                    </cmdp:licensor>
                    <cmdp:distributionRightsHolder>
                      <cmdp:actorInfo>
                        <cmdp:actorType>organization</cmdp:actorType>
                        <cmdp:role xml:lang="en">Distribution Rights Holder</cmdp:role>
                        <cmdp:organizationInfo>
                          <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                          <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                          <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                          <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                          <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                          <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                        </cmdp:organizationInfo>
                        <cmdp:communicationInfo>
                          <cmdp:email>sprakbanken@nb.no</cmdp:email>
                          <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                          <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                          <cmdp:zipCode>0203</cmdp:zipCode>
                          <cmdp:city>Oslo</cmdp:city>
                          <cmdp:region>Oslo</cmdp:region>
                          <cmdp:country>Norway</cmdp:country>
                        </cmdp:communicationInfo>
                      </cmdp:actorInfo>
                    </cmdp:distributionRightsHolder>
                  </cmdp:licenceInfo>
                </cmdp:distributionInfo>
                <cmdp:contact>
                  <cmdp:actorInfo>
                    <cmdp:actorType>organization</cmdp:actorType>
                    <cmdp:role xml:lang="en">Contact</cmdp:role>
                    <cmdp:organizationInfo>
                      <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                      <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                      <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                      <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                      <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                      <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                    </cmdp:organizationInfo>
                    <cmdp:communicationInfo>
                      <cmdp:email>sprakbanken@nb.no</cmdp:email>
                      <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                      <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                      <cmdp:zipCode>0203</cmdp:zipCode>
                      <cmdp:city>Oslo</cmdp:city>
                      <cmdp:region>Oslo</cmdp:region>
                      <cmdp:country>Norway</cmdp:country>
                    </cmdp:communicationInfo>
                  </cmdp:actorInfo>
                </cmdp:contact>
                <cmdp:metadataInfo>
                  <cmdp:metadataCreationDate>2020-12-04</cmdp:metadataCreationDate>
                  <cmdp:metadataLanguageName>English</cmdp:metadataLanguageName>
                  <cmdp:metadataLanguageId>en</cmdp:metadataLanguageId>
                  <cmdp:metadataLastDateUpdated>2023-08-07</cmdp:metadataLastDateUpdated>
                  <cmdp:metadataCreator>
                    <cmdp:actorInfo>
                      <cmdp:actorType>person</cmdp:actorType>
                      <cmdp:role xml:lang="en">Metadata Creator</cmdp:role>
                      <cmdp:personInfo>
                        <cmdp:surname xml:lang="nn">Lindstad</cmdp:surname>
                        <cmdp:givenName xml:lang="nn">Arne Martinus</cmdp:givenName>
                        <cmdp:affiliation>
                          <cmdp:organizationInfo>
                            <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                            <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                            <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                            <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                            <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                            <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                          </cmdp:organizationInfo>
                        </cmdp:affiliation>
                      </cmdp:personInfo>
                      <cmdp:communicationInfo>
                        <cmdp:email>sprakbanken@nb.no</cmdp:email>
                        <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                        <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                        <cmdp:zipCode>0203</cmdp:zipCode>
                        <cmdp:city>Oslo</cmdp:city>
                        <cmdp:region>Oslo</cmdp:region>
                        <cmdp:country>Norway</cmdp:country>
                      </cmdp:communicationInfo>
                    </cmdp:actorInfo>
                  </cmdp:metadataCreator>
                </cmdp:metadataInfo>
                <cmdp:versionInfo>
                  <cmdp:version>0.1</cmdp:version>
                  <cmdp:lastDateUpdated>2020-12-04</cmdp:lastDateUpdated>
                </cmdp:versionInfo>
                <cmdp:validationInfo>
                  <cmdp:validated>true</cmdp:validated>
                  <cmdp:validationType>content</cmdp:validationType>
                  <cmdp:validationMode>automatic</cmdp:validationMode>
                  <cmdp:validationModeDetails>OCR (Google's OCR API), Language Classification (pytextcat and models from Giellatekno)</cmdp:validationModeDetails>
                  <cmdp:validationExtent>full</cmdp:validationExtent>
                  <cmdp:validator>
                    <cmdp:actorInfo>
                      <cmdp:actorType>person</cmdp:actorType>
                      <cmdp:role xml:lang="en">Resource Validator</cmdp:role>
                      <cmdp:personInfo>
                        <cmdp:surname xml:lang="nn">Kåsen</cmdp:surname>
                        <cmdp:givenName xml:lang="nn">Andre</cmdp:givenName>
                        <cmdp:affiliation>
                          <cmdp:organizationInfo>
                            <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                            <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                            <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                            <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                            <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                            <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                          </cmdp:organizationInfo>
                        </cmdp:affiliation>
                      </cmdp:personInfo>
                      <cmdp:communicationInfo>
                        <cmdp:email>sprakbanken@nb.no</cmdp:email>
                        <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                        <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                        <cmdp:zipCode>0203</cmdp:zipCode>
                        <cmdp:city>Oslo</cmdp:city>
                        <cmdp:region>Oslo</cmdp:region>
                        <cmdp:country>Norway</cmdp:country>
                      </cmdp:communicationInfo>
                    </cmdp:actorInfo>
                  </cmdp:validator>
                </cmdp:validationInfo>
                <cmdp:resourceDocumentationInfo/>
                <cmdp:resourceCreationInfo>
                  <cmdp:creationStartDate>2019-10-16</cmdp:creationStartDate>
                  <cmdp:creationEndDate>2020-12-04</cmdp:creationEndDate>
                  <cmdp:resourceCreator>
                    <cmdp:actorInfo>
                      <cmdp:actorType>person</cmdp:actorType>
                      <cmdp:role xml:lang="en">Resource Creator</cmdp:role>
                      <cmdp:personInfo>
                        <cmdp:surname xml:lang="nn">Kåsen</cmdp:surname>
                        <cmdp:givenName xml:lang="nn">Andre</cmdp:givenName>
                        <cmdp:affiliation>
                          <cmdp:organizationInfo>
                            <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                            <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                            <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                            <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                            <cmdp:departmentName xml:lang="en">The Language Bank</cmdp:departmentName>
                            <cmdp:departmentName xml:lang="nn">Språkbanken</cmdp:departmentName>
                          </cmdp:organizationInfo>
                        </cmdp:affiliation>
                      </cmdp:personInfo>
                      <cmdp:communicationInfo>
                        <cmdp:email>sprakbanken@nb.no</cmdp:email>
                        <cmdp:url>https://www.nb.no/sprakbanken/</cmdp:url>
                        <cmdp:address>P.O. Box 2674 Solli</cmdp:address>
                        <cmdp:zipCode>0203</cmdp:zipCode>
                        <cmdp:city>Oslo</cmdp:city>
                        <cmdp:region>Oslo</cmdp:region>
                        <cmdp:country>Norway</cmdp:country>
                      </cmdp:communicationInfo>
                    </cmdp:actorInfo>
                    <cmdp:actorInfo>
                      <cmdp:actorType>organization</cmdp:actorType>
                      <cmdp:role xml:lang="en">Resource Creator</cmdp:role>
                      <cmdp:organizationInfo>
                        <cmdp:organizationName xml:lang="en">National Library of Norway</cmdp:organizationName>
                        <cmdp:organizationName xml:lang="nn">Nasjonalbiblioteket</cmdp:organizationName>
                        <cmdp:organizationShortName xml:lang="en">NLN</cmdp:organizationShortName>
                        <cmdp:organizationShortName xml:lang="nn">NB</cmdp:organizationShortName>
                        <cmdp:departmentName xml:lang="en">Web Archive</cmdp:departmentName>
                        <cmdp:departmentName xml:lang="nn">Nettarkivet</cmdp:departmentName>
                      </cmdp:organizationInfo>
                    </cmdp:actorInfo>
                  </cmdp:resourceCreator>
                </cmdp:resourceCreationInfo>
              </cmdp:resourceCommonInfo>
              <cmdp:corpusInfo>
                <cmdp:corpusType>Written Corpus</cmdp:corpusType>
                <cmdp:corpusPartInfo>
                  <cmdp:mediaType>text</cmdp:mediaType>
                  <cmdp:corpusTextInfo>
                    <cmdp:textFormatInfo>
                      <cmdp:mimeType>application/json</cmdp:mimeType>
                      <cmdp:sizePerTextFormat>
                        <cmdp:sizeInfo>
                          <cmdp:size>127476046</cmdp:size>
                          <cmdp:sizeUnit>words</cmdp:sizeUnit>
                        </cmdp:sizeInfo>
                        <cmdp:sizeInfo>
                          <cmdp:size>50000</cmdp:size>
                          <cmdp:sizeUnit>texts</cmdp:sizeUnit>
                        </cmdp:sizeInfo>
                      </cmdp:sizePerTextFormat>
                    </cmdp:textFormatInfo>
                    <cmdp:characterEncodingInfo>
                      <cmdp:characterEncoding>UTF-8</cmdp:characterEncoding>
                    </cmdp:characterEncodingInfo>
                  </cmdp:corpusTextInfo>
                </cmdp:corpusPartInfo>
                <cmdp:corpusPartGeneralInfo>
                  <cmdp:lingualityInfo>
                    <cmdp:lingualityType>multilingual</cmdp:lingualityType>
                    <cmdp:multilingualityType>multilingualSingleText</cmdp:multilingualityType>
                    <cmdp:multilingualityTypeDetails>Texts in Norwegian Nynorsk and Norwegian Bokmål</cmdp:multilingualityTypeDetails>
                  </cmdp:lingualityInfo>
                  <cmdp:languageInfo>
                    <cmdp:languageId>nn</cmdp:languageId>
                    <cmdp:languageName>Norwegian Nynorsk</cmdp:languageName>
                    <cmdp:sizePerLanguage>
                      <cmdp:sizeInfo>
                        <cmdp:size>88500000</cmdp:size>
                        <cmdp:sizeUnit>words</cmdp:sizeUnit>
                      </cmdp:sizeInfo>
                    </cmdp:sizePerLanguage>
                  </cmdp:languageInfo>
                  <cmdp:languageInfo>
                    <cmdp:languageId>nb</cmdp:languageId>
                    <cmdp:languageName>Norwegian Bokmål</cmdp:languageName>
                    <cmdp:sizePerLanguage>
                      <cmdp:sizeInfo>
                        <cmdp:size>38500000</cmdp:size>
                        <cmdp:sizeUnit>words</cmdp:sizeUnit>
                      </cmdp:sizeInfo>
                    </cmdp:sizePerLanguage>
                  </cmdp:languageInfo>
                  <cmdp:modalityInfo>
                    <cmdp:modalityType>writtenLanguage</cmdp:modalityType>
                    <cmdp:sizePerModality>
                      <cmdp:sizeInfo>
                        <cmdp:size>127476046</cmdp:size>
                        <cmdp:sizeUnit>words</cmdp:sizeUnit>
                      </cmdp:sizeInfo>
                    </cmdp:sizePerModality>
                  </cmdp:modalityInfo>
                  <cmdp:sizeInfo>
                    <cmdp:size>127476046</cmdp:size>
                    <cmdp:sizeUnit>words</cmdp:sizeUnit>
                  </cmdp:sizeInfo>
                  <cmdp:timeCoverageInfo>
                    <cmdp:timeCoverage>2010-2020</cmdp:timeCoverage>
                  </cmdp:timeCoverageInfo>
                </cmdp:corpusPartGeneralInfo>
              </cmdp:corpusInfo>
            </cmdp:corpusProfile>
          </cmd:Components>
        </cmd:CMD>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>