<OAI-PMH xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/          http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-05-16T06:49:52.713Z</responseDate>
  <request verb="GetRecord">https://www.nb.no/sprakbanken/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:nb.no:sbr-101</identifier>
        <datestamp/>
      </header>
      <metadata>
        <CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1" xsi:schemaLocation="http://www.clarin.eu/cmd/ http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1562754657363/xsd">
          <Header>
            <MdCreator>Arne Martinus Lindstad</MdCreator>
            <MdCreationDate>2025-01-28</MdCreationDate>
            <MdSelfLink>https://www.nb.no/sprakbanken/oai?verb=GetRecord&amp;identifier=oai:nb.no:sbr-101&amp;metadataPrefix=cmdi</MdSelfLink>
            <MdProfile>clarin.eu:cr1:p_1562754657363</MdProfile>
            <MdCollectionDisplayName>Språkbanken NB</MdCollectionDisplayName>
          </Header>
          <Resources>
            <ResourceProxyList>
              <ResourceProxy id="sami_img_ocr">
                <ResourceType mimetype="application/zip">Resource</ResourceType>
                <ResourceRef>https://www.nb.no/sbfil/samisk_ocr/syntetisk_data/parquet_files.zip</ResourceRef>
              </ResourceProxy>
              <ResourceProxy id="sami_img_ocr_nob">
                <ResourceType mimetype="application/pdf">Resource</ResourceType>
                <ResourceRef>https://www.nb.no/sbfil/samisk_ocr/syntetisk_data/README_nob.pdf</ResourceRef>
              </ResourceProxy>
              <ResourceProxy id="sami_img_ocr_eng">
                <ResourceType mimetype="application/pdf">Resource</ResourceType>
                <ResourceRef>https://www.nb.no/sbfil/samisk_ocr/syntetisk_data/README_eng.pdf</ResourceRef>
              </ResourceProxy>
            </ResourceProxyList>
            <JournalFileProxyList/>
            <ResourceRelationList>
              <ResourceRelation>
                <RelationType>describes</RelationType>
                <Res1 ref="sami_img_ocr_nob"/>
                <Res2 ref="sami_img_ocr"/>
              </ResourceRelation>
              <ResourceRelation>
                <RelationType>describes</RelationType>
                <Res1 ref="sami_img_ocr_eng"/>
                <Res2 ref="sami_img_ocr"/>
              </ResourceRelation>
            </ResourceRelationList>
            <IsPartOfList/>
          </Resources>
          <Components>
            <toolProfile>
              <resourceCommonInfo ComponentId="clarin.eu:cr1:c_1396012485126">
                <resourceType>toolService</resourceType>
                <identificationInfo ComponentId="clarin.eu:cr1:c_1396012485125">
                  <resourceName xml:lang="nb">Syntetiske tekstbilder for nord-, sør-, lule- og inaresamisk</resourceName>
                  <resourceName xml:lang="en">Synthetic text images for North, South, Lule and Inari Sámi</resourceName>
                  <description xml:lang="nb">Dette datasettet inneholder syntetiske linjebilder som kan brukes til å finjustere OCR-modeller for nord-, sør-, lule- og inaresamisk. Bildene er laget ved først å generere 'rene' linjebilder med Pillow og deretter tilføre støy ved hjelp av Augraphy.

Teksten i datasettet kommer fra Giellatekno sitt korpus.

Datasettet er tilfeldig delt opp slik at 71% av filene (307387 linjer) er i treningsdelen, 9% av filene (40765 linjer) er i valideringsdelen og 20% av filene (84534 linjer) er i testdelen. Hver del har en unik mengde skrifttyper og tekst- og bakgrunnsfarger.

Se dokumentasjonsfilen for mer informasjon.</description>
                  <description xml:lang="en">This dataset contains synthetic line images intended for fine-tuning OCR models for North, South, Lule and Inari Sámi. Clean line images are first generated with Pillow and then distorted with Augraphy.

The text in this dataset comes from Giellatekno's corpus.

The dataset is split randomly by file: 71% of the files (307387 lines) are in the training split, 9% of the files (40765 lines) are in the validation split and 20% of the files (84534 lines) are in the test split. Each split has a unique set of typefaces and text/background colors.

See the documentation file for more information.</description>
                  <url description="resource homepage">https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-101/</url>
                  <PID description="handle">hdl:21.11146/101</PID>
                  <identifier>sbr-101</identifier>
                </identificationInfo>
                <distributionInfo ComponentId="clarin.eu:cr1:c_1396012485124">
                  <licenceInfo ComponentId="clarin.eu:cr1:c_1396012485158">
                    <userCategory>Public</userCategory>
                    <distributionAccessMedium>downloadable</distributionAccessMedium>
                    <downloadLocation description="resource homepage">https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-101/</downloadLocation>
                    <attributionText xml:lang="en">Please cite

1. Enstad T, Trosterud T, Røsok MI, Beyer Y, Roald M. 'Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway.' Accepted for publication in Proceedings of the 25th Nordic Conference on Computational Linguistics (NoDaLiDa) 2025, https://arxiv.org/abs/2501.07300.

2. SIKOR UiT The Arctic University of Norway and the Norwegian Saami Parliament's Saami text collection, http://gtweb.uit.no/korp, Version 01.12.2021 [Data set]. (Note that the SIKOR dataset, from which the Sámi text in the images is drawn, is licensed under CC BY 3.0.)</attributionText>
                    <licence ComponentId="clarin.eu:cr1:c_1447674760330">
                      <licenceFamily>Creative Commons (CC)</licenceFamily>
                      <licenceName>Creative_Commons-BY (CC-BY)</licenceName>
                      <licenceURL>https://creativecommons.org/licenses/by/3.0/</licenceURL>
                      <conditionsOfUse>BY</conditionsOfUse>
                    </licence>
                    <licensor>
                      <actorInfo ComponentId="clarin.eu:cr1:c_1396012485194">
                        <actorType>organization</actorType>
                        <role xml:lang="en">Licensor</role>
                        <organizationInfo ComponentId="clarin.eu:cr1:c_1407745711883">
                          <organizationName xml:lang="nb">Nasjonalbiblioteket</organizationName>
                          <organizationName xml:lang="en">National Library of Norway</organizationName>
                          <organizationShortName xml:lang="nb">NB</organizationShortName>
                          <organizationShortName xml:lang="en">NLN</organizationShortName>
                          <departmentName xml:lang="nb">Språkbanken</departmentName>
                          <departmentName xml:lang="en">The Language Bank</departmentName>
                        </organizationInfo>
                        <communicationInfo ComponentId="clarin.eu:cr1:c_1352813745460">
                          <email>sprakbanken@nb.no</email>
                          <url>https://www.nb.no/sprakbanken/</url>
                        </communicationInfo>
                      </actorInfo>
                    </licensor>
                  </licenceInfo>
                </distributionInfo>
                <contact>
                  <actorInfo ComponentId="clarin.eu:cr1:c_1396012485194">
                    <actorType>organization</actorType>
                    <role xml:lang="en">Contact</role>
                    <organizationInfo ComponentId="clarin.eu:cr1:c_1407745711883">
                      <organizationName xml:lang="nb">Nasjonalbiblioteket</organizationName>
                      <organizationName xml:lang="en">National Library of Norway</organizationName>
                      <organizationShortName xml:lang="nb">NB</organizationShortName>
                      <organizationShortName xml:lang="en">NLN</organizationShortName>
                      <departmentName xml:lang="nb">Språkbanken</departmentName>
                      <departmentName xml:lang="en">The Language Bank</departmentName>
                    </organizationInfo>
                    <communicationInfo ComponentId="clarin.eu:cr1:c_1352813745460">
                      <email>sprakbanken@nb.no</email>
                      <url>https://www.nb.no/sprakbanken/</url>
                    </communicationInfo>
                  </actorInfo>
                </contact>
                <metadataInfo ComponentId="clarin.eu:cr1:c_1407745711922">
                  <metadataCreationDate>2025-01-28</metadataCreationDate>
                  <metadataLanguageName>Norwegian Bokmål</metadataLanguageName>
                  <metadataLanguageName>English</metadataLanguageName>
                  <metadataLanguageId>nb</metadataLanguageId>
                  <metadataLanguageId>en</metadataLanguageId>
                  <metadataLastDateUpdated>2025-01-28</metadataLastDateUpdated>
                  <metadataCreator>
                    <actorInfo ComponentId="clarin.eu:cr1:c_1396012485194">
                      <actorType>organization</actorType>
                      <role xml:lang="en">Metadata Creator</role>
                      <organizationInfo ComponentId="clarin.eu:cr1:c_1407745711883">
                        <organizationName xml:lang="nb">Nasjonalbiblioteket</organizationName>
                        <organizationName xml:lang="en">National Library of Norway</organizationName>
                        <organizationShortName xml:lang="nb">NB</organizationShortName>
                        <organizationShortName xml:lang="en">NLN</organizationShortName>
                        <departmentName xml:lang="nb">Språkbanken</departmentName>
                        <departmentName xml:lang="en">The Language Bank</departmentName>
                      </organizationInfo>
                      <communicationInfo ComponentId="clarin.eu:cr1:c_1352813745460">
                        <email>sprakbanken@nb.no</email>
                        <url>https://www.nb.no/sprakbanken/</url>
                      </communicationInfo>
                    </actorInfo>
                  </metadataCreator>
                </metadataInfo>
                <resourceCreationInfo ComponentId="clarin.eu:cr1:c_1407745711921">
                  <creationStartDate>2024-10-01</creationStartDate>
                  <creationEndDate>2025-01-28</creationEndDate>
                  <resourceCreator>
                    <actorInfo ComponentId="clarin.eu:cr1:c_1396012485194">
                      <actorType>organization</actorType>
                      <role xml:lang="en">Resource Creator</role>
                      <organizationInfo ComponentId="clarin.eu:cr1:c_1407745711883">
                        <organizationName xml:lang="nb">Nasjonalbiblioteket</organizationName>
                        <organizationName xml:lang="en">National Library of Norway</organizationName>
                        <organizationShortName xml:lang="nb">NB</organizationShortName>
                        <organizationShortName xml:lang="en">NLN</organizationShortName>
                        <departmentName xml:lang="nb">Språkbanken</departmentName>
                        <departmentName xml:lang="en">The Language Bank</departmentName>
                      </organizationInfo>
                      <communicationInfo ComponentId="clarin.eu:cr1:c_1352813745460">
                        <email>sprakbanken@nb.no</email>
                        <url>https://www.nb.no/sprakbanken/</url>
                      </communicationInfo>
                    </actorInfo>
                  </resourceCreator>
                </resourceCreationInfo>
              </resourceCommonInfo>
              <toolInfo ComponentId="clarin.eu:cr1:c_1562754657362">
                <description>Synthetic text images for Sámi languages</description>
                <inputInfo ComponentId="clarin.eu:cr1:c_1360931019804">
                  <mediaType>image</mediaType>
                </inputInfo>
                <outputInfo ComponentId="clarin.eu:cr1:c_1360931019824">
                  <mediaType>text</mediaType>
                </outputInfo>
                <Service ComponentId="clarin.eu:cr1:c_1505397653787">
                  <Name>Synthetic images for Sámi Languages</Name>
                  <ServiceDescriptionLocation>
                    <Location>https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-101/</Location>
                  </ServiceDescriptionLocation>
                  <Operations>
                    <Operation ComponentId="clarin.eu:cr1:c_1299509410080">
                      <Name>OCR</Name>
                      <Output>
                        <ParameterGroup ComponentId="clarin.eu:cr1:c_1302702320471">
                          <Parameters>
                            <Parameter ComponentId="clarin.eu:cr1:c_1299509410079"/>
                          </Parameters>
                        </ParameterGroup>
                      </Output>
                    </Operation>
                  </Operations>
                </Service>
              </toolInfo>
            </toolProfile>
          </Components>
        </CMD>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>