{"id":1429,"date":"2024-09-02T14:58:42","date_gmt":"2024-09-02T12:58:42","guid":{"rendered":"https:\/\/www.nb.no\/en\/?post_type=collection&#038;p=1429"},"modified":"2024-11-06T13:18:57","modified_gmt":"2024-11-06T12:18:57","slug":"web-news-corpus","status":"publish","type":"collection","link":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/","title":{"rendered":"Web News Collection"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse-1024x536.jpg\" alt=\"Word galaxy from dhlab, illustrating corpus of text\" class=\"wp-image-1439\" srcset=\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse-1024x536.jpg 1024w, https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse-300x157.jpg 300w, https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse-768x402.jpg 768w, https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse-1536x804.jpg 1536w, https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/09\/dhlab-korpusanalyse.jpg 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In collaboration with <a href=\"https:\/\/www.nb.no\/dh-lab\/\">DH-lab<\/a>, the Norwegian Web Archive has created a collection of texts from web news publications from 2019-22. These texts are available for computational analysis through DH-lab&#8217;s API. <\/p>\n\n\n\n<p>The objective is to allow scholars, students and others to make their own corpora of web news texts, facilitating digital text analysis of web news.<\/p>\n\n\n\n<p>We are working to develop notebooks and user-friendly web apps to interact with the data. 
For now, you can find examples of use in <a href=\"https:\/\/github.com\/nlnwa\/nlnwa-notebooks\/blob\/main\/notebooks\/corpus\/nettavis-tekstanalyse.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">nettavis-tekstanalyse.ipynb<\/a>.<\/p>\n\n\n\n<p>Below, you will find some basic information and metadata about the Web News Collection. Please contact us at <a>nettarkivet@nb.no<\/a> if you have any questions!<\/p>\n\n\n<div class=\"t2-accordion wp-block-t2-accordion\" data-allow-multiple=\"1\" role=\"list\">\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"what-is-collections-as-data?\" aria-controls=\"accordion-panel\" aria-expanded=\"false\">\n\t\t\t\t\tWhat is Collections as Data?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 
11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel\" aria-labelledby=\"what-is-collections-as-data?\" role=\"region\" hidden>\n\t\t\t\t\n\n<p>&#8220;Collections as Data&#8221; means that we provide content from the web archive in a format that supports computational analysis. This allows researchers to explore and analyse trends and shifts in archived web data.<\/p>\n\n\n\n<p>You can learn more from the initiative <a href=\"https:\/\/collectionsasdata.github.io\/\"><em>Always Already Computational: Collections as Data<\/em><\/a>.<\/p>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"how-big-is-the-web-news-collection?\" aria-controls=\"accordion-panel-1\" aria-expanded=\"false\">\n\t\t\t\t\tHow big is the Web News Collection?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 
0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-1\" aria-labelledby=\"how-big-is-the-web-news-collection?\" role=\"region\" hidden>\n\t\t\t\t\n\n<p>The first version of the collection contains texts from 2019-22:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>1,572,655 texts<\/li>\n\n\n\n<li>784,171,966 words<\/li>\n\n\n\n<li>268 publication titles<\/li>\n<\/ul>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"which-languages-are-in-the-collection?\" aria-controls=\"accordion-panel-2\" aria-expanded=\"false\">\n\t\t\t\t\tWhich languages are in the collection?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 
.07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-2\" aria-labelledby=\"which-languages-are-in-the-collection?\" role=\"region\" hidden>\n\t\t\t\t\n\n<p>The collection includes texts in various languages. 
The most frequent ones are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Norwegian Bokm\u00e5l: 1,437,768 texts<\/li>\n\n\n\n<li>Norwegian Nynorsk: 111,892 texts<\/li>\n\n\n\n<li>Northern Sami: 11,416 texts<\/li>\n\n\n\n<li>Kven: 302 texts<\/li>\n\n\n\n<li>Southern Sami: 101 texts<\/li>\n\n\n\n<li>Lule Sami: 78 texts<\/li>\n<\/ul>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"which-publication-titles-are-in-the-collection?\" aria-controls=\"accordion-panel-3\" aria-expanded=\"false\">\n\t\t\t\t\tWhich publication titles are in the collection?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 
.285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-3\" aria-labelledby=\"which-publication-titles-are-in-the-collection?\" role=\"region\" hidden>\n\t\t\t\t\n\n<h4 class=\"wp-block-heading\" id=\"h-in-total-the-collection-includes-texts-from-268-publications-with-a-responsible-editor-the-most-frequent-titles-are\">In total, the collection includes texts from 268 publications with a responsible editor. The titles with the most texts are:<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>NRK: 130,162 texts<\/li>\n\n\n\n<li>VG: 66,800 texts<\/li>\n\n\n\n<li>Forskning.no: 65,469 texts<\/li>\n\n\n\n<li>TV2: 55,367 texts<\/li>\n\n\n\n<li>Dagens N\u00e6ringsliv: 50,005 texts<\/li>\n\n\n\n<li>Dagbladet: 46,333 texts<\/li>\n\n\n\n<li>Finansavisen: 38,514 texts<\/li>\n\n\n\n<li>Adresseavisen: 33,640 texts<\/li>\n\n\n\n<li>Aftenposten: 31,075 texts<\/li>\n\n\n\n<li>Khrono: 29,794 texts<\/li>\n\n\n\n<li>Hamar Arbeiderblad: 29,775 texts<\/li>\n\n\n\n<li>Dagsavisen: 27,009 texts<\/li>\n\n\n\n<li>ABC Nyheter: 25,690 texts<\/li>\n\n\n\n<li>E24: 24,930 texts<\/li>\n\n\n\n<li>Nettavisen: 23,670 texts<\/li>\n<\/ul>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"how-can-i-work-with-the-collection?\" aria-controls=\"accordion-panel-4\" aria-expanded=\"false\">\n\t\t\t\t\tHow can I work with the collection?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 
3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-4\" aria-labelledby=\"how-can-i-work-with-the-collection?\" role=\"region\" hidden>\n\t\t\t\t\n\n<h4 class=\"wp-block-heading\" id=\"h-to-work-with-the-collection-you-can-choose-between-the-dhlab-package-for-python-and-easy-to-use-webapps-from-the-dh-lab\">To work with the collection, you can choose between the <a href=\"https:\/\/nationallibraryofnorway.github.io\/digital_tekstanalyse\/tutorial\/2.0.Bygg_korpus.html\">dhlab package for Python<\/a> and <a href=\"https:\/\/www.nb.no\/dh-lab\/apper\/\">easy-to-use web apps<\/a> from the DH-lab.<\/h4>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-for-apps-there-are-currently-limited-support-for-the-web-news-corpus-corpus-building-getting-concordances-getting-collocations-and-calculate-relative-frequency-of-collocated-words\">For apps, there is 
currently limited support for the Web News Corpus:<br>Corpus building, getting concordances, getting collocations and calculating the relative frequency of collocated words.<\/h4>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"which-schema-attributes-can-be-used-with-the-api?\" aria-controls=\"accordion-panel-5\" aria-expanded=\"false\">\n\t\t\t\t\tWhich schema attributes can be used with the API?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" 
\/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-5\" aria-labelledby=\"which-schema-attributes-can-be-used-with-the-api?\" role=\"region\" hidden>\n\t\t\t\t\n\n<h4 class=\"wp-block-heading\" id=\"h-here-is-an-overview-of-the-schema-attributes-that-can-be-used-with-the-api-using-a-text-from-aftenposten-as-an-example\">Here is an overview of the schema attributes that can be used with the API, using a text from Aftenposten as an example:<\/h4>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>schema:properties<\/strong><\/td><td><strong>dtype<\/strong><\/td><td><strong>description<\/strong><\/td><td><strong>example<\/strong><\/td><\/tr><tr><td>doctype<\/td><td>str<\/td><td>document type<\/td><td>nettavis<\/td><\/tr><tr><td>dhlabid<\/td><td>int<\/td><td>unique ID for the text object<\/td><td>600274473<\/td><\/tr><tr><td>title<\/td><td>str<\/td><td>publication title<\/td><td>Aftenposten<\/td><\/tr><tr><td>publisher<\/td><td>str<\/td><td>domain name<\/td><td>aftenposten.no<\/td><\/tr><tr><td>city<\/td><td>str<\/td><td>place of editor<\/td><td>Oslo<\/td><\/tr><tr><td>lang<\/td><td>str<\/td><td>language code (ISO 639-2)<\/td><td>nob<\/td><\/tr><tr><td>oaiid<\/td><td>str<\/td><td>target-uri<\/td><td>https:\/\/www.aftenposten.no:443\/norge\/politikk\/i\/&#8230;<\/td><\/tr><tr><td>timestamp<\/td><td>int<\/td><td>YYYYMMDD (crawl date)<\/td><td>20200526<\/td><\/tr><tr><td>ocr_timestamp<\/td><td>int<\/td><td>YYYYMMDD (text extraction date)<\/td><td>20220820<\/td><\/tr><tr><td>urn<\/td><td>str<\/td><td>WARC-Record-ID<\/td><td>&lt;urn:uuid:b01b7ad0-c5c3-4b2e-ab30-8d9bddf8c312&gt;<\/td><\/tr><tr><td>year<\/td><td>int<\/td><td>YYYY (year of crawl)<\/td><td>2020<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n\n<div class=\"t2-accordion-item wp-block-t2-accordion-item\" role=\"listitem\">\n\t\t\t<h2 
class=\"t2-accordion-title\">\n\t\t\t\t<button type=\"button\" class=\"t2-accordion-trigger\" id=\"how-do-i-cite-the-web-news-collection?\" aria-controls=\"accordion-panel-6\" aria-expanded=\"false\">\n\t\t\t\t\tHow do I cite the Web News Collection?\n\t\t\t\t\t<span class=\"t2-accordion-icon is-closed-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-add\" aria-hidden=\"true\" focusable=\"false\"><path fill=\"currentColor\" d=\"M11.7 4.817a.761.761 0 0 0-.384.399c-.07.156-.074.328-.075 3.353l-.001 3.19-3.253.01-3.253.011-.148.1c-.168.114-.346.433-.345.618.002.263.243.605.492.699.076.029 1.148.043 3.31.043h3.197l.001 3.19c.001 3.02.005 3.199.075 3.35.286.619 1.082.619 1.368 0 .07-.151.074-.33.075-3.35l.001-3.19h3.197c2.162 0 3.234-.014 3.31-.043.249-.093.491-.438.491-.697 0-.259-.242-.604-.491-.697-.076-.029-1.148-.043-3.31-.043H12.76l-.001-3.19c-.001-3.02-.005-3.199-.075-3.35-.086-.186-.257-.357-.417-.417a.894.894 0 0 0-.567.014\"\/><\/svg><\/span>\n\t\t\t\t\t<span class=\"t2-accordion-icon is-open-icon\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" height=\"24\" width=\"24\" class=\"t2-icon t2-icon-remove\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M6 13c-.285156 0-.519531-.097656-.710938-.289062C5.097656 12.519531 5 12.285156 5 12c0-.285156.097656-.519531.289062-.710938C5.480469 11.097656 5.714844 11 6 11h12c.285156 0 .519531.097656.710938.289062C18.902344 11.480469 19 11.714844 19 12c0 .285156-.097656.519531-.289062.710938C18.519531 12.902344 18.285156 13 18 13Zm0 0\" fill=\"currentColor\" \/><\/svg><\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/h2>\n\t\t\t<div class=\"t2-accordion-item__inner-container is-layout-flow\" id=\"accordion-panel-6\" aria-labelledby=\"how-do-i-cite-the-web-news-collection?\" role=\"region\" hidden>\n\t\t\t\t\n\n<h4 class=\"wp-block-heading\" 
id=\"h-you-can-cite-the-web-news-collection-as-a-data-set-according-to-different-citation-styles-apa-7-national-library-of-norway-2024-web-news-collection-version-1-data-set-sqlite-and-json-api-available-through-the-dh-lab-api-http-api-nb-no-dhlab-chicago-manual-of-style-17th-national-library-of-norway-web-news-collection-sqlite-and-json-api-data-set-2024-available-through-the-dh-lab-api-http-api-nb-no-dhlab-you-can-also-download-citations-as-ris-zotero-and-xml-endnote-and-metadata-according-to-dublin-core-and-d-cat\">You can cite the Web News Collection as a data set according to different citation styles:<br><br>APA 7:<br>National Library of Norway. (2024). <em>Web News Collection<\/em> (Version 1) [Data set; SQLite and JSON (API)]. Available through the DH-lab API. <a href=\"http:\/\/api.nb.no\/dhlab\/\">http:\/\/api.nb.no\/dhlab\/<\/a><br><br>Chicago Manual of Style (17th):<br>National Library of Norway. \u2018Web News Collection\u2019. SQLite and JSON (API), Data set, 2024. Available through the DH-lab API. 
<a href=\"http:\/\/api.nb.no\/dhlab\/\">http:\/\/api.nb.no\/dhlab\/<\/a>.<br><br>You can also download citations as <a href=\"https:\/\/raw.githubusercontent.com\/nlnwa\/nlnwa-notebooks\/refs\/heads\/main\/notebooks\/corpus\/metadata\/webNewsCollection.ris\">.RIS (Zotero)<\/a> and <a href=\"https:\/\/github.com\/nlnwa\/nlnwa-notebooks\/blob\/main\/notebooks\/corpus\/metadata\/webNewsCollection.xml\">.xml (EndNote)<\/a>, and metadata according to <a href=\"https:\/\/github.com\/nlnwa\/nlnwa-notebooks\/blob\/main\/notebooks\/corpus\/metadata\/webNewsCollection_dcore.xml\">Dublin Core<\/a> and <a href=\"https:\/\/github.com\/nlnwa\/nlnwa-notebooks\/blob\/main\/notebooks\/corpus\/metadata\/DCAT-Metadata.jsonl\">DCAT<\/a>.<\/h4>\n\n\n\t\t\t<\/div>\n\t\t<\/div>\n<\/div>\n\n\n<p><\/p>\n","protected":false,"author":794,"featured_media":1431,"parent":863,"menu_order":0,"template":"","meta":{"_acf_changed":false,"nb_hide_title":false,"nb_breadcrumb_title_override":"","sub_navigation_option":"","sub_navigation_custom_items":[],"card_link":[],"savage_label":"manual","savage_label_text":"","savage_title":"","savage_link_title":"","card_background":"default","savage_image_type":"featured"},"class_list":["post-1429","collection","type-collection","status-publish","has-post-thumbnail","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.1 (Yoast SEO v27.1.1) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Web News Collection - National Library of Norway<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Web News Collection\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/\" \/>\n<meta property=\"og:site_name\" content=\"National Library of Norway\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-06T12:18:57+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1209\" \/>\n\t<meta property=\"og:image:height\" content=\"557\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/\",\"url\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/\",\"name\":\"Web News Collection - National Library of 
Norway\",\"isPartOf\":{\"@id\":\"https:\/\/www.nb.no\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png\",\"datePublished\":\"2024-09-02T12:58:42+00:00\",\"dateModified\":\"2024-11-06T12:18:57+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage\",\"url\":\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png\",\"contentUrl\":\"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png\",\"width\":1209,\"height\":557,\"caption\":\"Example of notebook illustrating digital text analysis of web news corpora\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.nb.no\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Norwegian Web Archive\",\"item\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Research\",\"item\":\"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Web News 
Collection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.nb.no\/en\/#website\",\"url\":\"https:\/\/www.nb.no\/en\/\",\"name\":\"National Library of Norway\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.nb.no\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Web News Collection - National Library of Norway","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/","og_locale":"en_GB","og_type":"article","og_title":"Web News Collection","og_url":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/","og_site_name":"National Library of Norway","article_modified_time":"2024-11-06T12:18:57+00:00","og_image":[{"width":1209,"height":557,"url":"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/","url":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/","name":"Web News Collection - National Library of 
Norway","isPartOf":{"@id":"https:\/\/www.nb.no\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage"},"image":{"@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage"},"thumbnailUrl":"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png","datePublished":"2024-09-02T12:58:42+00:00","dateModified":"2024-11-06T12:18:57+00:00","breadcrumb":{"@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#primaryimage","url":"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png","contentUrl":"https:\/\/www.nb.no\/content\/uploads\/sites\/26\/2024\/01\/Skjermbilde-2024-09-02-kl.-12.35.35.png","width":1209,"height":557,"caption":"Example of notebook illustrating digital text analysis of web news corpora"},{"@type":"BreadcrumbList","@id":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/web-news-corpus\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.nb.no\/en\/"},{"@type":"ListItem","position":2,"name":"Norwegian Web Archive","item":"https:\/\/www.nb.no\/en\/collection\/web-archive\/"},{"@type":"ListItem","position":3,"name":"Research","item":"https:\/\/www.nb.no\/en\/collection\/web-archive\/research\/"},{"@type":"ListItem","position":4,"name":"Web News Collection"}]},{"@type":"WebSite","@id":"https:\/\/www.nb.no\/en\/#website","url":"https:\/\/www.nb.no\/en\/","name":"National Library of 
Norway","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.nb.no\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"}]}},"_links":{"self":[{"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/collection\/1429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/collection"}],"about":[{"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/types\/collection"}],"author":[{"embeddable":true,"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/users\/794"}],"up":[{"embeddable":true,"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/collection\/863"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/media\/1431"}],"wp:attachment":[{"href":"https:\/\/www.nb.no\/en\/wp-json\/wp\/v2\/media?parent=1429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}