
Add a Python script to migrate extrinsic metadata from revision metadata.
Closed, Public

Authored by vlorentz on Aug 20 2020, 3:24 PM.

Diff Detail

Repository: rDSTO Storage manager
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden.

Build is green

Patch application report for D3820 (id=13534)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Changes applied before test
commit bcedc264b2d31ffb93376c8bfedb27312edb5e0f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 493ac37f016c25f284a12233bc90f01c76bea492
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 1b2092e7722948b39c7109df36338033b4b4c850
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit f11b0a5defbeb84fd21dd3fb3d979e88a422d0b7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 43621f402f58344979e54dd6d9f8e953f21dc536
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 4d0316a9b1dd2747903e250fa5f53d4368629f2b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 0e86216ee93734e9076f42f0d566b56ebfbd2d3a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 5a99b61a4c71cd68feee61ae5995969b39e75804
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 754fd5f80d3575e74c0f82d14177507cf50d5ab2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 5148875127c0ab5736d26c80c0736ccadf36eb5d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 6bb0761635c56d0bae806e772d42c20f2066e665
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 6780865ef01fd40bef19752d396bb2f38040ad30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 5d4d7429b1d5e64bee1a6e667776cc3f83e0f24c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 5d79aa70a44da55f345e5beadb5d6217aafc9004
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 585340c740321535f4ae33916cfe0f60f116e4fb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit e5ef4125b3947d51f287345ce3cc6928f388a0da
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit b753acbaa3775dac5ed88c1bd6e8f9069eeb84d1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 98b03c6c50810fe2bc94914f33d148311a4e944e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 29c2c03c46dc17aafc23b3d7cdde11188c48f9c5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit b539c890ad63258a7589b6b7a04bf27aa6efc8a2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit bdb4a88022f3c480593610f7841a835ee9cec652
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit ad33aeb4f53741ff7eed5514d79accaf3e701324
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/874/ for more details.
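The commit "When available, use metadata['when'] as discovery date instead of the revision date" amounts to a fallback rule. Below is a minimal sketch of that rule, assuming the revision metadata is a plain dict and that `when` holds an ISO 8601 string with a UTC offset. Both are assumptions, since the actual field format is not shown in this review, and `pick_discovery_date` is a hypothetical name.

```python
import datetime
from typing import Any, Dict, Optional


def pick_discovery_date(
    metadata: Optional[Dict[str, Any]], revision_date: datetime.datetime
) -> datetime.datetime:
    """Prefer the date the artifact was seen over the revision date."""
    # Assumption: metadata["when"] is an ISO 8601 string with a UTC offset.
    when = (metadata or {}).get("when")
    if when:
        # 'when' records when the loader actually saw the artifact.
        return datetime.datetime.fromisoformat(when)
    # Fallback: the revision date, which is only an approximation of the
    # discovery date.
    return revision_date
```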

vlorentz updated this revision to Diff 13593. Aug 26 2020, 8:04 PM
  • Fix crash when original_artifact is missing a URL.
  • cran: improve package name detection.
  • deposit: add tests.
  • deposit: add support for revisions with no metadata.
  • deduplicate origin checking.
  • deposit: add test / fix support for revisions without the provider_url key (format 5), i.e. when we have to compute the origin ourselves.
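The first bullet implies that `original_artifact` entries do not always carry a URL. One defensive pattern is to read the key with `.get()` instead of indexing. This is not necessarily what the patch does. The sketch below assumes a hypothetical `url` key, because the real schema is not shown here.

```python
from typing import Any, Dict, List


def artifact_urls(original_artifacts: List[Dict[str, Any]]) -> List[str]:
    """Collect download URLs, skipping artifacts that lack one."""
    urls = []
    for artifact in original_artifacts:
        # artifact["url"] would raise KeyError on such entries;
        # .get() lets the migration carry on instead.
        url = artifact.get("url")  # hypothetical key name
        if url:
            urls.append(url)
    return urls
```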

Build has FAILED

Patch application report for D3820 (id=13593)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Changes applied before test
commit 8a0eee2a9ba888414f593bbfba1e424494d19a4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 48a87418090029b88e856f694a56ed5b697e3429
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 3be19bc383d13f8588f82ec583e33e41535c24ec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit b90a0d9fd7b4fb0359b5eaf500a0e84a5bbe7592
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 42e10586f280904a7f03bdcef56a0deae08bf282
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit f537670a9dbd74738e8b9441541978e434568385
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 66983ae9bb1cfa6de09114b3a3f7aaae5beb1aab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit b5cf39989661481d45b31a8ce5e4bba79d0db751
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit d3615067ffa0e4271fe172bb2875f761535400cc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit b31546338407d4fd8afc27ace0571fe40f7dcd8e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit ee89aed4e58120d29018f6a47213242ff1926103
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 2ffc8560dbd151e11e384a50398173c9993e14ac
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 28a8cdcc266f595a4f09c8e316d34a129e88fcc2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 9cdc4e5d4a0b678e2430860eea396d3248f958e3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit fc24c32eeec72b8009dc188d02309f4bf423d7c4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit ac7d0b3b42261c5a5dc822d365a2f9c4a03646fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit edd8884f977259f58bea1d2164056adbbc98b009
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 54855fe911d472590709c113c27d2236d764079c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit c05b475d45cbdb56fe7cf7bfe875e2cce1727d8b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit c9d9b073f9b63697916d5c5573b90f7e20495313
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 0d582b9a2ce79a56d3b126d09e173d960e0a9ea6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 42e3e7e7752bb8120ec26b6fc440c56e674c60c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit f82169dd5b231d14d3c77a58c7d47f95d90290a5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit af16f11453185e5973f35b697ad75c89be5bde3a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 41c0b034d21387ffda699faa747c1fc4a21768d3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 60cba11f7066bd6915fdcfb1f2cb9807458e9f12
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit aa6cbb712d6a0db7a86bf9299bc116eedb7b65bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 30061e5730b2af48bc7d130734aa73517529820e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/877/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/877/console
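The commit "explicitly error if the loader type could not be detected" replaces a silent fallthrough with a hard failure. Here is a minimal sketch of that pattern. The detectors are hypothetical placeholders, because the actual rules used to recognise debian, cran, pypi, npm, gnu, nixguix or deposit revisions are not shown in this review.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical detector: a loader name paired with a predicate over the
# revision metadata.
Detector = Tuple[str, Callable[[Dict[str, Any]], bool]]


def detect_loader_type(metadata: Dict[str, Any], detectors: List[Detector]) -> str:
    """Return the first matching loader name, or fail loudly."""
    for name, matches in detectors:
        if matches(metadata):
            return name
    # Raising here surfaces unknown revision formats instead of migrating
    # them with the wrong handler.
    raise ValueError(f"Could not detect loader type for metadata: {metadata!r}")
```

Failing on unrecognised rows suits a one-off migration: it is easier to add a detector than to find and undo rows that were migrated incorrectly.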

vlorentz updated this revision to Diff 13594. Aug 26 2020, 8:27 PM
  • deposit: fix tests

Build is green

Patch application report for D3820 (id=13594)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Changes applied before test
commit 72d9e4fdfa76cc2557e698483ca20ac06bce405e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 8bfc1ab96752f0a401e3012769aabe72e7a3b6d6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 2c567a8a986b94cc1e2298a72cbd25682cc6e870
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 215ce71a4e434d5e54bf273978cc5f3cff8f3353
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 8a0d5ce0ebfbad31a977e467ae46029416df5343
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit cf9e68d7eb8b9fa7513377c683403b6f84acda84
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 54fdf1e198eaed34e245542a433346bc9de178ce
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit cbdf2402b257d76db8e62fc167141353af31b549
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 754e61c53835019f455ee6e69544bfd1f810e485
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 0b8fe81284d7b08e3db53b2f51b326849dd573eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit d2e7161e69d1e04d36489a4457332d2acc726ff4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 4bd58ab607a1d41f3c38e623b6d14e2da55aa5b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit d2713d3526e0fa4c035295f4094c8afaa7c89477
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 316e05a7eab630e2ca803eb05aaf6e9c5b3709d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit f3b1d74cae927ebcd2a0c6835542747f64d1eca3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 9ba71c05cedc0a6185fb3e8d3b60966127ed3e43
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 1d4eebeb3ff875fc34710586bdcf1f7eb1880bec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit ae6a41db875cf573f877dfa4bd03d97de459504a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit eaf56488332ac00c2cdc3f4a5b5ce27ddf9dc490
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit d1bbe85ea9645247a93477a82887f4f5c492e381
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 6ba70fd965866c5c0eeffb603d121fd5b5cac36c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 6d80c1cff3c8cf95d8f845c60a3b1a2411d78ca8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 3c47778bb23dc53c35f97b790d1f9048bee4a12f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit ed9365c48425afb3f6b58c0d2b0ba8d2fe2d822c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 7d6dddf2cfe75eb6c726038ed5651c5139e4d184
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 880ab6abae21de9865e4e6c88135470cd95794a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 721275e1f8f01344b6c6a742f08c60d6572bb3a0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 2ab716f8fd52ba0636eee3d589b6fde2327f580d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit aed6541c462cb775b11aa3ff27a4bae003aeff77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/878/ for more details.
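The commit "tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data" guards against a common aliasing bug in tests. The illustration below uses a hypothetical stand-in for `handle_row` and a hypothetical fixture shape. Only the copying pattern is the point.

```python
import copy
from typing import Any, Dict

# Shared test fixture (hypothetical shape): a revision row with nested metadata.
FIXTURE_ROW: Dict[str, Any] = {"id": b"\x01" * 20, "metadata": {"package": {"name": "foo"}}}


def handle_row(row: Dict[str, Any]) -> None:
    # Hypothetical stand-in for the migration step: it mutates nested data in place.
    row["metadata"]["package"]["name"] = row["metadata"]["package"]["name"].upper()


def run_on_fixture() -> Dict[str, Any]:
    # A deep copy is needed: copy.copy() would still share the nested
    # "metadata" dict, so handle_row would alter FIXTURE_ROW for later tests.
    row = copy.deepcopy(FIXTURE_ROW)
    handle_row(row)
    return row
```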

vlorentz updated this revision to Diff 13595. Aug 26 2020, 10:41 PM
  • deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
  • deposit: use external_id from the HTTP header instead of the metadata.

Build is green

Patch application report for D3820 (id=13595)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Changes applied before test
commit a5beedc44e168a188a01d77d8c418d3854ed06bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 4f20e1b31b04576870afc8c460b037874002b9c7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 4d04a11118476924914c014c60d87f96cd41652e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 833061d8bac4d7b4faecf6d3fa21ef47f7ab4119
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit d664b75129b9073955dd727f8f8e139a3c8d11a5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 7822c0d081486d39f62c010af0533b29bf6c050b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit c388ba42410afefd8caaaca1990226c6737cffc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 76f5ae051cc0600e3e767ceba67458015d6f224d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit c433f246887e6ac0f2ca74530e0ac9132bdf1e29
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 302183ab0f4818a18a6da97cff4cf52e8837f1c7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit fcdef41c411440cda1aca9a27af1f03e4ee3f6a9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit b60fb8743aacc865f1964b9ae55fc02bf4c676c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 821aae785f467c341bba8a2532e2def76f23bf27
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit e89e5502c9fcd0c36081be171c3fc790ecf81044
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit b23e693182b9978af171a0e843d1bc7fcf60cb6a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 24c40221418b6cd5a98a12febd0aefc8c0bc28de
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 3ca9e9f746014f2938b22f9121fbf8c9c0523f2e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit f97af658809baa3cb786a45b43e4f3c3c957a2ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 959382c9e26063ad38b7eb9d71cc48d3b6936e8e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit dbca8e8e9e2616be3950fdf70b8290470e084e4e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit e15d3965e243bd3f42569225b194f6ccff68fe61
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 0676fdaefc4c88bdfb99dbed5cd71c9aedf4a4e7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit a0d1a281334ae3400d5d8cba4b8be4e009a0ac67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit ff264366ba889e325cc819462015b96fb4b4f7db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 7a8c29f97bb7b498dda23685669e743bd31201fa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit e237710b1901b189d2ee63ecd5c423ae909a8732
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit db26770bb56d1c3769cac93af859fbf7f4d147e5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 8d087c7df42e8e4068b660173da7094b8768eb23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit ada307d1114d4e4eea6a3a60bb925e098c8f4e65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 20315d2187ca849c3a2847f269837f930f2a2c61
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 0da0951a0a4e57c8acae5ef5f9a8dfa30c715118
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/879/ for more details.
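The commit message above explains that the deposit origin URL is built from the Slug HTTP header, so the external identifier in the metadata file should not be trusted. Below is a minimal sketch that assumes the origin URL is the client's provider URL joined with the Slug value. That joining rule and the function name are assumptions, not taken from this review.

```python
from typing import Mapping


def deposit_origin_url(provider_url: str, headers: Mapping[str, str]) -> str:
    """Build the origin URL from the provider URL and the Slug header.

    Assumption (hypothetical): origin URL = provider_url + "/" + Slug value.
    The external identifier inside the metadata file is deliberately ignored.
    """
    # HTTP header names are case-insensitive; normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    external_id = lowered["slug"]
    return f"{provider_url.rstrip('/')}/{external_id}"
```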

vlorentz updated this revision to Diff 13596. Aug 26 2020, 10:59 PM
  • deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
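The accompanying commit message says metadata cannot be stored twice at the same date. A toy model shows why a single shared date would lose versions. The in-memory table below is hypothetical: it keys entries on (target, authority, discovery date) and keeps the last write. Whether the real storage keeps the first or the last write is not stated here.

```python
import datetime
from typing import Dict, List, Tuple

# Hypothetical key: (target, authority, discovery_date).
Key = Tuple[str, str, datetime.datetime]


def store_all(
    entries: List[Tuple[datetime.datetime, bytes]], target: str, authority: str
) -> Dict[Key, bytes]:
    """Store each (discovery_date, metadata) pair; equal keys collapse."""
    table: Dict[Key, bytes] = {}
    for discovery_date, metadata in entries:
        table[(target, authority, discovery_date)] = metadata  # last write wins here
    return table
```

With one date per deposit request, two metadata versions yield two rows. Reusing a single date for both leaves only one.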

Build is green

Patch application report for D3820 (id=13596)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Changes applied before test
commit 413f048e3dac1b5f2b9d28479f84f8b543205845
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 8ab39454c6fb5653a2ef588b57a9df129f449e11
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 0cc84112a301083650860d6a6dfa2a12202cadbe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 3ba94edbcef674fe07e880653676bf48b870a0af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 5c8510ada46d7ae86846d0efce6b566de29d1380
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit cf02f196f1ca49f7c13e6f4061bd034772c3c0a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit a160aba80de5584ae92f3d79210a029cf56eba4a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit f8a48e1edc9b298fc10ad8340df5fde3708069a4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit d68193c61137f8293e9121a0b97121bf49d71f87
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit b711891d598441e66eb0a849e5dd3b5a52a0667d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit b47282965435beaefb9b302cabb3e83127fbb8c7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 147e9e031f32ca4d4f1cbcc51e3b791c25c28109
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 14e116c3cfa1f63e40b42e7cacd527c7d24d3faf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 9e2392a2756d7b6b52cdea2e83e03e7f9aa500ef
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 6269e1c2f859c5f22db31ab618902c044392762b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 54eff972e219a012a8453595c8c72fa54217c42c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 00af96c4e5877e54dc69956992a8aa2b77ad0840
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 5651104b741b9e68a9a0f81b7548653598a28120
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 4baeb30359a3e52057981a34c16446ddc60ca1a5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 0c924007450aebedb261cb8218881dca2ad2e70b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit a1ef1f761348de743a22441dc8dd561fcdffdeb7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit a0e55eba373a6702a2ca61f4bea281422150ccd7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit d1c99e4fbf6764825a552da4994eaaa91d877b7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 0301249ff4a7a84421c7065e343b081748450e98
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 26e4868342140a8a3892ef5f3e72b28aea320404
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 4cd2668efb97d237a4f221a925b88a9173ab4b84
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit e8e56de733712b2d870f6d34cafb7c9c39b70d97
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 2666c703caddd9b488ecc0467774ce7787f6c07d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 8dc8f57287cfbd64bc31acde420fee1d81fa3332
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 6232b3378780f90b85c45e0753b883dd1cc6c1e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 6603dcf5b8d479bf38c5d6964310c7ebe850c6f7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit b5d501b137c858e1b9c21d6d31565a3475c72bc2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/880/ for more details.
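The test fix listed in this report deep-copies the row before calling `handle_row`. This guards shared fixtures against in-place mutation of nested data. In the sketch below, this `handle_row` and the row layout are hypothetical stand-ins, not the script's real function.

```python
import copy
from typing import Any, Dict

# Shared test fixture with nested, mutable data.
ROW: Dict[str, Any] = {
    "id": "rev-1",
    "metadata": {"original_artifact": [{"filename": "pkg-1.0.tar.gz"}]},
}


def handle_row(row: Dict[str, Any]) -> None:
    # Hypothetical handler that rewrites nested metadata in place,
    # as a migration step might when normalizing an old format.
    row["metadata"]["original_artifacts"] = row["metadata"].pop("original_artifact")


def run_test(row: Dict[str, Any]) -> None:
    # copy.copy would still share the nested "metadata" dict;
    # only a deep copy leaves ROW untouched for the next test.
    handle_row(copy.deepcopy(row))


run_test(ROW)
run_test(ROW)  # would raise KeyError if the first call had mutated ROW
```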

vlorentz updated this revision to Diff 13605 on Aug 27 2020, 11:28 PM.
  • npm: unescape package names.
  • deposit: add another exception
  • deposit: add another exception
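The "npm: unescape package names" change likely concerns percent-encoded scoped names such as `@scope/name`. The sketch below rests on two assumptions: that names are stored URL-encoded (e.g. `%40babel%2Fcore`), and that origins use the `https://www.npmjs.com/package/<name>` form. Neither is taken from the script, and `npm_origin_url` is a hypothetical helper.

```python
from urllib.parse import unquote


def npm_origin_url(escaped_name: str) -> str:
    """Build an npm origin URL from a possibly percent-encoded package name."""
    # unquote() turns "%40" into "@" and "%2F"/"%2f" into "/";
    # names without escapes pass through unchanged.
    name = unquote(escaped_name)
    return f"https://www.npmjs.com/package/{name}"


print(npm_origin_url("%40babel%2Fcore"))  # https://www.npmjs.com/package/@babel/core
print(npm_origin_url("left-pad"))         # https://www.npmjs.com/package/left-pad
```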

Build is green

Patch application report for D3820 (id=13605)

Rebasing onto 4532a4dc64...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Changes applied before test
commit 26f10077e928398cbfceca5b92083ccecc1a8061
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 5165168db2b726ea0e3e4c719e6ecb524e6982cb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 7e68a00844a0560cb0e5784c93688803bee14730
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit ea317dfefeb84f90dae308a3689adf02511e87fa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 50e2f26cbc1694864f55a882e7a7d23547815189
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit bff6bfdbcfeb0eb2c331a2e7c2160829248b747c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 327b4468d79072619056aa469e55a122ed9e3f85
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 7206dd570ece822d4bc6f85aef613b3f24450131
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 80ea3158f0b35578f1fc7de192278e42d3a9a83e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit c725c5996dd7e80be77148d0e74986a4ba41b263
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 0164dc100f688862a217c5bf9b223ad9ecc5978c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 9b9a4e8a56133bba5a8e039e904b81ea103eed80
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit e899acb9ff7feddb797294fdd1fba6d8e028a8b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit f20de267237921cfb638e1d5ce800f8071a57344
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit d85536e4bec89e184a9d0dd3f4fec217664f288d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 93f41bb5849dff6a6e8afdaeb8fe0e67c2f3d6db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 0777a956734d1f51168f7c7c9b2211e27194a678
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 6a611a09a6f991f023411a7b4580437cec7d4d4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit d1dc3634e9d49816fbb8e900051b79fc6972d595
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit e3ac66b8782f213dee2f60653446c41113b15088
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit a64e568b1891220b65b5f3a89bfd3504db9f1190
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit d707a061bffbe5d970735892a7e5127f256a2d05
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 0283ae750c50d160fe43b5cb7f856ae0810becb1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 48260accd70b1bb021e3f9027efdc7d934562559
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit d9b50467b987836cc9683549d1e6af58da166226
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit f2a89827173b764915fc2f198aaa62797000e7c0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit a8cc512730a8d7ea76a30adb3ddd4e1d849f67e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit ab43e23753cff419444a870b7f8be9afcb27e0f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 084eb5b1d04f8006fcda2b98f5b56b276010971b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 9526aebacfcd5654d60ee3068c34290a778b3d17
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit c6dce5d11535e1182e02b87610fe42fa51e80efa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 7599d241352f9221b016f2f9260efa60f0910f0d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 66a806e589761ccd4d36dabeb852b10d440c1ff8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit f5fdf27cc6d806551937eddd02ac1cc7a7d8eaa1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit bb89e6988e861cc0e0667cbc55669b810e4067f8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/881/ for more details.
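The discovery-date rule from the commit "When available, use metadata['when'] as discovery date instead of the revision date" amounts to a simple fallback. This sketch assumes that `when` holds an ISO 8601 timestamp with a UTC offset and that the revision date is already a `datetime`. The `discovery_date` helper is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def discovery_date(metadata: Dict[str, Any], revision_date: datetime) -> datetime:
    # Prefer the time the artifact was seen (metadata["when"]); the revision
    # date only records when the content was updated, not when it was seen.
    when = metadata.get("when")
    if when:
        return datetime.fromisoformat(when)
    return revision_date


rev_date = datetime(2019, 5, 1, tzinfo=timezone.utc)
print(discovery_date({"when": "2020-01-15T10:00:00+00:00"}, rev_date))  # 2020-01-15 10:00:00+00:00
print(discovery_date({}, rev_date))                                     # 2019-05-01 00:00:00+00:00
```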

vlorentz updated this revision to Diff 13615 on Sep 1 2020, 10:40 AM.
  • remove prints
  • add origin deposit exceptions.

vlorentz updated this revision to Diff 13616 on Sep 1 2020, 10:42 AM.
  • add an origin cache, to spare a request to storage.origin.get for each origin.
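The origin cache added in Diff 13616 avoids one `storage.origin.get` round-trip per origin. In this sketch, a hypothetical `lookup` callable stands in for the storage call, whose real signature is not shown here. `OriginCache` and `fake_lookup` are illustrative names only.

```python
from typing import Callable, Dict, List


class OriginCache:
    """Memoize origin-existence checks so each URL hits storage at most once."""

    def __init__(self, lookup: Callable[[str], bool]) -> None:
        self._lookup = lookup
        self._known: Dict[str, bool] = {}

    def exists(self, url: str) -> bool:
        if url not in self._known:
            self._known[url] = self._lookup(url)
        return self._known[url]


calls: List[str] = []


def fake_lookup(url: str) -> bool:
    # Hypothetical stand-in for a storage request; records each call.
    calls.append(url)
    return url.startswith("https://")


cache = OriginCache(fake_lookup)
for url in ["https://pypi.org/project/foo/", "https://pypi.org/project/foo/", "ftp://x"]:
    cache.exists(url)
print(len(calls))  # 2
```

This sketch also caches negative answers. If origins can be created while the migration runs, caching only positive hits would be the safer choice.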

Build is green

Patch application report for D3820 (id=13615)

Rebasing onto 5afd985ebd...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Changes applied before test
commit c2bfc8ed2d5a1d8db45a314f9124a70881ec9e00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit bcc561fb518edcffe76ecd1ace1f1385afa0d47c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit 00e7d70beb2767d6d841d33f29b6feb12ebebc34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 51645d6fa85b4180ca6898f1888de1c51162fb53
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 9b05e72d0d9769f20ca9dca0d6b58f126abea87e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 0f4c1ad25bbd8fbe2460fcf4d94adad108282784
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit d87af304dcbec94649fa381a2321e43aba2eb26a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 95380993303f4066e0812e01b15e96ef39e8c248
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 4b0a82c6edec4eb331b217b976b4f316157e7ac6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 424d52b6d4d54a1df80aac111ce0b1ca035701a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 4d5c9d8b9c6db255268f2e88fb3d6b20692dd0c3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 66a2af1ad76fa9856af1467a52093b7268e63f4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit cc3ea45e3fcb5287d0faf07f21443787eaba9722
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 6b851c123ad80cdf98712bba0803f5d95282b247
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 04343b10b0907caf2c1856f7501503e5c3f82baa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 80fa3bdc433f18151d34e7ef454b5f83ed5075ce
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit cbc03206a02cbe53e32ed057cc25f29e8b4bd21b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 226ae66f097100b93997f5e399b81381f8011015
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit b72ad4bdcb379e6e340ba48c194643c89c8140d0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 45edf546f6283c11b514922ff3e118c9e1fb2399
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 79e67783ef91e310d55a94d866f7af5681832f31
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit b207c0545aadb6150bb9bbdd07f4ad6572be01bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 2ff3020cafb68ae7a767f8e1e840113a597fade3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit e4bb63f6a5b2803193d529e80942fe4ef44d3412
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit f7a735aea440585c07e5d0eddfe33994af1bac22
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit c21768769259c1b95c82f33de52af83908b43483
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 6764580a4bdff28cdc4f0f5cc527b35361994946
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 17d591161cd0744d8a65aa340e92a120cc1b6a67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 0fd21ac83803e1110ed4bcb77061fece91125f98
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 84f3616abe49b5603763f924e5afc93cfee5d09f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 1a385c8ed84ee173240c2c098892b604583e3f74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 7d55161a83dabb04553822e87a82e57275ef799b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit f515aa3cd71f9a2df7bd052df7f2e08ce207df3f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 8b87d228115827abe7fec48bf58900b3c4739077
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 359f80def36737794d2a28526b0e8cb456f7ef8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 909277e1b8575f095f0d39888fc3aaaf34dead99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 47da405cc72723e7a2932207019328d89ad319e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/885/ for more details.
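The commit "deposit: use external_id from the HTTP header instead of the metadata" derives the origin URL from the Slug header rather than from the metadata file. This sketch assumes the URL is the client's provider URL (trailing slash stripped) joined to the Slug value. That scheme is an assumption, and `deposit_origin_url` plus the sample slug are hypothetical.

```python
def deposit_origin_url(provider_url: str, slug: str) -> str:
    # Assumed scheme: provider URL without trailing slash, then "/" + Slug.
    # The external identifier found in the metadata file is deliberately ignored.
    return f"{provider_url.rstrip('/')}/{slug}"


print(deposit_origin_url("https://hal.archives-ouvertes.fr/", "hal-01234567"))
# https://hal.archives-ouvertes.fr/hal-01234567
```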

Build is green

Patch application report for D3820 (id=13616)

Rebasing onto 5afd985ebd...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Changes applied before test
commit 470bfbd847b46911b6504b1c71dec3dfd6d3929d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit e01486986bedab455cc6c36df7e745e407df7715
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit ab3edc7152e116e1beb2732e7719bf0302c87cb0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit 8eb48668dcb9ccf430ad5e54e553af7a66148a86
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 1b022f171db143e28b4d02f83f6d3c364895a934
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit c296fd47097458b80c08e3b321b489396909c792
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 0f60904fab20b6dc74c2112e206c9e1e6226143d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit c9e22d0a8bf146784abdcea13a2b331b164f15f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit dd6be8829d040551822e81bfd55f8d1c143770d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 2aeeafd377a80024ee37f4d91494f8e0ee1cb1bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 400c67b7fbf183a410318991b2f3893e0592657e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 8b2314fb110f70845cc332a77564ad1b7c33691e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit ceb38bdc9d3672ff3b59fd3dce5dd2d676687e29
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit f79e0718ebc24d3937f6acd77b79b2fedd369760
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 4bda7bed4fed2810dfce7bf6baa4ca98ea12d30a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit ed13769b8397638b3b6bf54a298c396e12e455cb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 77602366634fff5808cfa8f0bd2f51d682afbea3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 6341f2c714a3eb875e8aafef7ea28faba2b91a1d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit e3e06dbe1c3c9bd83aae192d9e1808b354b819c7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 41463f4f1df50c39ce0d49fcc9266cd667453c64
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit a9967aa0ed355aaa6ada041025ae3a3aa11fb85a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 9f7dad9318c7df351aa593bd13745a548e31addf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 85177f93592fc06463f6af08f84fddbb67b90d94
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 7607cf03e8bf035cc154b3df27a9537d4a355197
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit b5b9b186c55084ea621aa70d9364d2f930b867f7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 5eacb07bf52432b54ccad8dd1fc371ab3080b934
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit ea16267e6ed4fcde15107a3e99d803dfdbd5066e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 0e957d5e1055ea1f9b54ef56aec61eaace895de2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit c19034bd49409bb2d47b521e2b636014c24b1421
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 59cc6425c2d68b36086b953f0fbb9d46449f2ba0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 7e4d4dae7a8e986ce7dacc4533bf4bc39e29330a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 30c886361db9f64c6eaf90d296a6e37153953839
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 0f6e5e3fa9292b24faf0c2869848fde72b7d0623
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 3268c6e33a6970cd03db06891dfc6273948416e6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 7f4a6c60a7e1fc509c6c8db63cc15dffd3693a4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 214deb6c273421de47610bff93f0ae6797cd9921
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 570777edcf004c297dff272e141875c326c5f88e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 81e05e9800b4aa8b69eaccd5adce5bf4224086bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/886/ for more details.
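The newest commit in this report, "add an origin cache, to spare a request to storage.origin.get for each origin", is a memoization pattern. Below is a minimal sketch of that idea, not the script's actual code. `OriginCache` and the `fetch` callable are hypothetical names. `fetch` stands in for the real storage lookup, whose exact signature is not shown here.

```python
from typing import Callable, Dict, Optional


class OriginCache:
    """Memoize origin-existence checks so each URL hits storage at most once.

    `fetch` is a hypothetical stand-in for the storage lookup: it returns
    some record if the origin exists, or None if it does not.
    """

    def __init__(self, fetch: Callable[[str], Optional[dict]]) -> None:
        self._fetch = fetch
        self._known: Dict[str, bool] = {}
        self.misses = 0  # number of lookups actually forwarded to storage

    def exists(self, url: str) -> bool:
        if url not in self._known:
            self.misses += 1
            # Negative answers are cached too (see note below).
            self._known[url] = self._fetch(url) is not None
        return self._known[url]
```

This sketch also caches negative answers. That is only safe if origins are not created in storage while the migration is running, which is an assumption here.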

vlorentz updated this revision to Diff 13618. Sep 1 2020, 11:35 AM
  • add test for deposit format 1.

Build is green

Patch application report for D3820 (id=13618)

Rebasing onto 5afd985ebd...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Changes applied before test
commit 396e47d7df547ff05b3b8f75a55d03be32faab93
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit fb5191fdaec91f658073c39153d9c4de510048ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit c02f8d03bf3adc200806d973a4635df6579d3133
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit ccee5a725348a0bce0d56ccd4bdac148977a8f72
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit b3e10db58457202ce4225fd01006101093bc9647
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit ab3ccaf503d5bf03c39b00016bbad87371804880
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 5216a0cca930e0ac34116633a2be30634ef5a4f8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 882654e7ae4b75ab113cc9e221e251dc414970a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 9a87f24586935a64aca62223b9aa6faef8234d10
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit f0730b7ee12bbc4087c8f3e99f33ad664cc0cc27
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 025bddf7e88b3a27507830b6bf8075872b7d6612
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit aa22247333e848f176c6c2120ffce29b74c93fe0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 4e1f8fd976f5264bfa2d86331fd05e257b286f3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit e470be4fcefe56c1e2e4984f55c269d7baa7743c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit a0a4bfbaf021048a803c89e5fc82b19f90106ea5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit fadb82e1c33ee0c0546867ffe6bcef9406ebb444
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 71dbf3a8da200e92b41e5a7fc1d24745df186964
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 7de620e1472a067ba3e5d73f3bc2878c57d50836
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit e5571c08e5d870218e10fdb70e88571c8b49565d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 5926f7a8246fe6b431116e3b3783aa6840197da0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit b5f00b34a35e67ae50a199f7aad48701946351d3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 0fa334b31a093577710a1cd0b9dd65fd6a6a7754
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit a41e681cc4644ae662d486d5b7db421100cdf7ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 78e089e7df5133198a7c1d638747a412f93dc2a1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 2b80521c9bd145c6f1ea1e7007c504f90a51abbf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit ea80119983da49042b4cab9e7f2532df3ecb2be2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit b3e4eb222b988a64c20bfa12eda95db139fb399c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 531babe765b03a9b4f5f8ae9c007e6ea5b6656a5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 702fa93a3669c60351e20f0d6cd96200c68a3a26
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit cf519159fe69b8902a7e57714cbdf75a40eb4225
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 843cced9e4bc63e61469b1bad54e78bd1d1c93eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 5264892dcd08a768726e5c3b9390f934dfddc0cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit e8bd4e508027ddbd1051e87abe1773f52a9aa1f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 491a86df64486039b298961e32f422c09c68ca3e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit eec8122b234f8e42252fa96c1bec0cfaf2ef522a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 658e79c3fa6ee6d7c01e348ae9f545543178f8d5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 700c56e6232b6f183bbea2aa225f750645fa0674
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 0d7e24033079f2c946139bb62b7795e4d9cee478
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit c00e7ac782e8525e9388ca97695771376d129d43
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/887/ for more details.
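The commit "When available, use metadata['when'] as discovery date instead of the revision date" describes a simple fallback. Its rationale is that the revision date does not record when the artifact was seen. A minimal sketch follows. `discovery_date` is a hypothetical helper name, and it assumes that `when` holds an ISO 8601 string, which this log does not confirm.

```python
import datetime
from typing import Any, Dict


def discovery_date(
    metadata: Dict[str, Any], revision_date: datetime.datetime
) -> datetime.datetime:
    """Prefer metadata['when'] (when the artifact was seen) over the revision date.

    Assumes 'when' is an ISO 8601 string. A trailing 'Z' is normalized because
    datetime.fromisoformat() only accepts it from Python 3.11 onwards.
    """
    when = metadata.get("when")
    if not when:
        # No 'when' recorded: fall back to the revision date.
        return revision_date
    if when.endswith("Z"):
        when = when[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(when)
```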

vlorentz updated this revision to Diff 13620. Sep 1 2020, 12:57 PM
  • deposit tests: add docstrings
  • deposit: shorten test data
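The test commit "tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data" relies on the difference between shallow and deep copies. The sketch below illustrates that difference. The row shape and this `handle_row` are hypothetical stand-ins, not the migration script's real function. The only property they share with the real one is that they modify nested data in place.

```python
import copy
from typing import Any, Dict, List

# Hypothetical shared test fixture with nested metadata.
ROW: Dict[str, Any] = {
    "id": "rev1",
    "metadata": {"original_artifact": [{"filename": "pkg-1.0.tar.gz"}]},
}


def handle_row(row: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in that rewrites nested metadata in place."""
    artifacts = row["metadata"].pop("original_artifact")
    row["metadata"]["original_artifacts"] = artifacts
    return [a["filename"] for a in artifacts]


def run_on_copy(row: Dict[str, Any]) -> List[str]:
    # deepcopy, not copy: a shallow copy would still share the nested
    # "metadata" dict, so handle_row would mutate the fixture anyway.
    return handle_row(copy.deepcopy(row))
```

With `run_on_copy`, the same fixture can be reused across tests. With `copy.copy`, the second call would fail because the first call already popped the key.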

Build is green

Patch application report for D3820 (id=13620)

Rebasing onto e6c17f6ed5...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Changes applied before test
commit 565fcba6d2ec4fa589a59d90f27f36615cf65cf7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 3bd0ca9312b9bb83fe296cf873c6897e724de193
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit 891f9485a1ab1bb02a781949a85791737f42213a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 517b1bc8544f4ea6950e43b320f4e854de054bca
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 761b3d91a4961c411f786373363bc76fae7a3046
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit 60740de1bfcdd587246fc0cfb32b6e67747b04c3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit 88466092a0e9cb877592fa4eaef043f6aff7854a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit e8a09a090c1c2cf78706da13103741f6baad9ae5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 8b876d49deeade7ab93e884cc612dcf115d47921
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 630c78a038478ae818be63bdf29ad42f208aac96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 9bce25f4051d6e6cff7f35aa5c310c0e819cb12b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 15f9a80eaa94dc8fe5454c718bdeb19438ca5385
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 0438d053abcf293de3208107ace8845e84626ace
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit ea22c0c3a98c6eaa9ad171fdccf33e434067a9c5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit fbd999ec7fdfa693c2a6aac90152e0dc129ded3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit dfee49c056f72794d6c920717b7153cc74b7bf7a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit f4537fd000e711c0a3ca7a454d9465ddf74854a4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit ba8fde50984b058b9171fb4dbd9f776f717fb032
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 1bb8944a8131e2d723e1e15e6a3d378d3da8bad4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit cf1528fd0580681bfd77cb1f8c4013981b2ca01a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 8430336ae134f77480bf02fd9f1d28327d2874d2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 34c5fcb9463c640cceef11b211b3d6d8701c342e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit c0858af46433fe0fc268b30d844ecd3e1e20eb0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit d4ec1ede25f6f821a9182b65fc35c0c9e6af6211
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 605fbbf409ac656a6ba03857c483ad378bc42e58
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 658b07768bcb0282051d8ce727f921c68e41a0ef
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit e7f806aa365b99d7819affe3cb7e48ade074941b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit e90567c61e10f387cc1386a02363fbd55096948e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 01169e9ca1cecb5617b9ede61668fc462f3d5e1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit aed978e0d6004c2b8068d95325883dc9ca1ae4fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 8581571eab1979b1763d6ff7bd6c7ab32993f649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit e758874b6c737117fc6d3012916016f8e9cd7059
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 83f4a226b90d030f741043f4e520b7ba0d0d10e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit b4288ba8a194717fff701b844a5890ba06d70c02
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 4a9fb6411fd94882f6fdbc5e51de5a622b24a08a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 5f4a610f3395940f2a423cf649782e962087ea54
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 8423dae975317abca037c39af7e50700ea6a8577
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 3c6a11b5baa36a66d5cb94ed8bc4bda6b6283fb4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 3da6e38d98f29ccac0311de28d9fafcc79edb12d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 0d6de22a7df366d54dfc79040e2f5294a11b8a4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 647ea05327d9e32c30b986afa2053b9fc456bc2c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/888/ for more details.
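The commit "deposit: use the date of the deposit request for each metadata item, instead of the same date for all" explains that metadata cannot be stored twice at the same date. The toy model below shows why a single shared date loses versions. `ToyMetadataStore` is hypothetical, and keying it on `(target, date)` is a simplifying assumption, not the real storage schema.

```python
import datetime
from typing import Dict, List, Tuple

Key = Tuple[str, datetime.datetime]


class ToyMetadataStore:
    """Toy model: at most one metadata blob per (target, discovery date)."""

    def __init__(self) -> None:
        self._rows: Dict[Key, bytes] = {}

    def add(self, target: str, date: datetime.datetime, blob: bytes) -> bool:
        key = (target, date)
        if key in self._rows:
            return False  # same target and date: this version is dropped
        self._rows[key] = blob
        return True

    def versions(self, target: str) -> List[bytes]:
        # Keys are unique, so sorting never needs to compare the blobs.
        return [b for (t, _), b in sorted(self._rows.items()) if t == target]
```

Suppose every deposit request is stored under one fixed date. Only the first blob survives. If each request's own date is used instead, every version is kept, which is the outcome the commit message describes.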

vlorentz updated this revision to Diff 13627. Sep 1 2020, 3:13 PM
  • add comments
  • drop date from original-artifacts metadata.

Build is green

Patch application report for D3820 (id=13627)

Rebasing onto e6fcfb931a...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Changes applied before test
commit ea963c40b122feabb94c270925069908a8b3617a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit 8648a84b500f5c6f327734daeea95084b2169a52
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit 43306432cb87d0888bfb09be8d986c7bce137d95
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit d4e99d19b26a2fca140f573decdcee65cc33953c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit 58d4f5bf8b2fe1df1b78aa9ad8e027e3226d43f3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 2784e81c5663272860b0102cc30f3e3b797d500b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.
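The origin cache described in this commit could look roughly like the following memoizing wrapper. `lookup` is a hypothetical stand-in for a call to `storage.origin.get`. This sketch makes no assumption about that method's real signature in swh.storage.

```python
from typing import Callable, Dict, Optional


class OriginCache:
    """Memoize origin lookups so each URL hits storage at most once.

    `lookup` is a hypothetical callable standing in for storage.origin.get.
    """

    def __init__(self, lookup: Callable[[str], Optional[dict]]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, Optional[dict]] = {}

    def get(self, url: str) -> Optional[dict]:
        # Cache negative results (None) too, so missing origins are not re-queried.
        if url not in self._cache:
            self._cache[url] = self._lookup(url)
        return self._cache[url]
```

Caching `None` as well as found origins matters when many revisions point at an origin that does not exist in storage.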

commit 8e89c0b8cb88a061d626dea3fd5e2e4d97fec9c6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit 034688fe273b0391bcfde35812c426b00e089b39
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit e050387af0b66daabbdc543bf0ab3b346d2c11cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 949b00ab5b1f1944906b332a153e5a2ef421e840
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 4dcfc2d4c0a2735e3670020525bdbf9b335c8df4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.
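A possible shape for the unescaping, assuming that scoped package names such as `@babel/core` are stored percent-escaped (for example `@babel%2fcore`) and that npm origins use the `https://www.npmjs.com/package/<name>` form. Both are assumptions, and `npm_origin_url` is a hypothetical helper.

```python
from urllib.parse import unquote


def npm_origin_url(escaped_name: str) -> str:
    """Build an npm origin URL from a possibly percent-escaped package name."""
    # unquote() decodes both upper- and lower-case escapes (%2F and %2f).
    return "https://www.npmjs.com/package/" + unquote(escaped_name)
```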

commit 86e618fa2cf38bc2ae632a1835158a88054c5623
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.
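The reasoning in this commit message can be illustrated with a toy store. The sketch assumes that stored metadata is keyed on (target, authority, fetcher, discovery date). The commit only says metadata cannot be stored twice at the same date, so the exact key, the `store` function and the authority/fetcher names are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical key: (target, authority, fetcher, discovery_date).
Key = Tuple[str, str, str, datetime]


def store(items: List[Tuple[datetime, bytes]], target: str) -> Dict[Key, bytes]:
    """Store (discovery_date, metadata) pairs; later writes win on key collisions."""
    stored: Dict[Key, bytes] = {}
    for discovery_date, metadata in items:
        key = (target, "deposit-client", "migration-script", discovery_date)
        stored[key] = metadata
    return stored


first = datetime(2020, 1, 1, tzinfo=timezone.utc)
second = datetime(2020, 2, 1, tzinfo=timezone.utc)
print(len(store([(first, b"v1"), (first, b"v2")], "origin")))   # 1: v1 is overwritten
print(len(store([(first, b"v1"), (second, b"v2")], "origin")))  # 2: both versions kept
```

Using each deposit request's own date gives every metadata version a distinct key, so none of them is lost.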

commit ce0f64469538cffa8d81038b5dd40e1c6d3fcd7a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.
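A rough sketch of deriving the origin URL from the Slug header rather than from the metadata file. It assumes the origin URL is the client's provider URL joined with the slug. That joining rule is an assumption, and `deposit_origin_url` is a hypothetical name.

```python
def deposit_origin_url(provider_url: str, slug: str) -> str:
    """Join the deposit client's provider URL and the Slug header value."""
    # Strip a trailing slash so the join yields exactly one separator.
    return provider_url.rstrip("/") + "/" + slug
```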

commit 454b23b2cebbbffc15cbe1ce62ece697136f78c6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit c14148114a07672a418cb2c948879c718d467ca6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 8e834baff7a4b8a1c21c825a1e2826e7d083e73a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 1e13b1cfe1cae3bbabad5eb1666f890e8146654d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 1d847af37a3e6fd9a0e32731d53ad611c0ccbbae
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit a221f64a82de48acac63956b2ed8db9321bed2fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit b68bb94d2a644fa613c4f671f7d6513e486226ef
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 3b6044ce3a8b89e8b85527baee1c1042b8c89c57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.
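The usual way to avoid this kind of crash is to read the optional key with `dict.get` instead of indexing. The sketch assumes artifacts are dicts with an optional `"url"` key. `artifact_urls` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def artifact_urls(artifacts: List[Dict[str, Any]]) -> List[Optional[str]]:
    """Collect each artifact's URL, using None instead of raising KeyError."""
    # artifact["url"] would crash on entries without a URL; .get() tolerates them.
    return [artifact.get("url") for artifact in artifacts]
```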

commit 0e8e6e1f07ea79e4a38346fb4ea569257739a9c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 76ec3fd2c36e2eb1421a1bd3e38249eb54b8c400
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 2dcb52637cbaa5a0b130ee877a9dee027d45f779
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.
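As the commit message notes, the revision date records when the data was updated, not when the metadata was seen. The fallback could be sketched as follows, assuming `metadata["when"]` holds an ISO 8601 string. Its real format is not stated here, and `discovery_date` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict


def discovery_date(metadata: Dict[str, Any], revision_date: datetime) -> datetime:
    """Prefer metadata["when"] (when the metadata was seen) over the revision date."""
    when = metadata.get("when")
    if when:
        return datetime.fromisoformat(when)
    # No "when" recorded: fall back to the revision date.
    return revision_date
```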

commit 2a8dd0dbb3eb79b91ef8cb0c6df9b277f1674166
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit b6c5cceb342a11960368ae52bab9c4f65dcd43a9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 4f0f8d49595b9020f31070cf3f45931b85a7fd32
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit c077fa55a69230604c6b9cf85b23127c3f105fbf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 6909510bb77b46860e737ece4fd3d610d3d2c429
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 9cd2bfeda7f4ad47f1d4d7959f018955b3024f42
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit c4a72648df8a4941eb17fdfde1dd76a31dca962d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 42c7845074d2802bbc1204d9964b2b30d759cd31
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
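This commit guards against a common testing pitfall. If the function under test mutates nested structures, a shallow copy is not enough, because the nested dicts are still shared with the test data. Here `handle_row` is a hypothetical single-argument stand-in that mutates its input, not the script's real function.

```python
import copy
from typing import Any, Dict


def handle_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the script's handle_row: it mutates its input."""
    row["metadata"].pop("original_artifact", None)
    return row


TEST_ROW = {"id": "rev1", "metadata": {"original_artifact": [{"url": "u"}]}}

# Passing a deep copy keeps the nested test data intact for later tests.
handle_row(copy.deepcopy(TEST_ROW))
assert "original_artifact" in TEST_ROW["metadata"]
```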

commit df5826a14ecd4c1f409af33a7ec8f74e8ab34830
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 9925a6703e59dff82d032243a092fee5eac3c952
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 0517276cbccf75faa9a88368cee62c072191eebd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 5bf75f0dedb3d4baed1df59ed1dbf831b3f2f2e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 4ed5fddf190754c07b583a728129977c00195f96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit b77cf9054f8cc271f6c6d49a7271bd12b8d3a0cb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 11e9ad7f084f48a484f5a47fa9d9ba07e9394625
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 07d6324caf87c8d01eab18d1c3ab8e894718379a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit b2801aa50e7d9c9c6e05b899d5522f7ce4ae7340
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 58f45dabd44f077459dde28b6dda11f6b920cf07
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit a973ba6fe0997f6946d42dfcfa9d112d443b6515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/890/ for more details.

vlorentz updated this revision to Diff 13635. Sep 2 2020, 12:06 PM
  • Add exception for deposit id 159, which is missing from the deposit DB.
  • Check deposit status is 'success', and exclude deposit 342, which failed.
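One way to express these two changes is a filter over deposit statuses with an explicit list of ids known to be missing. The sketch assumes `statuses` maps deposit id to status as read from the deposit DB. The function name and the "raise on any other missing id" policy are hypothetical.

```python
from typing import Dict

KNOWN_MISSING = {159}  # deposit absent from the deposit DB, per the update notes


def should_migrate(deposit_id: int, statuses: Dict[int, str]) -> bool:
    """Decide whether to migrate a deposit's metadata (hypothetical sketch)."""
    if deposit_id not in statuses:
        if deposit_id in KNOWN_MISSING:
            return False
        # Unknown gaps are errors, so they are not silently ignored.
        raise ValueError(f"deposit {deposit_id} not found in the deposit DB")
    # Failed deposits such as 342 are excluded by the status check.
    return statuses[deposit_id] == "success"
```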

Build is green

Patch application report for D3820 (id=13635)

Rebasing onto e6fcfb931a...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Changes applied before test
commit 6ffcdd904f8f50205cbe4ee928240e9f25c7d2cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit fe6987dddd650deb6cfb22efdaad1d986285d210
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit 6072df41299fc44ef65d473a842e500fafd9d6fa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit f15102f2d7c89ad2363bc1da77617617606aa1c7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit df4da949afdd5621949d0b166bfdf4b2a202e96f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 52fb1f3fb7487bf5128ee781543725cff3104383
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit a61cf458a147faaa8ab55278e4436f084ccef6b4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 7ebd20ef2afc4ba28a4ec9ef3adb6daa76387684
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 369e2439bce51456aad8d877c29c4a7d2a2d3b65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit ed7c37394005524c54112daf6edd605954a81b44
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit a0b11238ea77ceb7956936156e23f61a0559a9de
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 9e408dc2b1d479d89cb83f407c5ff5d844f62f4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 1ee4b44bdc0d27531b2577824e84c957e763c0c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit ec982f7aa400d45a064914018ce8007c807b0cfc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 36c62ff0c0c6d56db97097da643199c3ccc96b18
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit abf0c258c70101b1fd48f8d21bb40e952b15ec54
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 62d9ce6a6abf6c2a065bc0d0daf2ed34f51e851a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit bec71de01632b1e7ab37bc962c41a65938633966
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit e007b0d74721cdc91ad1e363c866e52c735c5760
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit b8409c925ba8baa3a1c43fdd4dc8e3da75b52491
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 424488ef324b257d467d3c7c1c1bca66e7d77ef9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 99c37aed53b10a6d51d52ad31bf06304dde5d33c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 04dfba666210323f9e6b809c58457dd0e40a41d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 8d1dc128a340a717bb8df98551910e5202c5f879
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 88de7e3b2d7dd493d20b34f5b904634c506da488
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit c94328a339d9f797e241818a8438053d4b2c5c8d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 7c404874fa33513a6ca87a7005777ce37a48d78c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit d4486b32fd2acf13b0114e5ea8eb742654742d56
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 70d572dbabacc4823a0211e7c3c0d13c3e0d17eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 8b094651d59251de70d7ee4b55e9b57f835d10c4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit cb14acdf1bf58321f8b82d4e36e7aeb029c9a380
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit e07fd0a6d192627792538661afbee94bda7fc879
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 8f82e561af7b19a0cc7b394bcb1f6db83e38c33d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 799d2420b773f86b6fb65b3876d1055ad5615712
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit f00d6ac37450c701e8a6b7090a6d5af3221aa976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 8e3dee09e27887fcbb7541f59cafc74b2d305918
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 0e63e06bab2949f6cc48ec112a1422e73ceb722e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit e6176a1e153b0c28edee3c0fbd0da10fa1cec1b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit ca146568a37d25fa56bf0c0e7bd6ff1b1e6915c8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit cc688f642220afb9d4923d94964500315f5dba3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 4adc40401b519839abc86fafc31ada19bd393625
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit e2ba867e396c87e65c790f0d06e64faffc4e95ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 80e6564da1f4497ebeb950f6c603dd21b03f5bc0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit aed16d06c2fffdc448a176368853b30b1bf9a990
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 241f057d4cc062f0a7ceea3b5f0023e909ccb513
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/892/ for more details.

vlorentz updated this revision to Diff 13655. Sep 2 2020, 5:02 PM
  • Allow failed deposits as long as they have a SWHID.
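The relaxed acceptance rule might be sketched as a small predicate. It assumes that statuses other than "success" and "failed" are still rejected, because the update note only covers those two cases. `accept_deposit` is hypothetical.

```python
from typing import Optional


def accept_deposit(status: str, swhid: Optional[str]) -> bool:
    """Accept successful deposits, and failed ones that still recorded a SWHID."""
    if status == "success":
        return True
    # A failed deposit is usable only if it produced a SWHID.
    return status == "failed" and bool(swhid)
```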

Build is green

Patch application report for D3820 (id=13655)

Rebasing onto 36d284c730...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Applying: allow failed deposit as long as they have an swhid
Changes applied before test
commit c40531c5e1c3ae62439d55bc841224ca71e65fcc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit 5b5fd8f9c930bece0223f2e20ab8c388f0a4d8f2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit b4c9ecf0e5c0a4d8478d09f54601f50c0aa93b8c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit b8954bbce5c35d0ad5bafc04a7ceca8957e0c51f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit 1bfcf851e34f8881be283a17cc9750ac7dc78ff6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit 9fde10cdfcd0104b478eb0248461d065bfc67803
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 4c37015358e08dcd19e5931b7272790da5e66287
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit 872e0ebaa1376eb740ade0087783e9654a6f1354
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit baee6c3b441004508a27861291142e74dae51eec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 7eea155e15016689741e7ee5a55fd23054a04fb5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit 5b9f37ed0f350cf0fc323642192712b734f6434d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit 07a4a1f015e78d7d1ae667212fdb1d94b9652b5a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 78390b647d883491dc46b3dec687c5f9fa8b924d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 0fa3250cada1aa41979b261dc9c950e7e3297e8d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit b53dd3083e49bf0a33188451d331b25b691e667a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 2f0394f23143f2a9fd9a37c884ef48a2494dd385
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit a954b8e7660ab0922ca44055b71d654640459956
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit b3e01a3149761fcc72febe4db9fe6e2e45b331a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit a86efc6f6d862ec81ed01fadcd3e0e3beb976e1a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 06f9bba784aed0feb65a2ceee6136e82cfd396e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 925d97bf6ac608ef0c20674100741f3f38c5fa14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 44ddaf4a1fe57575ae0d16651198bc006afb2cd7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 130d776a8946913977ef045fdbb44f4cc2aba87b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit ad18bacd34d4d503d4f6ee849e224294f6f0f8bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 2eea50098b34af6eef52f1cbc1f79ed3f73e96ee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 2c28317dd0df77c69c9282059875d7f61bfc2735
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 1897191203103aa489113e8ebc831d501dcd1125
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 06aa6df36de52e079b2f134ce0e91c875d4050c6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 15142c2a004d31daf7fe86ce78c5cf69900098d6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 6b0f1144402ef0cf7d1193d40812920d346f5e7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 00fdbd0a31ec6fc1fe2d15bfa34dcc5ee1bee77b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit d1ea19747e257963c9181f72a5fee06396f2c7a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 4b7f77cc92c8ebdf0db60f3ad2f0011e9cdd207e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit f587f7464cfc9be2428d98305e0e4293bb52242a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 945af5daf05ec45ae4a0249ee67234664c62c7a3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit b3d1787c1ca8c7541232ce00c50d704150e52cd4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 69d7dd059ab62a3e5a7d3491dca3c8d2235cadbc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 9010108ccb22f06a1870dac2dde8d2b114b59dcc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 8aaf3bbf52772ba233048fd9fb8c2544cbbde254
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit cf0b15bd1e976a76e317ce8559d0b1cd96b6769e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 3562e9ae7a1dfbecb8719030ad14d2b2458d082f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit ae8fd18f85aa91c019e2a77d2bbdbeb7c6c9dd5b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 120fb3208c18f34bfd8bd0dc6dc4bf267d1e02e4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit be660b75f2fb27346e9f67cd35259d7f5d07cd07
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 0d04d8b719795aecf47bac238090bbc00de03ada
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 2146ab483e1268444f71056cbffddc0e7a383bc5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/898/ for more details.

Heads up: I did a first oral review with vlorentz; I need to do another read-through.

vlorentz retitled this revision from "[WIP] Add a Python script to migrate extrinsic metadata from revision metadata." to "Add a Python script to migrate extrinsic metadata from revision metadata." on Sep 4 2020, 9:39 AM.
ardumont added inline comments on Sep 4 2020, 10:59 AM.
swh/storage/migrate_extrinsic_metadata.py
455

where is the load_metadata call in this conditional?

495

same load_metadata call?

vlorentz added inline comments on Sep 4 2020, 11:20 AM.
swh/storage/migrate_extrinsic_metadata.py
455

raw_extrinsic_metadata == {}, so there is no metadata to load

495

metadata["extrinsic"]["raw"] == {}, so there is no metadata to load either
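The reply above explains why those branches lack a `load_metadata` call: the metadata dict is empty, so there is nothing to load. A minimal sketch of that guard made explicit follows. `maybe_load_metadata` is a hypothetical helper, and `load_metadata` is passed in as a callable rather than assuming its real signature.

```python
from typing import Any, Callable, Dict


def maybe_load_metadata(
    raw_extrinsic_metadata: Dict[str, Any],
    load_metadata: Callable[[Dict[str, Any]], None],
) -> bool:
    """Load metadata only when there is some; return whether we did."""
    if raw_extrinsic_metadata == {}:
        # Nothing to migrate for this revision.
        return False
    load_metadata(raw_extrinsic_metadata)
    return True
```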

ardumont added inline comments on Sep 4 2020, 11:45 AM.
swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
34

NPM_

151

npm

swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py
58

What are the different use-case scenarios here (test_pypi_{1, 2, ...})?
A short docstring, like the ones you wrote for npm, would be nice.

I gather this one is the case with a provider...

vlorentz updated this revision to Diff 13693 on Sep 4 2020, 12:29 PM.
  • fix issues noticed during oral review
  • deposit: add another exception

I had another look; it looks good.

I only skimmed the test_deposit one, though.

vlorentz updated this revision to Diff 13694 on Sep 4 2020, 12:32 PM.
vlorentz marked 2 inline comments as done.
  • pypi: add comments on tests

Build is green

Patch application report for D3820 (id=13693)

Rebasing onto 356eacd763...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Applying: allow failed deposit as long as they have an swhid
Applying: fix issues noticed during review
Applying: deposit: add another exception
Changes applied before test
commit a5e0a292409f2348a607a08ea5d01081e6c7e1cb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit d994326a271694091f99b45e53ffd334528c3f0d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit 5bc897a09075a2a0a557e906a593cfa020eed23f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit 5d77c670832e04867820c29d1dfdf7f2c5c1fbe9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.
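Two commits above shape which deposits are migrated. One requires status 'success'; the later one allows failed deposits as long as they have an SWHID. A minimal sketch of the combined rule follows. The `status` and `swhid` keys are assumptions about the deposit row's shape, and `should_migrate_deposit` is a hypothetical name.

```python
from typing import Any, Dict


def should_migrate_deposit(deposit: Dict[str, Any]) -> bool:
    """Hypothetical filter: successful deposits, or failed ones with an SWHID."""
    status = deposit.get("status")
    if status == "success":
        return True
    # A failed deposit is still usable if it was given an SWHID.
    return status == "failed" and bool(deposit.get("swhid"))
```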

commit e04625f11122e116a8f5277a7a5d261f328f3adc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit b10db214d316bd115888922de341f0e230713c1e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit 9271d8b4434557e6fb54beb2d38965b3d9396a7a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit 9f49f239548900e15100c1d93a87e11369d35da2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 070066d63ead085704815a2127ea00d50509ed0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit c6aac917b7d6583cd66e20585a9b7e4ef109ce5a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 92f48f95e253a720110e094baba8393ca1061e19
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.
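The origin cache commit avoids one storage round-trip per revision when many revisions share an origin. Below is a minimal memoization sketch. `origin_get` is a hypothetical single-URL lookup standing in for the storage call, and the real storage API signature may differ. The cache also remembers misses (`None`), so a missing origin is not re-queried either.

```python
from typing import Callable, Dict, Optional

OriginGet = Callable[[str], Optional[dict]]


class OriginCache:
    """Memoizes origin lookups by URL, including misses (None)."""

    def __init__(self, origin_get: OriginGet) -> None:
        self._origin_get = origin_get
        self._cache: Dict[str, Optional[dict]] = {}

    def get(self, url: str) -> Optional[dict]:
        # Only hit the backend the first time a URL is seen.
        if url not in self._cache:
            self._cache[url] = self._origin_get(url)
        return self._cache[url]
```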

commit 6b45efc018ef34cc2fdc91609b859e3f3509be8e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit bd971d85eae1b99818f5d4a93963fd1f0ad50be2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit 34d02587ef1860b56828ff30411994d67908710d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit f9fc62383733855f8782c880ec98953f31c26508
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 3b4aa42aa3ce2647c91ac888733e3ae4e5487342
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.
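For the npm unescaping commit, one plausible reading is that package names were stored percent-encoded, as they appear in registry URLs for scoped packages (for example `%40babel%2Fcore`). That encoding is an assumption, not something stated in the commit. Under it, the standard library's `urllib.parse.unquote` is enough, and it accepts both upper- and lowercase hex escapes.

```python
from urllib.parse import unquote


def unescape_npm_package_name(name: str) -> str:
    """Decode a percent-encoded npm package name (assumed encoding)."""
    return unquote(name)
```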

commit d4a6b161fabf711d940cf913d171ceb16d47416f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.
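The commit message's reasoning is that only one metadata object can exist per date. Reusing one date for every version therefore collapses them, while per-request dates keep them all. The toy table below models that constraint. Its key of (target, authority, discovery_date) is a simplification assumed for illustration, not the storage's actual schema. It keeps the last write per key, and the real storage's conflict behavior is not described here.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Simplified uniqueness key: (target, authority, discovery_date).
Key = Tuple[str, str, datetime]
Row = Tuple[str, str, datetime, bytes]


def store(rows: List[Row]) -> Dict[Key, bytes]:
    table: Dict[Key, bytes] = {}
    for target, authority, discovery_date, metadata in rows:
        # Only one entry can exist per key; this toy version keeps the last.
        table[(target, authority, discovery_date)] = metadata
    return table


# Hypothetical example values.
ORIGIN = "https://example.org/deposit-origin"
AUTHORITY = "deposit_client:https://example.org/"
SAME_DATE = datetime(2020, 1, 1, tzinfo=timezone.utc)
```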

commit 010fcaee1973caae88c86c1f814218f4b5346b9c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.
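This commit derives the origin from the Slug header instead of trusting the metadata file. A minimal sketch of building the URL follows. It assumes the origin is the deposit client's provider URL joined with the Slug value (the external_id), which is an assumption for illustration. `deposit_origin_url` is a hypothetical name.

```python
def deposit_origin_url(provider_url: str, external_id: str) -> str:
    """Hypothetical: join the client's provider URL and the Slug value."""
    # Normalize the trailing slash so both "https://x/" and "https://x" work.
    return provider_url.rstrip("/") + "/" + external_id
```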

commit 342bbf46cddb83116b836bde41da707b8bfd8b25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 00a0779accbc57ec0857c62c1afe5bd64397562d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit e5ab4c4ae917b52c5976e5b62a3aad78c95ca1e5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit ac7771ed608c55f7d5192a18bbc668884c02fcdc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit c990fd7b06e69de5b9268cfdbe2f9ae35ab98816
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 0d6c7b6231c436a5a008536881b972b836aa65b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 68707b96ce4dd681f4141895d9b561fcf9b70b8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 38155ab6fbb28094cfec0e2ca2e72d6e0d40c3db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.
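One way to avoid a crash on an `original_artifact` entry without a URL is to read the key with `dict.get` and skip entries that lack it. The commit does not say which strategy the script uses. The sketch below shows one possible approach with a hypothetical `artifact_urls` helper.

```python
from typing import Any, Dict, List, Optional


def artifact_urls(original_artifacts: List[Dict[str, Any]]) -> List[str]:
    """Collect artifact URLs, ignoring entries that have none."""
    urls: List[str] = []
    for artifact in original_artifacts:
        url: Optional[str] = artifact.get("url")  # no KeyError if missing
        if url is not None:
            urls.append(url)
    return urls
```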

commit ec7fda6ff5924d9ff29004edaa1cbe50ce405b8d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit ab398d0fbb417565665ffa9fde7fa7d39514b9bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit b687ab49faf2c24082f10c8ae54fbc399333cfe0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.
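The discovery-date commit prefers `metadata['when']` and falls back to the revision date. A minimal sketch follows, assuming `when` holds an ISO 8601 string with an explicit offset. That format is an assumption, and note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward. `discovery_date` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def discovery_date(metadata: Dict[str, Any], revision_date: datetime) -> datetime:
    """Prefer metadata['when'] (assumed ISO 8601), else the revision date."""
    when = metadata.get("when")
    if when:
        return datetime.fromisoformat(when)
    return revision_date


# Hypothetical revision date used for the fallback case.
EXAMPLE_REVISION_DATE = datetime(2018, 1, 1, tzinfo=timezone.utc)
```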

commit 8f3325d3202b06019720fc3954be891384eedeab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 646fa6d90e603e6a36a198579c829fd92eb0d393
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit c4c2f5bd2a9f935d07aa82a1fd62cbbd5bcec832
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 5a8d26f757d0b12669d95747629032c5c7a0b982
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit cfa6f6a2ba3915bba14794eb32638bab8aabc5da
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 8a83b90c43718922791368a96140bdbd7e321009
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 9145e019d9ba60e3dbb1fa363f038c7c3a2d8a58
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 898d4ad0b768e9666d0f2ed0ab3397966a05fda4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit c8d2dc164df86bba3be05f4c0e645ef09465cab8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 2075caa6403d7e63ad8589cb4093c12d0b7fb5d3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit c2c39adc4cb415d35949c39b4d5cba4913be21d6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 8ab546743b02b9d1698252aacbccd1b0374dac5b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit a1f272d343296462199db9132348cdd97ccceeec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 52e229d0fea64ee5bbd428aaaf503cec4eb49fe9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 998d3c6a55eecc9989bb3787abc2e9d2894f9753
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit bd1a84a1c09d097a08f3f8c2b055abbbf0f47734
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 26311d6c0c1c70a37d1ecab6675109ed82fb043a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit f3002ceb731d5a23e3dfe7bec85db8cca615d2ec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit c581fa31c3d0f2d3810b11c6be0d2ce3dc15fe19
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/904/ for more details.

Build is green

Patch application report for D3820 (id=13694)

Rebasing onto 356eacd763...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Applying: allow failed deposit as long as they have an swhid
Applying: fix issues noticed during review
Applying: deposit: add another exception
Applying: pypi: add comments on tests
Changes applied before test
commit df3315c84776bf80ff918de038b9a33db619cc6e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 4 12:32:10 2020 +0200

    pypi: add comments on tests

commit ab518426e2a1d3b778370e0747ad7bb507ac1a20
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit 1cde2a22bc3cb74dd7f5c7125076758dfd2c6a65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit a1fadfb5c70857dc41d20150017d37f0929b55be
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit 435b1c3aa9b01c2af425097ae90a9b2828438c17
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit a523e89c07cb1867bd0793913af7458317a710eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit 9a4472617f24e7bb8e1fab6d11c3e3323f5603ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit 2588e3c32f971b4cb18ca3e2e4eecf68a87e4831
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit add32ca93a5a443bcdf4eebb0472adfdd447cab0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit ca2df61d9b82e0102159910d29dc5902e22d61f9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit f739e61401380d9168478d34acad2edf0ef61871
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 379c51ab7d9b3d27a2709e0070770f3e4c651370
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 18e2a74397805eb3b764cd654f6d70cc923b124f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit c6b03d2e2d4e829a39c7b1830d9ddaac7f67677b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit c8e818ad65314213c280c1976c5e72d19d663d2e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit de2a666a7ff11c8f19d8ead05fdd2f429e542fad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit adc1a2cb299a1c13c25a97543ae420c8dd75981b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 1af065c9ac8cc93cac0e6f228eaeaec536704e10
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit d627e47b13192aeb3a4753d6ebe79ccefacd15cb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 0625ac51dd6ec28d65cacc8e0903f0d78d05d5a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 8d9a32a96739ec8587310060f6369cb781126b7c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 8a759cd3bcdb96f87634ba29d68b2d67e82c8a30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 6a827433e9479a8a73c5e426bf73dc1733dab7a1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 0b9d18de843b4db948b5589c78564cabacd884dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 56ce17fbbc890e71ab84ac77ba3cd5ceff8ad42f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 71f095e2f4047e35276868a52f9657cce2c3ecd3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit eb01b5ff4d4044d7f8bf11fafc0ed171e9f7fc78
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 35c2922f716a6250d094c6f9703aab90b5cedc35
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit dc85322a82aa124049d224bf53f3e1c67b0c05e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 93ee4af322f5e60ce8310383b7ca2d64e7038494
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit f1c61c9f0b6453663417529721880c1bc09496e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit b2a78fec1ac5bf024b2bb662092c573b44e5a319
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit e8dfb0c5f2dd41f08e046f8157c0ea3f00110cbd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 9ff9cc20dacf8c2776252257388737e17373dcc6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 877f5e02a46147ac202f340ba77c7b1999e2d55f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 3c279bda3f9fefbd6f563e0d687d24a0c22754cc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 482f21e1fd90a15314a4303d716633694782b376
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 2f92ef910a78882c833f48a0a60ef7f9a4219a89
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 0f2bbc62dbd440ebd82bce95eb5cbd799885ec62
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit d832717f8e87a425e317aad484c5820f51bf8574
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit f467a117fbc166854f891e80384451815c0a79b8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit e5b29bbfd2bdc52a25d6c56b2aa0b0f9c947d4aa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 57ae3e8d6e3ebee5437e9f4e048ffed6771b2d60
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 1d7f5728c7e16179da61485b3af4ea58bb17bec0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit e3db7db46116da633ec65a07719da4761af8f63a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit b878ba96dc48ec8606e20d99b3955ab3fb08994d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 9162a4e0adbce5330d426b4aa4b0bc52e5ba0a60
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 6323cd95a13c66da969d11de1b01c79220c182e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 6a8b47d3d0e5fd0f0d51b5ccc0aff72b3d25eaf3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/905/ for more details.

olasd added a subscriber (olasd) on Sep 8 2020, 10:20 AM.

@vlorentz: Would you be able to put the output of a dry run of this script somewhere on uffizi?

I've given this a quick read-through, and I'd much appreciate being able to review the "result" alongside the process :)

Thanks!

vlorentz updated this revision to Diff 13713 on Sep 8 2020, 11:54 AM.
  • add remaining deposit edge cases
vlorentz updated this revision to Diff 13714 on Sep 8 2020, 11:55 AM.
  • don't crash when CRAN origins are missing.

Build is green

Patch application report for D3820 (id=13713)

Rebasing onto 374e01cf36...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Applying: allow failed deposit as long as they have an swhid
Applying: fix issues noticed during review
Applying: deposit: add another exception
Applying: pypi: add comments on tests
Applying: add remaining deposit edge cases
Changes applied before test
commit f70bb170bc85e3ee1ac4a6e58e12eeb7bf81e777
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:53:50 2020 +0200

    add remaining deposit edge cases

commit e17857c5bfe62907cd16d5eff40c49533148d9a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 4 12:32:10 2020 +0200

    pypi: add comments on tests

commit e8eb9c3a33d0957ad28bc81d096c81a897f29756
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit 72e97a0c8af197dfed0da2001274f8541d192e71
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit 92b8cbb7932ebcb2e9bba7356b91b21e66d82f65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit b722c96226062413e4aedc8875b3089c8aa19f96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit e22237b0d22c18e9442febb57e789c176c8c22f5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit 10136a8c8481dcc72fad74d21d6d528385b9d02c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit 879b8b49e8588921a3103dfd90e409b327a47482
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit 192ded864077fbb403efb361f64fd21f5287b29d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 31a48e84ee18760ca766c3e8304783894d323dfc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit f0fdbc7fd24dd756ded7ad3ca06fb673ea444a0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 0428a1a7b65275ae357614fbbfd9c67464531af9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 25e3fb5573da772cf378aad58038c3fc012e2863
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit 1ab519dc65ed39db665505f5436107809d7e4ba9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit eaddbd8bd7adbc2e507f2d4d4ab698a1fdff44fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit d5404427d45509344f750fdc2dae8be373337f17
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 3e1a98571bf0d04749df81f6fca2f788137926ca
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 4d24c6a8b59e7e8adbad243da1f64cf2eff80e5e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 9f4a1b0b789f13b7591f796726e96a768f9774e7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 68e9094f8dcb73b364be4de8de406883e369d758
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 75bf202d3384a73f893a36bf85703d5d2f2181de
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 188978278ba85738ae4d2b757af6fbd8c7155723
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 5b21d2d7a0852df37e028ed865370ad34202ded7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit ea7bfb771c3368507a18bad95b2633b621464ef4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit e1b305766f948303ee4134a1196d3a0e2519f335
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit cb673d560e38dd723c111002b7bfbe07bb6bbc9c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 79c42d627b719f27ad3320cd436e6742dc6fc9b5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit e2c71f40d18c1c844b9fc0a1055a3bc8f95a74b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 55659a41325ce8fbcccac5d0535a898fad5f6f54
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit f7f3ac03e5a7a59399c81420e09872de7402c020
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 1ccad56ba2a18dcb99c1c763707b3ceea397adbf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit a58243b3e473178b61d7b1486bcd49cde3c971c4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 8f9e0f5a608ecde08d5ac1d50a278973382bde99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit ca72d23c0c192c6549458dbb224f8c993db21412
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 3462743cea1b44ca1dcc39d20208e090a1d2b7f7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 7893bec8253f1d52aa3c9af9229d406f166df15e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 6671f40d6e7383aec5f98f642c250e362d3a8bc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 827e2ea625c99c3d9a2a07885965b126de4c6cc5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 24b1bd84029debefd80c8ea59d81f8c79a5b2159
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit a27eb2d1dce5b3ed6c716a7e7e837a3155486440
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit f65c4ca8318bc5ea5e8a9a1d2c0bac595ffdb9f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit f9b31d68ba602a98f4df33cedaddf8994e983a49
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 8d26d915070f4f19bfc361689e1f7c1602da64bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 251c6369a5f0e621f5ed522a24acd8ed9ef72b0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit f36fa75c57c4b77842d1c6fdd69e3a614e22c3ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 133761fad8337e1df973a84496be3d51f7957182
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 9a8dd6dfa9a65ac6f513cfd54bf89b3decf448d8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 5d0d729c405ab523c75fcd8454d75b4b517b044a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 0a16d9457f9265db50ad2191f9f3aef0ed21d4fb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/907/ for more details.
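
The commit "add an origin cache, to spare a request to storage.origin.get for each origin" describes memoizing origin lookups during the migration. Below is a minimal sketch of that idea. It assumes a storage object with a batch `origin_get(urls)` method that returns one entry per URL, or `None` when the origin is unknown. `FakeStorage` and `CachedOriginChecker` are hypothetical names used for illustration; they are not the script's actual code.

```python
from typing import Dict, List, Optional


class FakeStorage:
    """Hypothetical stand-in for a storage backend; counts lookups."""

    def __init__(self, known_urls: List[str]) -> None:
        self.known = set(known_urls)
        self.calls = 0

    def origin_get(self, urls: List[str]) -> List[Optional[dict]]:
        # One result per requested URL; None when the origin is unknown.
        self.calls += 1
        return [{"url": url} if url in self.known else None for url in urls]


class CachedOriginChecker:
    """Remembers whether each origin URL exists, so storage is queried
    at most once per distinct URL (negative answers are cached too)."""

    def __init__(self, storage: FakeStorage) -> None:
        self.storage = storage
        self._cache: Dict[str, bool] = {}

    def exists(self, url: str) -> bool:
        if url not in self._cache:
            (result,) = self.storage.origin_get([url])
            self._cache[url] = result is not None
        return self._cache[url]


storage = FakeStorage(["https://pypi.org/project/requests/"])
checker = CachedOriginChecker(storage)
for _ in range(3):
    checker.exists("https://pypi.org/project/requests/")
print(storage.calls)
```

Caching negative answers as well as positive ones means a missing origin is looked up only once, however many revisions point to it.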

Build is green

Patch application report for D3820 (id=13714)

Rebasing onto 374e01cf36...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
Applying: add tests for revisions generated by the debian loader.
Applying: cran: handle date
Applying: cran: add test
Applying: Start adding the origin context
Applying: pypi: add test
Applying: Rename original-artifact-json to original-artifacts-json.
Applying: npm format 2: build origins urls.
Applying: npm format 2: fix format of original_artifact.
Applying: npm: add tests
Applying: cran: add support for revisions with date
Applying: tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
Applying: debian: rewrite original_artifacts to the current format.
Applying: pypi: detect origin from format 2, and fix format of original_artifacts.
Applying: explicitly error if the loader type could not be detected.
Applying: cran: detect package name when 'provider' does not have the right format.
Applying: pypi: improve origin detection from filename.
Applying: pypi: add support for another format
Applying: pypi: get rid of the package name heuristic, it's unreliable.
Applying: When available, use metadata['when'] as discovery date instead of the revision date.
Applying: gnu: add tests.
Applying: nixguix: add test.
Applying: Fix crash when original_artifact is missing an url.
Applying: cran: improve package name detection.
Applying: deposit: add tests.
Applying: deposit: add support for revisions with no metadata.
Applying: deduplicate origin checking.
Applying: deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
Applying: deposit: fix tests
Applying: deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
Applying: deposit: use external_id from the HTTP header instead of the metadata.
Applying: deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
Applying: npm: unescape package names.
Applying: deposit: add another exception
Applying: deposit: add another exception
Applying: remove prints
Applying: add origin deposit exceptions.
Applying: add an origin cache, to spare a request to storage.origin.get for each origin.
Applying: add test for deposit format 1.
Applying: deposit tests: add docstrings
Applying: deposit: shorten test data
Applying: add comments
Applying: drop date from original-artifacts metadata.
Applying: Add exception for deposit id 159, which is missing from the deposit DB.
Applying: Check deposit status is 'success', and exclude deposit 342 which is failed.
Applying: allow failed deposit as long as they have an swhid
Applying: fix issues noticed during review
Applying: deposit: add another exception
Applying: pypi: add comments on tests
Applying: add remaining deposit edge cases
Applying: don't crash when CRAN origins are missing.
Changes applied before test
commit 0465000f97d509da68efc09b5b49a0621323a251
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:55:04 2020 +0200

    don't crash when CRAN origins are missing.

commit e04ac636a4e7d61a00b3de1ab114c8bacdc14048
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:53:50 2020 +0200

    add remaining deposit edge cases

commit 49ac80266b2a1315538001d631b96458492cf7d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 4 12:32:10 2020 +0200

    pypi: add comments on tests

commit 8a131e0f9fc2cb18222b737c591531a691e6c369
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit b44ec8f8ea7dd2e900eff98a8e15e637fb725f23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit b117e8d915af19a4da4776ffe14bc8b64a3549a5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit c25c6703b508d2f6fa85b7ea78ea25f2c540f36c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit 123ffa0ba7a0bebe9f7696afddfcf9c95d02d5fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit ec0fabc5deddaa026f48ffa25632efc05b9174f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit f5b1651df4d8709c5a99c7b895aeb497f176f128
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit 6f9a2c3afd00991dec4c55cebaeaf608de4cc83a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit 7cfcc1ef692d819771a703e1be94c5fa7405ea3b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit 4eadb06d11dcb417611ee829ffb8dcee91ade058
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 3901ab4e13d4f60bd3f7f5abe8f6e1dd8ee7d515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit 768469e4032b8849aaca1d73431a068b62b64d74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit 4daaeb68be2198e9df18dd50a5c0433be4d2fa60
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit b01014f72adead96ae5b995d3d75cf0f81975ea3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit be93522405c52d01fc2fca4178d827b75c4d6dfd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 6a01f9eeb64edf537ff7933b29b9fbe92e153a7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit e2d3cbeb7ec17a385af3fa0e23956e563748d245
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 697f5191d83a3edf1142a55e9a5e478eb46ed206
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit aa6c2160cf5b094dc3b843869daa65eb9e84eccd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 525a1d6ea74e898e804cede5e2559831e1c8b710
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit 3284e0f98ebfc7b51c537165b83939d199604613
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 4e194200f5a53c8a7be9bdc3059f517291b2dac7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit 1ac1e6ff703b5cc86bee014de33ac826b2f9c6e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit a84b9329eef63d1d311a214c71248846aa81a67c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 26019b2623181bc6f95086113d0117b2834d5138
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 509100d229c2a061d2dc0b6f904920ee007d5ed1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit ee4b2d5c0bb13d083f9b7f5097880280330a619d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit bb27297d2f5a70c14c63baa57c882efbd427acff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 1d21164d3415302fc9fa57e9330835ead70f90df
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit 9e28d37d09785493966c718668346b2b84684d74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit 187c9fc38ae1939a013b9fb662ef8f54409faf3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 6e575ae78292bb21cf489afc0b55d1f9d88b8108
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 933ad657f65f94f3c2b1e9ff57201865f3ac7c88
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit e5e9e3d9a8b992087dccda4e7fec3d8834196ed5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit 12548ddea26268ba27e3f731708185e71cc84409
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 3331c589fa41e6317f577edfb7b727cd3d2a0f0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 6daec9bc8ad487365ab42aa637eeca972feb7764
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit a44913e2b16d84d05d5d7876b64a92cf0cefb9b5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 3e0b95211de03c8a865af53d02519acad3e33533
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit 96a909384508cf86865184f2a80a4f388525cb61
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit d2955832b205f810d5249f9e0a6230eded43b503
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit 3e63cdb7c488295a1fb1bdc80e0191a7700d139e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 43a1308f5e9f7ac6d8adad2a38c3dd21f699cf81
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 57232999607dccca26bb8be746fbd157f76ad63a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit 962741b6db57ca92042f946ae43847dcd29c6077
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit 578718293c0c113c329816792392e37485d036aa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit d26972c0066a83df1abff0a94a1aac7eb1ffc19c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 6c2b30f3ab1a89fb730c6c01ca917e9f94260644
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/908/ for more details.
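
The commit "When available, use metadata['when'] as discovery date instead of the revision date" depends on a distinction: the revision date records when the content was updated, not when the metadata was seen. Below is a minimal sketch of that fallback. It assumes `'when'` holds an ISO 8601 timestamp with a UTC offset, because the thread does not show its exact format. `choose_discovery_date` is a hypothetical helper, not the migration script's function.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def choose_discovery_date(metadata: Dict[str, Any], revision_date: datetime) -> datetime:
    """Prefer metadata['when'] (when the metadata was seen) over the
    revision date (when the content was last updated)."""
    when = metadata.get("when")
    if when:
        # Assumption: 'when' is an ISO 8601 string such as
        # "2020-08-25T14:42:17+00:00" (with an explicit UTC offset).
        return datetime.fromisoformat(when)
    # No 'when' key: fall back to the revision date.
    return revision_date


rev_date = datetime(2019, 1, 1, tzinfo=timezone.utc)
print(choose_discovery_date({"when": "2020-08-25T14:42:17+00:00"}, rev_date))
print(choose_discovery_date({}, rev_date))
```

Choosing a per-item date in this way also fits the deposit commit's reasoning. Metadata cannot be stored twice at the same date, so giving every item one shared date would collapse distinct versions.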

ardumont accepted this revision as: ardumont. (Sep 10 2020, 9:21 AM)
This revision is now accepted and ready to land. (Sep 10 2020, 9:21 AM)
olasd accepted this revision. (Sep 10 2020, 9:28 AM)

I haven't reviewed the deposit part, which is out of my depth, but I think the way the other loaders are converted looks sensible.

What's the next step here now?

vlorentz updated this revision to Diff 13746. (Sep 10 2020, 9:40 AM)

rebase and squash

Build is green

Patch application report for D3820 (id=13746)

Rebasing onto 93458a4665...

Current branch diff-target is up to date.
Changes applied before test
commit 5ec70a6bdae128cfb24bc3419d5a5095923b97bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/911/ for more details.
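
The squashed history includes "tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data." Below is a minimal sketch of that test pattern. `handle_row` here is a hypothetical stand-in that rewrites nested metadata in place. The real function's behavior is not shown in this thread. A shallow copy such as `dict(ROW)` would not be enough here, because the nested `metadata` dict would still be shared with the fixture.

```python
import copy
from typing import Any, Dict

# Hypothetical shared test fixture with nested, mutable data.
ROW: Dict[str, Any] = {
    "id": b"\x01" * 20,
    "metadata": {"original_artifact": [{"filename": "pkg-1.0.tar.gz"}]},
}


def handle_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler that mutates the nested metadata dict."""
    artifacts = row["metadata"].pop("original_artifact")
    row["metadata"]["original_artifacts"] = artifacts
    return row


def test_handle_row_keeps_fixture_intact() -> None:
    # The deep copy gives handle_row its own nested dicts, so the shared
    # fixture is unchanged for any later test that reuses it.
    result = handle_row(copy.deepcopy(ROW))
    assert "original_artifacts" in result["metadata"]
    assert "original_artifact" in ROW["metadata"]


test_handle_row_keeps_fixture_intact()
```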