migrate_extrinsic_metadata: retry in case of database errors.
Closed, Public

Authored by vlorentz on Tue, Sep 8, 11:57 AM.

Details

Summary

Database errors are likely to happen since this script takes a long time to run.

Depends on D3820.
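
For readers unfamiliar with the pattern, here is a minimal sketch of retrying a unit of work when a database error occurs. It is not the code from this diff: `TransientDatabaseError`, `retry_on_db_error`, `handle_batch`, and the attempt count and backoff are all hypothetical names and choices made for illustration. A real script would catch whatever exception its database driver raises.

```python
import functools
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver-specific connection error."""


def retry_on_db_error(max_attempts=3, delay=0.0):
    """Retry the wrapped callable when TransientDatabaseError is raised."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientDatabaseError:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the error propagate
                    time.sleep(delay * attempt)  # linear backoff before retrying

        return wrapper

    return decorator


calls = []  # records each attempt, for illustration only


@retry_on_db_error(max_attempts=3)
def handle_batch(batch):
    """Hypothetical unit of work: fails twice, then succeeds."""
    calls.append(batch)
    if len(calls) < 3:
        raise TransientDatabaseError("connection lost")
    return len(batch)


result = handle_batch(["row1", "row2"])
```

Retrying only a narrow exception type keeps genuine bugs (bad data, programming errors) visible instead of masking them behind repeated attempts. The wrapped work should also be safe to run more than once.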

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision. Tue, Sep 8, 11:57 AM

Build is green

Patch application report for D3885 (id=13715)

Could not rebase; Attempt merge onto 374e01cf36...

Merge made by the 'recursive' strategy.
 mypy.ini                                           |    3 +
 requirements.txt                                   |    1 +
 swh/storage/migrate_extrinsic_metadata.py          |  924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py  |  221 ++++
 .../migrate_extrinsic_metadata/test_debian.py      |  273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py     | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py   |  108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py     |  124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py   |  376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |  356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py
Changes applied before test
commit 6890aff4ff9d949daf03deb4180b2abdab50076c
Merge: 374e01cf 85425d63
Author: Jenkins user <jenkins@localhost>
Date:   Tue Sep 8 10:03:32 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 85425d634f8472009b147358725d89607829f901
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:56:43 2020 +0200

    migrate_extrinsic_metadata: retry in case of database errors.
    
    They are likely to happen since this script takes a long time to run.

commit 99792936ddf9e31796018ce364825636a5958857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:55:04 2020 +0200

    don't crash when CRAN origins are missing.

commit 4194c092e74990f1db565c5ad8198cc83ca916ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:53:50 2020 +0200

    add remaining deposit edge cases

commit ead1fb7845e10ac4209dbba5b36e577d000701bc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 4 12:32:10 2020 +0200

    pypi: add comments on tests

commit b9b35cb898fedd97ccae91503949ff72b6e22535
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit b060a9129966a45d7469d6a6eda2d7c506e9dd37
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit 139f38d72087b1cd82066b0be1fd801e216b7585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit 2a71f220716962d63763a2b9ddb2dca0b772f164
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit 34278dadf8b41f0c522b689f14319bfa451b5ca9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit 9c0bd38be0616bcdb2c4797f8fcc2dac141d3e47
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit a9553cd4237da8b79353d2cfe9e7c3bde5e22cb9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 15:09:44 2020 +0200

    add comments

commit b4ab42415fd30cd385a3d2a26846890810b30feb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit f5930abf25e21ada2d07deb752ee6a43ea97c900
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit e3c8eb9d1f0920eb9a279989441def0ac1e24941
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 381cf1b0212bc1176b93e5f63c5ab49d7b11dad9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit ff3a5cf1fa8f4e29717b22858028cf2713e943b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit a22803a2618e8b31fe88a04635aacff28c60a9ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit d9ef8a3eb320b7b9e372d6eaed622b6101df5f4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 7ce577da5330c5f29a89e27afc66c4c5a71ad101
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 85de185d4332fc8633ead106fe11d9b37b48c066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 9cb0975a89425403e3fa0e4b1169e423d562f252
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
    
    it's more accurate, and allows storing all versions of the metadata, as we
    can't store metadata twice at the same date.

commit 65d2529d8c7b9d2e09616f253f22277318ab9585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.
    
    The origin URL is created from the one in the Slug header, and the one
    in the metadata file may be wrong.

commit 5f2896029f4dc0665e15655be6a8c60feb983324
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 581fbbb0fe0ca8f686b1d97b47a1a29510412c6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit dfd5e8f4c63f08d3141b7daf95a5d97fcfda9a42
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 6435b359d470acab8f1fac52b829203ec90fd69f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit fe5bc78e88d3799204a96a989ba305d8944cd887
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 6cf824488945e451e5491cd6749624b98bb6db96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 5cc3a2c73928b2c59e17b3bfd6571818a212d292
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 2c1cc684b9fad667863d981b9e4f3e0c82903b7c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 7e0dbd00134a5160b23f763fb9084265af806807
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 50e5f682a2a56dd31bbf5011168ebfa0c4dfa78a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 405d26ae54f0f38efa7b77bbd611357c583055e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.
    
    revision date is when the date was updated, not the date it was seen.

commit e08108a99c453c7a614cfec9f6b599753c22a619
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit eb45e8c2a7c51cfd931f2f43bef4639b7d673f6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 3ccdae318af5673444a8b7fbf2596bb490d0d9a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 50bb6661e994c975d479836d47596e36d1c6e65e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 3e247653ec6bc3be436d7507b84a5db20f552615
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit a64b35d4b73cbc2a5ec6d6e438e5f560c95374af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 5c7fb7e8592f24231e0a8dbe048d31df0e897fe1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 4d877425b32cd3a17db35adb8e381dddf0853d08
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 74c7c6cefaa4c1903e37c4f32a7fc2f9022b844f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 0e8a56239c158105bec875d5d8db0ed23e7ab78e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit d54fba8821f5b7a000c420b83cd278e12402e714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 073ebeff4e47559d961b09783204a10e6f1e3f3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit d2d141e85e87da49692e305beae867448704d68d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.
    
    As in the core loader.

commit 2a2f914d697fe93ae9b3cd9058a2b1f8b6b13714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 1ab3a8d8c05adcdc5b1ff1791c66eb6df83aa09b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit de7cd7e97aa951f786465e9add178a104a841422
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit bb43b24cba109ae8b2262d68c6ad56b3dde8c857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 7c8ce23fd029d2f408ae27fb20ea4e19fccfdc77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 688f664b4c5a74dad49241d05605e7278be77b41
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/909/ for more details.

ardumont accepted this revision. Tue, Sep 8, 5:16 PM
This revision is now accepted and ready to land. Tue, Sep 8, 5:16 PM

Build is green

Patch application report for D3885 (id=13747)

Could not rebase; Attempt merge onto 93458a4665...

Updating 93458a46..d24a1e77
Fast-forward
 mypy.ini                                           |    3 +
 requirements.txt                                   |    1 +
 swh/storage/migrate_extrinsic_metadata.py          |  924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py  |  221 ++++
 .../migrate_extrinsic_metadata/test_debian.py      |  273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py     | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py   |  108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py     |  124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py   |  376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |  356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py
Changes applied before test
commit d24a1e7732acfa87645c2d813eeb9bbf04919f0d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 8 11:56:43 2020 +0200

    migrate_extrinsic_metadata: retry in case of database errors.
    
    They are likely to happen since this script takes a long time to run.

commit 5ec70a6bdae128cfb24bc3419d5a5095923b97bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 20 15:24:28 2020 +0200

    Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/912/ for more details.