Differential D3885
migrate_extrinsic_metadata: retry in case of database errors.
Authored by vlorentz on Sep 8 2020, 11:57 AM.
Tags: None
Subscribers: None
Details
Database errors are likely to happen since this script takes a long time to run.
Depends on D3820.
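The diff itself is not reproduced on this page, so the following is only a minimal sketch of the general idea (retrying a database call with exponential backoff), not the code from D3885. The names `TransientDatabaseError` and `retry` are hypothetical and stand in for whatever exception type and helper the real script uses.

```python
import functools
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's transient error (an assumption)."""


def retry(exceptions, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when it raises one of `exceptions`.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    `sleep` is injectable so the behaviour can be tested without waiting.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        # Out of attempts: let the caller see the error.
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

A long-running script could wrap each database round trip, for instance `@retry(TransientDatabaseError)` on a function that fetches one batch of rows, so that a dropped connection costs a short pause instead of restarting the whole run. Only operations that are safe to repeat should be wrapped this way.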
Diff Detail
Event Timeline

Build is green

Patch application report for D3885 (id=13715)

Could not rebase; Attempt merge onto 374e01cf36...

Merge made by the 'recursive' strategy.
 mypy.ini | 3 +
 requirements.txt | 1 +
 swh/storage/migrate_extrinsic_metadata.py | 924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py | 221 ++++
 .../migrate_extrinsic_metadata/test_debian.py | 273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py | 108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py | 124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py | 376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py | 356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py

Changes applied before test

commit 6890aff4ff9d949daf03deb4180b2abdab50076c
Merge: 374e01cf 85425d63
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 8 10:03:32 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 85425d634f8472009b147358725d89607829f901
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:56:43 2020 +0200

    migrate_extrinsic_metadata: retry in case of database errors.

    They are likely to happen since this script takes a long time to run.

commit 99792936ddf9e31796018ce364825636a5958857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:55:04 2020 +0200

    don't crash when CRAN origins are missing.

commit 4194c092e74990f1db565c5ad8198cc83ca916ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:53:50 2020 +0200

    add remaining deposit edge cases

commit ead1fb7845e10ac4209dbba5b36e577d000701bc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Sep 4 12:32:10 2020 +0200

    pypi: add comments on tests

commit b9b35cb898fedd97ccae91503949ff72b6e22535
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 23:00:47 2020 +0200

    deposit: add another exception

commit b060a9129966a45d7469d6a6eda2d7c506e9dd37
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 23:00:29 2020 +0200

    fix issues noticed during review

commit 139f38d72087b1cd82066b0be1fd801e216b7585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 12:42:44 2020 +0200

    allow failed deposit as long as they have an swhid

commit 2a71f220716962d63763a2b9ddb2dca0b772f164
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 12:06:21 2020 +0200

    Check deposit status is 'success', and exclude deposit 342 which is failed.

commit 34278dadf8b41f0c522b689f14319bfa451b5ca9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 09:21:25 2020 +0200

    Add exception for deposit id 159, which is missing from the deposit DB.

commit 9c0bd38be0616bcdb2c4797f8fcc2dac141d3e47
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 15:13:38 2020 +0200

    drop date from original-artifacts metadata.

commit a9553cd4237da8b79353d2cfe9e7c3bde5e22cb9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 15:09:44 2020 +0200

    add comments

commit b4ab42415fd30cd385a3d2a26846890810b30feb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 12:57:14 2020 +0200

    deposit: shorten test data

commit f5930abf25e21ada2d07deb752ee6a43ea97c900
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 12:53:44 2020 +0200

    deposit tests: add docstrings

commit e3c8eb9d1f0920eb9a279989441def0ac1e24941
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 11:34:28 2020 +0200

    add test for deposit format 1.

commit 381cf1b0212bc1176b93e5f63c5ab49d7b11dad9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:42:09 2020 +0200

    add an origin cache, to spare a request to storage.origin.get for each origin.

commit ff3a5cf1fa8f4e29717b22858028cf2713e943b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:39:55 2020 +0200

    add origin deposit exceptions.

commit a22803a2618e8b31fe88a04635aacff28c60a9ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:39:39 2020 +0200

    remove prints

commit d9ef8a3eb320b7b9e372d6eaed622b6101df5f4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 23:28:07 2020 +0200

    deposit: add another exception

commit 7ce577da5330c5f29a89e27afc66c4c5a71ad101
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 23:27:27 2020 +0200

    deposit: add another exception

commit 85de185d4332fc8633ead106fe11d9b37b48c066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 10:27:06 2020 +0200

    npm: unescape package names.

commit 9cb0975a89425403e3fa0e4b1169e423d562f252
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:56:22 2020 +0200

    deposit: use the date of the deposit request for each metadata item, instead of the same date for all.

    it's more accurate, and allows storing all versions of the metadata, as we can't store metadata twice at the same date.

commit 65d2529d8c7b9d2e09616f253f22277318ab9585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:40:53 2020 +0200

    deposit: use external_id from the HTTP header instead of the metadata.

    The origin URL is created from the one in the Slug header, and the one in the metadata file may be wrong.

commit 5f2896029f4dc0665e15655be6a8c60feb983324
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:39:53 2020 +0200

    deposit: Change authority type from FORGE to DEPOSIT_CLIENT.

commit 581fbbb0fe0ca8f686b1d97b47a1a29510412c6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 20:27:20 2020 +0200

    deposit: fix tests

commit dfd5e8f4c63f08d3141b7daf95a5d97fcfda9a42
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 20:04:29 2020 +0200

    deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.

commit 6435b359d470acab8f1fac52b829203ec90fd69f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:51:27 2020 +0200

    deduplicate origin checking.

commit fe5bc78e88d3799204a96a989ba305d8944cd887
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:50:28 2020 +0200

    deposit: add support for revisions with no metadata.

commit 6cf824488945e451e5491cd6749624b98bb6db96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:48:39 2020 +0200

    deposit: add tests.

commit 5cc3a2c73928b2c59e17b3bfd6571818a212d292
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 09:51:06 2020 +0200

    cran: improve package name detection.

commit 2c1cc684b9fad667863d981b9e4f3e0c82903b7c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:51:15 2020 +0200

    Fix crash when original_artifact is missing an url.

commit 7e0dbd00134a5160b23f763fb9084265af806807
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:50:41 2020 +0200

    nixguix: add test.

commit 50e5f682a2a56dd31bbf5011168ebfa0c4dfa78a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:43:09 2020 +0200

    gnu: add tests.

commit 405d26ae54f0f38efa7b77bbd611357c583055e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:42:17 2020 +0200

    When available, use metadata['when'] as discovery date instead of the revision date.

    revision date is when the date was updated, not the date it was seen.

commit e08108a99c453c7a614cfec9f6b599753c22a619
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:08:43 2020 +0200

    pypi: get rid of the package name heuristic, it's unreliable.

commit eb45e8c2a7c51cfd931f2f43bef4639b7d673f6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 15:12:36 2020 +0200

    pypi: add support for another format

commit 3ccdae318af5673444a8b7fbf2596bb490d0d9a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:54:30 2020 +0200

    pypi: improve origin detection from filename.

commit 50bb6661e994c975d479836d47596e36d1c6e65e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:49:15 2020 +0200

    cran: detect package name when 'provider' does not have the right format.

commit 3e247653ec6bc3be436d7507b84a5db20f552615
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:39:07 2020 +0200

    explicitly error if the loader type could not be detected.

commit a64b35d4b73cbc2a5ec6d6e438e5f560c95374af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:38:38 2020 +0200

    pypi: detect origin from format 2, and fix format of original_artifacts.

commit 5c7fb7e8592f24231e0a8dbe048d31df0e897fe1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 13:51:20 2020 +0200

    debian: rewrite original_artifacts to the current format.

commit 4d877425b32cd3a17db35adb8e381dddf0853d08
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 13:03:38 2020 +0200

    tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.

commit 74c7c6cefaa4c1903e37c4f32a7fc2f9022b844f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:36:19 2020 +0200

    cran: add support for revisions with date

commit 0e8a56239c158105bec875d5d8db0ed23e7ab78e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:19:43 2020 +0200

    npm: add tests

commit d54fba8821f5b7a000c420b83cd278e12402e714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:19:31 2020 +0200

    npm format 2: fix format of original_artifact.

commit 073ebeff4e47559d961b09783204a10e6f1e3f3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:18:56 2020 +0200

    npm format 2: build origins urls.

commit d2d141e85e87da49692e305beae867448704d68d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:17:47 2020 +0200

    Rename original-artifact-json to original-artifacts-json.

    As in the core loader.

commit 2a2f914d697fe93ae9b3cd9058a2b1f8b6b13714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 17:05:09 2020 +0200

    pypi: add test

commit 1ab3a8d8c05adcdc5b1ff1791c66eb6df83aa09b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 17:04:53 2020 +0200

    Start adding the origin context

commit de7cd7e97aa951f786465e9add178a104a841422
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 16:39:52 2020 +0200

    cran: add test

commit bb43b24cba109ae8b2262d68c6ad56b3dde8c857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 16:39:34 2020 +0200

    cran: handle date

commit 7c8ce23fd029d2f408ae27fb20ea4e19fccfdc77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 15:57:44 2020 +0200

    add tests for revisions generated by the debian loader.

commit 688f664b4c5a74dad49241d05605e7278be77b41
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 20 15:24:28 2020 +0200

    [WIP] Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/909/ for more details.

Build is green

Patch application report for D3885 (id=13747)

Could not rebase; Attempt merge onto 93458a4665...

Updating 93458a46..d24a1e77
Fast-forward
 mypy.ini | 3 +
 requirements.txt | 1 +
 swh/storage/migrate_extrinsic_metadata.py | 924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py | 221 ++++
 .../migrate_extrinsic_metadata/test_debian.py | 273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py | 108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py | 124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py | 376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py | 356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py

Changes applied before test

commit d24a1e7732acfa87645c2d813eeb9bbf04919f0d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:56:43 2020 +0200

    migrate_extrinsic_metadata: retry in case of database errors.

    They are likely to happen since this script takes a long time to run.

commit 5ec70a6bdae128cfb24bc3419d5a5095923b97bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 20 15:24:28 2020 +0200

    Add a Python script to migrate extrinsic metadata from revision metadata.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/912/ for more details.