Differential D3885
migrate_extrinsic_metadata: retry in case of database errors.
Authored by vlorentz on Sep 8 2020, 11:57 AM.
Tags: None
Subscribers: None
Details
Database errors are likely to happen since this script takes a long time to run.
Depends on D3820.
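As an illustration of the idea in this summary, here is a minimal sketch of retrying an operation when a database error interrupts a long-running script. It is not the code from this diff: `call_with_retry`, `TransientDatabaseError`, and all parameter names are hypothetical, and the exponential backoff is an assumption chosen for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's connection-level error."""


def call_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransientDatabaseError,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff on the given errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In a real script, `retry_on` would name the exception types raised by the database driver in use, and `operation` would wrap one unit of work (for example, one batch of rows) so that a retry does not repeat work that already succeeded.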
Event Timeline
Build is green
Patch application report for D3885 (id=13715)
Could not rebase; Attempt merge onto 374e01cf36...
Merge made by the 'recursive' strategy.
 mypy.ini | 3 +
 requirements.txt | 1 +
 swh/storage/migrate_extrinsic_metadata.py | 924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py | 221 ++++
 .../migrate_extrinsic_metadata/test_debian.py | 273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py | 108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py | 124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py | 376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py | 356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py
Changes applied before test
commit 6890aff4ff9d949daf03deb4180b2abdab50076c
Merge: 374e01cf 85425d63
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 8 10:03:32 2020 +0000
Merge branch 'diff-target' into HEAD
commit 85425d634f8472009b147358725d89607829f901
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:56:43 2020 +0200
migrate_extrinsic_metadata: retry in case of database errors.
They are likely to happen since this script takes a long time to run.
commit 99792936ddf9e31796018ce364825636a5958857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:55:04 2020 +0200
don't crash when CRAN origins are missing.
commit 4194c092e74990f1db565c5ad8198cc83ca916ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:53:50 2020 +0200
add remaining deposit edge cases
commit ead1fb7845e10ac4209dbba5b36e577d000701bc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Sep 4 12:32:10 2020 +0200
pypi: add comments on tests
commit b9b35cb898fedd97ccae91503949ff72b6e22535
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 23:00:47 2020 +0200
deposit: add another exception
commit b060a9129966a45d7469d6a6eda2d7c506e9dd37
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 23:00:29 2020 +0200
fix issues noticed during review
commit 139f38d72087b1cd82066b0be1fd801e216b7585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 12:42:44 2020 +0200
allow failed deposit as long as they have an swhid
commit 2a71f220716962d63763a2b9ddb2dca0b772f164
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 12:06:21 2020 +0200
Check deposit status is 'success', and exclude deposit 342 which is failed.
commit 34278dadf8b41f0c522b689f14319bfa451b5ca9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Sep 2 09:21:25 2020 +0200
Add exception for deposit id 159, which is missing from the deposit DB.
commit 9c0bd38be0616bcdb2c4797f8fcc2dac141d3e47
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 15:13:38 2020 +0200
drop date from original-artifacts metadata.
commit a9553cd4237da8b79353d2cfe9e7c3bde5e22cb9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 15:09:44 2020 +0200
add comments
commit b4ab42415fd30cd385a3d2a26846890810b30feb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 12:57:14 2020 +0200
deposit: shorten test data
commit f5930abf25e21ada2d07deb752ee6a43ea97c900
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 12:53:44 2020 +0200
deposit tests: add docstrings
commit e3c8eb9d1f0920eb9a279989441def0ac1e24941
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 11:34:28 2020 +0200
add test for deposit format 1.
commit 381cf1b0212bc1176b93e5f63c5ab49d7b11dad9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:42:09 2020 +0200
add an origin cache, to spare a request to storage.origin.get for each origin.
commit ff3a5cf1fa8f4e29717b22858028cf2713e943b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:39:55 2020 +0200
add origin deposit exceptions.
commit a22803a2618e8b31fe88a04635aacff28c60a9ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 1 10:39:39 2020 +0200
remove prints
commit d9ef8a3eb320b7b9e372d6eaed622b6101df5f4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 23:28:07 2020 +0200
deposit: add another exception
commit 7ce577da5330c5f29a89e27afc66c4c5a71ad101
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 23:27:27 2020 +0200
deposit: add another exception
commit 85de185d4332fc8633ead106fe11d9b37b48c066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 27 10:27:06 2020 +0200
npm: unescape package names.
commit 9cb0975a89425403e3fa0e4b1169e423d562f252
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:56:22 2020 +0200
deposit: use the date of the deposit request for each metadata item, instead of the same date for all.
it's more accurate, and allows storing all versions of the metadata, as we
can't store metadata twice at the same date.
commit 65d2529d8c7b9d2e09616f253f22277318ab9585
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:40:53 2020 +0200
deposit: use external_id from the HTTP header instead of the metadata.
The origin URL is created from the one in the Slug header, and the one
in the metadata file may be wrong.
commit 5f2896029f4dc0665e15655be6a8c60feb983324
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 22:39:53 2020 +0200
deposit: Change authority type from FORGE to DEPOSIT_CLIENT.
commit 581fbbb0fe0ca8f686b1d97b47a1a29510412c6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 20:27:20 2020 +0200
deposit: fix tests
commit dfd5e8f4c63f08d3141b7daf95a5d97fcfda9a42
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 20:04:29 2020 +0200
deposit: add test / fix support for revisions without the provider_url key (format 5), ie. when we have to compute the origin ourselves.
commit 6435b359d470acab8f1fac52b829203ec90fd69f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:51:27 2020 +0200
deduplicate origin checking.
commit fe5bc78e88d3799204a96a989ba305d8944cd887
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:50:28 2020 +0200
deposit: add support for revisions with no metadata.
commit 6cf824488945e451e5491cd6749624b98bb6db96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 18:48:39 2020 +0200
deposit: add tests.
commit 5cc3a2c73928b2c59e17b3bfd6571818a212d292
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Aug 26 09:51:06 2020 +0200
cran: improve package name detection.
commit 2c1cc684b9fad667863d981b9e4f3e0c82903b7c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:51:15 2020 +0200
Fix crash when original_artifact is missing an url.
commit 7e0dbd00134a5160b23f763fb9084265af806807
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:50:41 2020 +0200
nixguix: add test.
commit 50e5f682a2a56dd31bbf5011168ebfa0c4dfa78a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:43:09 2020 +0200
gnu: add tests.
commit 405d26ae54f0f38efa7b77bbd611357c583055e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:42:17 2020 +0200
When available, use metadata['when'] as discovery date instead of the revision date.
revision date is when the date was updated, not the date it was seen.
commit e08108a99c453c7a614cfec9f6b599753c22a619
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 16:08:43 2020 +0200
pypi: get rid of the package name heuristic, it's unreliable.
commit eb45e8c2a7c51cfd931f2f43bef4639b7d673f6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 15:12:36 2020 +0200
pypi: add support for another format
commit 3ccdae318af5673444a8b7fbf2596bb490d0d9a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:54:30 2020 +0200
pypi: improve origin detection from filename.
commit 50bb6661e994c975d479836d47596e36d1c6e65e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:49:15 2020 +0200
cran: detect package name when 'provider' does not have the right format.
commit 3e247653ec6bc3be436d7507b84a5db20f552615
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:39:07 2020 +0200
explicitly error if the loader type could not be detected.
commit a64b35d4b73cbc2a5ec6d6e438e5f560c95374af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 14:38:38 2020 +0200
pypi: detect origin from format 2, and fix format of original_artifacts.
commit 5c7fb7e8592f24231e0a8dbe048d31df0e897fe1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 13:51:20 2020 +0200
debian: rewrite original_artifacts to the current format.
commit 4d877425b32cd3a17db35adb8e381dddf0853d08
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 13:03:38 2020 +0200
tests: deepcopy the row before passing it, to prevent handle_row from mutating the test data.
commit 74c7c6cefaa4c1903e37c4f32a7fc2f9022b844f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:36:19 2020 +0200
cran: add support for revisions with date
commit 0e8a56239c158105bec875d5d8db0ed23e7ab78e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:19:43 2020 +0200
npm: add tests
commit d54fba8821f5b7a000c420b83cd278e12402e714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:19:31 2020 +0200
npm format 2: fix format of original_artifact.
commit 073ebeff4e47559d961b09783204a10e6f1e3f3d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:18:56 2020 +0200
npm format 2: build origins urls.
commit d2d141e85e87da49692e305beae867448704d68d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 25 12:17:47 2020 +0200
Rename original-artifact-json to original-artifacts-json.
As in the core loader.
commit 2a2f914d697fe93ae9b3cd9058a2b1f8b6b13714
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 17:05:09 2020 +0200
pypi: add test
commit 1ab3a8d8c05adcdc5b1ff1791c66eb6df83aa09b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 17:04:53 2020 +0200
Start adding the origin context
commit de7cd7e97aa951f786465e9add178a104a841422
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 16:39:52 2020 +0200
cran: add test
commit bb43b24cba109ae8b2262d68c6ad56b3dde8c857
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 16:39:34 2020 +0200
cran: handle date
commit 7c8ce23fd029d2f408ae27fb20ea4e19fccfdc77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 24 15:57:44 2020 +0200
add tests for revisions generated by the debian loader.
commit 688f664b4c5a74dad49241d05605e7278be77b41
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 20 15:24:28 2020 +0200
[WIP] Add a Python script to migrate extrinsic metadata from revision metadata.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/909/ for more details.
Build is green
Patch application report for D3885 (id=13747)
Could not rebase; Attempt merge onto 93458a4665...
Updating 93458a46..d24a1e77
Fast-forward
 mypy.ini | 3 +
 requirements.txt | 1 +
 swh/storage/migrate_extrinsic_metadata.py | 924 ++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_cran.py | 221 ++++
 .../migrate_extrinsic_metadata/test_debian.py | 273 +++++
 .../migrate_extrinsic_metadata/test_deposit.py | 1167 ++++++++++++++++++++
 .../tests/migrate_extrinsic_metadata/test_gnu.py | 108 ++
 .../migrate_extrinsic_metadata/test_nixguix.py | 124 +++
 .../tests/migrate_extrinsic_metadata/test_npm.py | 376 +++++++
 .../tests/migrate_extrinsic_metadata/test_pypi.py | 356 ++++++
 10 files changed, 3553 insertions(+)
 create mode 100644 swh/storage/migrate_extrinsic_metadata.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_cran.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_debian.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_deposit.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_gnu.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_nixguix.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_npm.py
 create mode 100644 swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py
Changes applied before test
commit d24a1e7732acfa87645c2d813eeb9bbf04919f0d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 8 11:56:43 2020 +0200
migrate_extrinsic_metadata: retry in case of database errors.
They are likely to happen since this script takes a long time to run.
commit 5ec70a6bdae128cfb24bc3419d5a5095923b97bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 20 15:24:28 2020 +0200
Add a Python script to migrate extrinsic metadata from revision metadata.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/912/ for more details.
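One commit in the first report (381cf1b0) adds an origin cache "to spare a request to storage.origin.get for each origin". A rough sketch of that kind of cache follows. It is not the script's actual code: `OriginCache` and `lookup` are hypothetical names, and the lookup function is passed in so that the sketch makes no assumption about the signature of `storage.origin.get`.

```python
from typing import Callable, Dict, Optional


class OriginCache:
    """Memoize the result of an origin lookup, keyed by origin URL."""

    def __init__(self, lookup: Callable[[str], Optional[dict]]) -> None:
        self._lookup = lookup  # hypothetical: performs the real storage request
        self._known: Dict[str, Optional[dict]] = {}

    def get(self, url: str) -> Optional[dict]:
        # Only the first request for a URL reaches the backend.
        # Misses (None) are cached too, so missing origins are not re-queried.
        if url not in self._known:
            self._known[url] = self._lookup(url)
        return self._known[url]
```

The dictionary here is unbounded, which is a simplification. A script that visits many distinct origins may want a size limit instead, for example with `functools.lru_cache`.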