migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
ClosedPublic

Authored by vlorentz on Nov 6 2020, 12:59 PM.

Details

Summary

This happens for about 50 revisions in the archive.

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17004
Build 26241: Phabricator diff pipeline on Jenkins
Build 26240: arc lint + arc unit

Event Timeline

Build was aborted

Patch application report for D4438 (id=15705)

Could not rebase; Attempt merge onto 84984a600c...

Updating 84984a60..e206d848
Fast-forward
 requirements-swh-journal.txt                       |   2 +-
 swh/storage/backfill.py                            |  10 +-
 swh/storage/migrate_extrinsic_metadata.py          |  51 +-
 swh/storage/retry.py                               |  80 +--
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |   2 +
 swh/storage/tests/test_retry.py                    | 640 ---------------------
 swh/storage/writer.py                              |  21 +-
 7 files changed, 91 insertions(+), 715 deletions(-)
Changes applied before test
commit e206d848812a4fe395e0702fcd2b2a1b966981be
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

commit c83c9020f0e6d6ef634972c5e85b2345e00118cc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:57:02 2020 +0100

    migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing
    
    as this happens quite often and isn't an error.

commit 0c8000145778d1774fede5241b0d6dbf28d9aecb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:55:58 2020 +0100

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

commit fc46b7f277603d66c4ae80294cdd4e3d32125949
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:54:14 2020 +0100

    migrate_extrinsic_metadata: use the retry proxy
    
    Because it makes a lot of get requests and doesn't handle failures,
    it crashed often.

commit 5794e0403f2ad69c9ccd98733bceee908d1bf21e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:52:50 2020 +0100

    Make the retry proxy work on all functions.
    
    The metadata migration script kept crashing otherwise.

commit 9d1237b4eae1d64ca784e3ebb7f0f390e551b3a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 5 15:28:49 2020 +0100

    Set the value_sanitizer argument of get_journal_writer.
    
    The next version of swh-journal will remove the default value.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1056/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1056/console
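
The commit "Make the retry proxy work on all functions" listed above describes retrying every storage call rather than a fixed set of them. The following is a minimal sketch of that idea, not the actual swh/storage/retry.py code: the `RetryingProxy` class, the `FlakyStorage` stand-in, and the `attempts` and `delay` parameters are all hypothetical names chosen for illustration.

```python
import functools
import time


class RetryingProxy:
    """Hypothetical proxy that retries every method call of the wrapped object."""

    def __init__(self, wrapped, attempts=3, delay=0.0):
        self._wrapped = wrapped
        self._attempts = attempts
        self._delay = delay

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself,
        # so every attribute of the wrapped object goes through here.
        attr = getattr(self._wrapped, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def with_retry(*args, **kwargs):
            for attempt in range(1, self._attempts + 1):
                try:
                    return attr(*args, **kwargs)
                except Exception:
                    if attempt == self._attempts:
                        raise  # out of attempts: propagate the last error
                    time.sleep(self._delay)

        return with_retry


class FlakyStorage:
    """Stand-in backend whose first call fails (illustration only)."""

    def __init__(self):
        self.calls = 0

    def revision_get(self, ids):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient failure")
        return [{"id": i} for i in ids]


backend = FlakyStorage()
storage = RetryingProxy(backend)
result = storage.revision_get([b"\x01"])
```

Because the wrapping happens in `__getattr__`, no per-method list has to be maintained; the trade-off in this sketch is that every exception type is retried, which a real implementation would likely narrow.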

Build is green

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1057/ for more details.

Build is green

Patch application report for D4438 (id=15757)

Could not rebase; Attempt merge onto 24cdc85c15...

Updating 24cdc85c..387e2870
Fast-forward
 requirements-swh-journal.txt                       |   2 +-
 swh/storage/backfill.py                            |  10 +-
 swh/storage/migrate_extrinsic_metadata.py          |  51 +-
 swh/storage/retry.py                               |  80 +--
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |   2 +
 swh/storage/tests/test_retry.py                    | 640 ---------------------
 swh/storage/writer.py                              |  21 +-
 7 files changed, 91 insertions(+), 715 deletions(-)
Changes applied before test
commit 387e28706b2d43fcacfeeb2eb7e3cdfce766c3c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

commit 3eba73df9f38beb5e35e35c5d56d32ae87372184
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:57:02 2020 +0100

    migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing
    
    as this happens quite often and isn't an error.

commit f3652a97118c1bbfea6328ec1d9f913941b288f9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:55:58 2020 +0100

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

commit c0a3d966faf45f4623c7e1bc03f5fa72024d22cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:54:14 2020 +0100

    migrate_extrinsic_metadata: use the retry proxy
    
    Because it makes a lot of get requests and doesn't handle failures,
    it crashed often.

commit aded45b9b27d93df8195fde5f462250e2e84798b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:52:50 2020 +0100

    Make the retry proxy work on all functions.
    
    The metadata migration script kept crashing otherwise.

commit 2e7d489eb245d4c3b684360d6f70f061804f77a9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 5 15:28:49 2020 +0100

    Set the value_sanitizer argument of get_journal_writer.
    
    The next version of swh-journal will remove the default value.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1064/ for more details.

As said on IRC, I would prefer erroneous hashes to be logged somewhere rather than caught by an assertion.
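
One way to read this suggestion is sketched below: instead of asserting that every revision is referenced by a snapshot, the code records the offending hash and moves on. The function `find_origin_for_revision`, the `snapshots_by_revision` mapping, and the example origin URL are hypothetical stand-ins, not names or data from migrate_extrinsic_metadata.py.

```python
import logging

logger = logging.getLogger("migrate_extrinsic_metadata_sketch")


def find_origin_for_revision(revision_id, snapshots_by_revision):
    """Return the origin of a revision, or None if no snapshot references it.

    `snapshots_by_revision` is a hypothetical mapping from revision id
    (bytes) to an origin URL; the real script queries the storage instead.
    """
    origin = snapshots_by_revision.get(revision_id)
    if origin is None:
        # Log the hash so it can be inspected later, instead of `assert origin`.
        logger.warning(
            "revision %s is not referenced by any snapshot", revision_id.hex()
        )
        return None
    return origin


# Made-up data for illustration only.
known = {bytes.fromhex("aa" * 20): "deb://Debian/packages/example"}
found = find_origin_for_revision(bytes.fromhex("aa" * 20), known)
missing = find_origin_for_revision(bytes.fromhex("bb" * 20), known)
```

With this shape, the roughly 50 unreferenced revisions mentioned in the summary would each produce one warning line rather than stopping the migration.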

Build is green

Patch application report for D4438 (id=15793)

Rebasing onto 3eba73df9f...

Current branch diff-target is up to date.
Changes applied before test
commit c63b88103042384f55050c8bd29cb661fbfc9100
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1065/ for more details.

This revision is now accepted and ready to land. Nov 13 2020, 10:23 AM

Build is green

Patch application report for D4438 (id=15869)

Rebasing onto f5011362fb...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-1067-D4438.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1067/ for more details.