
migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
Closed, Public

Authored by vlorentz on Nov 6 2020, 12:59 PM.

Details

Summary

This happens for about 50 revisions in the archive, so the migration script should skip them instead of crashing.
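
A minimal sketch of the behaviour described in the summary, not the actual code of `migrate_extrinsic_metadata.py`: when no snapshot references a deb revision, the origin cannot be determined, so the revision is set aside rather than aborting the whole run. The names `find_origin_for_revision` and `migrate_deb_revisions`, and the dictionary used as a lookup, are hypothetical stand-ins for the real storage queries.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_origin_for_revision(
    revision_id: str, snapshot_index: Dict[str, str]
) -> Optional[str]:
    """Return the origin URL whose snapshot references the revision, or None.

    `snapshot_index` is a hypothetical mapping from revision id to origin URL;
    the real script would query the archive instead.
    """
    return snapshot_index.get(revision_id)


def migrate_deb_revisions(
    revision_ids: Iterable[str], snapshot_index: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split revisions into (migrated, skipped) instead of crashing on orphans."""
    migrated: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for revision_id in revision_ids:
        origin = find_origin_for_revision(revision_id, snapshot_index)
        if origin is None:
            # Not referenced by any snapshot: record it and keep going.
            skipped.append(revision_id)
            continue
        migrated.append((revision_id, origin))
    return migrated, skipped
```

Returning the skipped identifiers alongside the migrated ones keeps the roughly 50 affected revisions visible to whoever runs the migration.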

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build was aborted

Patch application report for D4438 (id=15705)

Could not rebase; Attempt merge onto 84984a600c...

Updating 84984a60..e206d848
Fast-forward
 requirements-swh-journal.txt                       |   2 +-
 swh/storage/backfill.py                            |  10 +-
 swh/storage/migrate_extrinsic_metadata.py          |  51 +-
 swh/storage/retry.py                               |  80 +--
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |   2 +
 swh/storage/tests/test_retry.py                    | 640 ---------------------
 swh/storage/writer.py                              |  21 +-
 7 files changed, 91 insertions(+), 715 deletions(-)
Changes applied before test
commit e206d848812a4fe395e0702fcd2b2a1b966981be
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

commit c83c9020f0e6d6ef634972c5e85b2345e00118cc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:57:02 2020 +0100

    migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing
    
    as this happens quite often and isn't an error.

commit 0c8000145778d1774fede5241b0d6dbf28d9aecb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:55:58 2020 +0100

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

commit fc46b7f277603d66c4ae80294cdd4e3d32125949
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:54:14 2020 +0100

    migrate_extrinsic_metadata: use the retry proxy
    
    Because it makes a lot of get requests and doesn't handle failures,
    it crashed often.

commit 5794e0403f2ad69c9ccd98733bceee908d1bf21e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:52:50 2020 +0100

    Make the retry proxy work on all functions.
    
    The metadata migration script kept crashing otherwise.

commit 9d1237b4eae1d64ca784e3ebb7f0f390e551b3a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 5 15:28:49 2020 +0100

    Set the value_sanitizer argument of get_journal_writer.
    
    The next version of swh-journal will remove the default value.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1056/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1056/console
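
The commit list above includes "Make the retry proxy work on all functions". One way such a proxy can cover every method without listing each one is to intercept attribute access, as in the sketch below. This is an illustration under assumptions, not the contents of `swh/storage/retry.py`: the class name `RetryingProxy`, the `attempts` and `delay` parameters, and the choice to retry on any `Exception` are all hypothetical.

```python
import functools
import time
from typing import Any


class RetryingProxy:
    """Wrap every callable attribute of `target` in a simple retry loop."""

    def __init__(self, target: Any, attempts: int = 3, delay: float = 0.0) -> None:
        self._target = target
        self._attempts = attempts
        self._delay = delay

    def __getattr__(self, name: str) -> Any:
        # Only invoked for attributes that are not defined on the proxy itself,
        # so every method of the wrapped object passes through here.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, self._attempts + 1):
                try:
                    return attr(*args, **kwargs)
                except Exception:
                    # Re-raise once the last allowed attempt has failed.
                    if attempt == self._attempts:
                        raise
                    time.sleep(self._delay)

        return wrapper
```

A production proxy would normally retry only on transient errors and back off between attempts; the blanket `except Exception` here keeps the sketch short.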

Build is green

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1057/ for more details.

Build is green

Patch application report for D4438 (id=15757)

Could not rebase; Attempt merge onto 24cdc85c15...

Updating 24cdc85c..387e2870
Fast-forward
 requirements-swh-journal.txt                       |   2 +-
 swh/storage/backfill.py                            |  10 +-
 swh/storage/migrate_extrinsic_metadata.py          |  51 +-
 swh/storage/retry.py                               |  80 +--
 .../tests/migrate_extrinsic_metadata/test_pypi.py  |   2 +
 swh/storage/tests/test_retry.py                    | 640 ---------------------
 swh/storage/writer.py                              |  21 +-
 7 files changed, 91 insertions(+), 715 deletions(-)
Changes applied before test
commit 387e28706b2d43fcacfeeb2eb7e3cdfce766c3c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

commit 3eba73df9f38beb5e35e35c5d56d32ae87372184
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:57:02 2020 +0100

    migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing
    
    as this happens quite often and isn't an error.

commit f3652a97118c1bbfea6328ec1d9f913941b288f9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:55:58 2020 +0100

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

commit c0a3d966faf45f4623c7e1bc03f5fa72024d22cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:54:14 2020 +0100

    migrate_extrinsic_metadata: use the retry proxy
    
    Because it makes a lot of get requests and doesn't handle failures,
    it crashed often.

commit aded45b9b27d93df8195fde5f462250e2e84798b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:52:50 2020 +0100

    Make the retry proxy work on all functions.
    
    The metadata migration script kept crashing otherwise.

commit 2e7d489eb245d4c3b684360d6f70f061804f77a9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 5 15:28:49 2020 +0100

    Set the value_sanitizer argument of get_journal_writer.
    
    The next version of swh-journal will remove the default value.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1064/ for more details.

As said on IRC, I would prefer erroneous hashes to be logged somewhere rather than caught by an assertion.
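
A sketch of what this suggestion could look like in practice. The comment does not say which hashes are meant, so the example assumes a comparison between an expected and a computed hash; the function name `check_hash` and the log message are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def check_hash(revision_id: str, expected: str, actual: str) -> bool:
    """Return True when the hashes match; log the mismatch otherwise.

    Stands in for a pattern such as `assert expected == actual`, which
    would abort the whole run on the first bad hash.
    """
    if expected == actual:
        return True
    logger.error(
        "hash mismatch for revision %s: expected %s, got %s",
        revision_id,
        expected,
        actual,
    )
    return False
```

The caller can then skip the affected revision and continue, and the log keeps a record of every erroneous hash for later inspection.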

Build is green

Patch application report for D4438 (id=15793)

Rebasing onto 3eba73df9f...

Current branch diff-target is up to date.
Changes applied before test
commit c63b88103042384f55050c8bd29cb661fbfc9100
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 6 12:59:15 2020 +0100

    migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot
    
    As this happens for about 50 revisions in the archive.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1065/ for more details.

This revision is now accepted and ready to land. (Nov 13 2020, 10:23 AM)

Build is green

Patch application report for D4438 (id=15869)

Rebasing onto f5011362fb...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-1067-D4438.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1067/ for more details.