
Handle more cases of corruption
Closed, Public

Authored by Alphare on Apr 27 2021, 5:31 PM.

Details

Summary

Some corrupted repos have missing files or broken logical links in the
underlying Mercurial data structure, which means that operations can
sometimes fail for a given revision. This does not mean we should throw
away the rest of the repository. (Tested on repos with various levels and
flavors of corruption in the Boatbucket archive.)
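
The idea in the summary can be illustrated with a minimal sketch. This is not the code from this diff: `CorruptedRevision`, `load_revisions`, and `demo_load` are hypothetical names, and the real loader may detect and report corruption differently. The sketch only shows the general pattern of catching a per-revision failure, recording it, and carrying on with the remaining revisions.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


class CorruptedRevision(Exception):
    """Hypothetical error: a single revision could not be read."""


def load_revisions(
    revisions: Iterable[str],
    load_one: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Load each revision, skipping the ones that turn out to be corrupted.

    Returns a pair (loaded, skipped) so the caller can decide how to
    report a partially loaded repository.
    """
    loaded: List[str] = []
    skipped: List[str] = []
    for rev in revisions:
        try:
            load_one(rev)
        except CorruptedRevision as exc:
            # One bad revision should not discard the rest of the repository.
            logger.warning("Skipping corrupted revision %s: %s", rev, exc)
            skipped.append(rev)
        else:
            loaded.append(rev)
    return loaded, skipped


def demo_load(rev: str) -> None:
    """Stand-in for the real per-revision loading step (assumption)."""
    if rev == "bad":
        raise CorruptedRevision("missing file data")


loaded, skipped = load_revisions(["a", "bad", "c"], demo_load)
```

Returning the skipped revisions alongside the loaded ones keeps the failure visible to the caller instead of silently hiding it.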

Diff Detail

Repository
rDLDHG Mercurial loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5627 (id=20067)

Could not rebase; Attempt merge onto f03f274065...

Updating f03f274..5413622
Fast-forward
 swh/loader/mercurial/from_disk.py | 27 +++++++++++++++++++++++----
 swh/loader/mercurial/hgutil.py    |  3 ++-
 swh/loader/mercurial/utils.py     |  3 ++-
 3 files changed, 27 insertions(+), 6 deletions(-)
Changes applied before test
commit 5413622cd6cf1a584d1b9d300b3dd1f5a6e94ce6
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:33:20 2021 +0200

    Handle more cases of corruption
    
    Some corrupted repos have missing files or broken logical links in the
    underlying Mercurial datastructure, which means that say sometimes fail
    for a given revision. This does not mean we should throw away the rest
    of the repository. (Tested on repos of various levels and flavors of
    corruption in the Boatbucket archive)

commit c6c3b386ef246860e9292ab0331aaf32cf72d61b
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:28:50 2021 +0200

    Ignore the repository's config
    
    `HGRCPATH` only tells Mercurial to ignore the user's config files, but
    some repositories have a `.hg/hgrc` file (only in the case that you copy
    the files instead of cloning, if present) that is usually used for server-side
    configuration. We want to ignore this, since it might affect loading
    and ask for hooks that are not there or are otherwise annoying/dangerous,
    for example.

commit 2ec0206482f46491086791b6b8718d5094cb4d77
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:26:09 2021 +0200

    Also use minimal env in the new Mercurial loader
    
    The old loader (bundle2 loader) already received this treatment which
    ensures Mercurial doesn't pick up on any user customization, but I
    apparently forgot to apply the same changes to the new one.

commit 1d8b26c042011f3271e451b11b670f37b44e8685
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/207/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/207/console

Harbormaster returned this revision to the author for changes because remote builds failed. Apr 27 2021, 5:37 PM
Harbormaster failed remote builds in B21060: Diff 20067!

Build is green

Patch application report for D5627 (id=20080)

Could not rebase; Attempt merge onto f03f274065...

Updating f03f274..d88ab53
Fast-forward
 swh/loader/mercurial/from_disk.py         | 27 +++++++++++++++++++++++----
 swh/loader/mercurial/hgutil.py            | 18 +++++++++++++-----
 swh/loader/mercurial/tests/test_hgutil.py | 11 +++++++----
 swh/loader/mercurial/utils.py             |  3 ++-
 4 files changed, 45 insertions(+), 14 deletions(-)
Changes applied before test
commit d88ab535e1f998136c1b27bf5aca6c585d2440d6
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:33:20 2021 +0200

    Handle more cases of corruption
    
    Some corrupted repos have missing files or broken logical links in the
    underlying Mercurial datastructure, which means that say sometimes fail
    for a given revision. This does not mean we should throw away the rest
    of the repository. (Tested on repos of various levels and flavors of
    corruption in the Boatbucket archive)

commit 250edbb11b85a62498dde8def39e84367cd3cebb
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:28:50 2021 +0200

    Ignore the repository's config
    
    `HGRCPATH` only tells Mercurial to ignore the user's config files, but
    some repositories have a `.hg/hgrc` file (only in the case that you copy
    the files instead of cloning, if present) that is usually used for server-side
    configuration. We want to ignore this, since it might affect loading
    and ask for hooks that are not there or are otherwise annoying/dangerous,
    for example.

commit 457fb88bf36d6c4eedee5e9423f9747e3ea4abf4
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:26:09 2021 +0200

    Also use minimal env in the new Mercurial loader
    
    The old loader (bundle2 loader) already received this treatment which
    ensures Mercurial doesn't pick up on any user customization, but I
    apparently forgot to apply the same changes to the new one.

commit a89783c52f2e3c44e08018e0c5fa99f54471f994
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/212/ for more details.

This revision is now accepted and ready to land. Apr 28 2021, 12:20 PM
This revision was landed with ongoing or failed builds. Apr 30 2021, 11:04 AM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5627 (id=20174)

Rebasing onto 888471483a...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-222-D5627.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/222/ for more details.