Fix branch bookmark id format so ingestion can finish
ClosedPublic

Authored by ardumont on Sep 23 2021, 10:14 AM.

Details

Summary

For bookmarks, the format of the listed ids is not aligned with the rest of the code: each
id is the human-readable form stored as a bytes string instead of the raw binary bytes
string. As that is not what the caller expects, this failed the build.

This commit adds the extra mapping to fix the issue
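
To illustrate the kind of mismatch described above, here is a minimal sketch. It assumes the human-readable form is the usual 40-character hexadecimal rendering of a 20-byte node id. The bookmark name, the node id value, and the helper `to_binary_node_ids` are hypothetical and are not taken from the actual patch, which may perform the mapping differently.

```python
from binascii import hexlify, unhexlify
from typing import Dict

# Hypothetical example data: bookmark name -> node id written as 40
# hexadecimal characters inside a bytes string (the "human-readable" form).
hex_bookmarks: Dict[bytes, bytes] = {
    b"feature-x": b"0123456789abcdef0123456789abcdef01234567",
}


def to_binary_node_ids(bookmarks: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Map each hex-encoded node id to its raw binary form (hypothetical helper)."""
    return {name: unhexlify(node_id) for name, node_id in bookmarks.items()}


binary_bookmarks = to_binary_node_ids(hex_bookmarks)
node_id = binary_bookmarks[b"feature-x"]

# Both values are `bytes`, so a type check alone cannot tell them apart;
# only the length (40 versus 20) and the content differ.
print(len(hex_bookmarks[b"feature-x"]), len(node_id))

# Encoding the binary id again gives back the original hex form.
print(hexlify(node_id) == hex_bookmarks[b"feature-x"])
```

Because both representations share the `bytes` type, a lookup keyed on binary ids simply misses when handed the hex form, which is consistent with the missing-mapping symptom reported here.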

Related to T3584#71046 (for the pdb insight)

Fixes D6300
Depends on D6300

Test Plan

tox

Diff Detail

Repository
rDLDHG Mercurial loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont added inline comments.
swh/loader/mercurial/from_disk.py
457 ↗(On Diff #22993)

Hackish way of fixing it; I think the real fix should be in the upstream hgutil.branching_info implementation?
(That could also be pushed down into our get_revision_id_from_hg_nodeid.)

Build is green

Patch application report for D6329 (id=22993)

Could not rebase; Attempt merge onto 8e3b880ebc...

Updating 8e3b880..39619a2
Fast-forward
 swh/loader/mercurial/from_disk.py            |   8 ++++++--
 swh/loader/mercurial/tests/data/anomad-d.tgz | Bin 0 -> 2757941 bytes
 swh/loader/mercurial/tests/test_from_disk.py |  13 +++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 swh/loader/mercurial/tests/data/anomad-d.tgz
Changes applied before test
commit 39619a24301ec0867cb8048fccae839626b41aec
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Sep 23 10:12:31 2021 +0200

    Identify & fix missing mapping scenario about mismatched bookmark id
    
    Related to T3584

commit ef502bcdf3a2717a5d582db6c3658ae4788def7c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 17 17:19:20 2021 +0200

    Capture missing revision <-> hgnode-id scenario in a xfail test
    
    anomad-d is the `user`-`repository` name.
    
    It's a repository which presents an anomaly that is not yet understood. That anomaly
    makes the ingestion fail.
    
    It currently happens once in a while with the bitbucket ingestion. This commit is
    just preparatory work to analyze the problem, either when I have time or if someone
    else wants to.
    
    To analyze, comment out the xfail mark and let the test fail, then debug. That should
    help a fair bit.
    
    Related to T3584

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/310/ for more details.

swh/loader/mercurial/from_disk.py
457 ↗(On Diff #22993)

Yes. hgutil.branching_info is not the upstream hgutil; it's one of our own modules.
So I can actually do it properly, yeah \o/

ardumont retitled this revision from Identify & fix missing mapping scenario about mismatched bookmark id to Fix branch bookmark id format so ingestion can finish. Sep 23 2021, 10:42 AM
ardumont edited the summary of this revision.

Build is green

Patch application report for D6329 (id=22994)

Could not rebase; Attempt merge onto 8e3b880ebc...

Updating 8e3b880..a18db13
Fast-forward
 swh/loader/mercurial/hgutil.py               |  13 ++++++++++++-
 swh/loader/mercurial/tests/data/anomad-d.tgz | Bin 0 -> 2757941 bytes
 swh/loader/mercurial/tests/test_from_disk.py |  13 +++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 swh/loader/mercurial/tests/data/anomad-d.tgz
Changes applied before test
commit a18db136ab4fe7c55938eba2aa4531298ff97998
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Sep 23 10:12:31 2021 +0200

    Fix branch bookmark id format so ingestion can finish
    
    For bookmarks, the format of the listed ids is not aligned with the rest of the code:
    each id is the human-readable form stored as a bytes string instead of the raw binary
    bytes string. As that is not what the caller expects, this failed the build.
    
    This commit adds the extra mapping to fix the issue
    
    Related to T3584

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/311/ for more details.

Build is green

Patch application report for D6329 (id=22995)

Could not rebase; Attempt merge onto 8e3b880ebc...

Updating 8e3b880..395ea80
Fast-forward
 swh/loader/mercurial/hgutil.py               |   9 ++++++++-
 swh/loader/mercurial/tests/data/anomad-d.tgz | Bin 0 -> 2757941 bytes
 swh/loader/mercurial/tests/test_from_disk.py |  13 +++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 swh/loader/mercurial/tests/data/anomad-d.tgz
Changes applied before test
commit 395ea80518cd1f46f190e68790b5d52c4cea946f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Sep 23 10:12:31 2021 +0200

    Fix branch bookmark id format so ingestion can finish
    
    For bookmarks, the format of the listed ids is not aligned with the rest of the code:
    each id is the human-readable form stored as a bytes string instead of the raw binary
    bytes string. As that is not what the caller expects, this failed the build.
    
    This commit adds the extra mapping to fix the issue
    
    Related to T3584

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/312/ for more details.

This revision is now accepted and ready to land. Sep 23 2021, 10:51 AM

Thanks, let's deploy this so we can ingest properly now.
I notice it happens once in a while in the bitbucket origins.

Cheers,