Page MenuHomeSoftware Heritage

loader: add an hg-specific mapping for branching
ClosedPublic

Authored by Alphare on Jun 2 2021, 7:12 PM.

Details

Summary

As discussed in T3352, the branching mechanism of Mercurial is more
featureful than that of Git's. The Snapshot model was not designed
with multiple heads, closed heads, bookmarks, etc. in mind, but with
only branches being "pointers" to (mostly) revisions.
As a workaround for a possible re-design of the Snapshot model (though
nothing of the sort is planned for now), we define a mapping that better
represents Mercurial's branching system.

This allows for handling multiple heads per branch and closed branches,
whose revisions (if not already covered by another branch) would
previously have been lost to the ether. Additionally, bookmarks are now
saved to get a better representation of the projects that do use them.

Diff Detail

Repository
rDLDHG Mercurial loader
Branch
multiple-heads
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21772
Build 33856: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 33855: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Remove useless change mistakenly left in the commit

Build is green

Patch application report for D5816 (id=20761)

Rebasing onto 0f19c32fc0...

Current branch diff-target is up to date.
Changes applied before test
commit 6da6f9740d575d0ef1eec306c446968dd66fe565
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/246/ for more details.

Build is green

Patch application report for D5816 (id=20762)

Rebasing onto 0f19c32fc0...

Current branch diff-target is up to date.
Changes applied before test
commit 220ed15d2c4a9f366bfb0cc66bd164a44ea4f0e5
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/247/ for more details.

zack requested changes to this revision.Jun 2 2021, 8:06 PM
zack added a subscriber: zack.

(i'm marking this as on hold until we have reached a decision on T3352, just to avoid this gets deployed by mistake. But feel free to go ahead with the rest of the review or even override, if you think there is a better safeguard to avoid this gets deployed)

This revision now requires changes to proceed.Jun 2 2021, 8:06 PM

Rebase and adjust the naming scheme

Build is green

Patch application report for D5816 (id=20811)

Rebasing onto f9fa0dd39d...

Current branch diff-target is up to date.
Changes applied before test
commit ea0b8fcce072fbe697bb3fc65d389116db920238
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/249/ for more details.

Minor request, which can also be implemented in a subsequent commit, can we have the mapping documented somewhere? As a first approximation even a docstring would do, so that it will show up at docs.s.o. (Not sure if the files being modified here are the most relevant place for it though, it can also go at the root of the mercurial loader Python hierarchy, up to you !)

Add documentation and sort heads

Build is green

Patch application report for D5816 (id=20813)

Rebasing onto f9fa0dd39d...

Current branch diff-target is up to date.
Changes applied before test
commit b6693178242f648cbc15ee2598c55b62a24b78ea
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/250/ for more details.

I have a few cosmetic changes to propose, but the overarching change looks good, thank you!

swh/loader/mercurial/from_disk.py
112

I see that "tipmost head" is a mercurial concept readily referenced in its documentation. If you can describe it in a few words, it would help make the description more generally understandable.

113

extra space

120–127

"To reduce the redundancy between snapshot branches in the most common case, when a branch has a single open head, it will only be referenced as branch-tip/<branch-name>. The branch-heads/ hierarchy only appears when a branch has multiple open heads, which we consistently sort by increasing nodeid. The branch-closed-heads/ hierarchy is also sorted by increasing nodeid."

351–369

I wouldn't mind blacklisted_revs to be renamed to ignored_revs

403

Doesn't bytes formatting support %d?

swh/loader/mercurial/hgutil.py
40

same here, s/blacklist/ignored/g

41

This is probably worth using a dataclass, so we know which tuple elements mean what directly.

42–50

The docstring is missing a description of what the ignored list does.

olasd requested changes to this revision.Jun 4 2021, 1:21 PM
This revision now requires changes to proceed.Jun 4 2021, 1:21 PM

Incorporate suggestions by olasd

Build is green

Patch application report for D5816 (id=20815)

Rebasing onto f9fa0dd39d...

Current branch diff-target is up to date.
Changes applied before test
commit 9a06123c2804b8e75763a2ced46f1368f0875cf5
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/251/ for more details.

So, I've given this algorithm another read through, and I think there is some room for improvements still, but we're getting there!

swh/loader/mercurial/hgutil.py
63–76

I would suggest moving the assignation of the "branch tip" inside the heads loop, rather than only considering the "first" head.

I'm also a bit surprised that mercurial doesn't provide a consistent sorting of the heads. How would it pick on which one one would do the next commit?

81–82

Do we really need to generate a fake "default" branch here? (or worse, erase an existing one)

I'd suggest returning a "default rev pointer", in addition to the lists of branches, bookmarks, etc., which would either point to the @ bookmark, or to the tip of the default branch.

83–103

I think I would rather return a snapshot with no default (HEAD) branch than to pick one arbitrarily.

What would mercurial itself do if I were to clone a repo with neither a default branch nor an @ bookmark?

105–109

Shouldn't we remove all the members of branch_open_heads with a single value instead (this would avoid generating branch-heads/foo/0 for all branches when there's a single one with multiple open heads)?

113–116

If you apply my other change, these lists should already be sorted.

Alphare added inline comments.
swh/loader/mercurial/hgutil.py
63–76

This is probably simpler yep.

I'm also a bit surprised that mercurial doesn't provide a consistent sorting of the heads. How would it pick on which one one would do the next commit?

Mercurial's sorting is consistent per-repo. If you re-clone a repository, the order of branches or heads isn't guaranteed, since there are multiple valid topologies of any given repo and one of them might be more optimal for the current wire-protocol, or arbitrarily chosen.

81–82

I should have thought about this :)

83–103

I think I would rather return a snapshot with no default (HEAD) branch than to pick one arbitrarily.

If that's an option, I'm happy to oblige.

What would mercurial itself do if I were to clone a repo with neither a default branch nor an @ bookmark?

It takes the tip of the first branch it sees in the branchmap, looks like.

$ hg init no-default
$ cd no-default
$ hg branch stable
$ touch a
$ hg commit -Adm "Initial"
$ hg branches
stable
$ cd ..
$ hg clone no-default no-default-clone
updating to branch stable
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
105–109

Yep, probably better

swh/loader/mercurial/hgutil.py
63–76

How would it pick on which one one would do the next commit?

Sorry, I haven't answered that question, but I'm not sure I understand it, since it's going to apply it on top of whichever head you're on.

swh/loader/mercurial/hgutil.py
83–103

I found the documentation:

Mercurial will update the working directory to the first applicable
revision from this list:

a) null if -U or the source repository has no changesets
b) if -u . and the source repository is local, the first parent of
   the source repository's working directory
c) the changeset specified with -u (if a branch name, this means the
   latest head of that branch)
d) the changeset specified with -r
e) the tipmost head specified with -b
f) the tipmost head specified with the url#branch source syntax
g) the revision marked with the '@' bookmark, if present
h) the tipmost head of the default branch
i) tip
swh/loader/mercurial/hgutil.py
63–76

Ack. I'm still a bit surprised that the "tipmost head" of a branch depends on the order of operations rather than an actual property of the changeset, but if that's the way it is...

83–103

And, what you're saying is that the "tipmost head" for a given branch depends on the local order of operations on the repository?

(sorry for being slow, I'm just trying to make sure we use the most faithful representation)

swh/loader/mercurial/hgutil.py
63–76

Yeah, tip is an approximation of "what is most recent is probably the thing you want", but is kind of hellish if you think about it too much. Mercurial discourages you from having multiple heads on a branch, and we (Octobus with Heptapod) are even stricter than this.
It's the way it is!

83–103

That is correct.

swh/loader/mercurial/hgutil.py
83–103

Then I'm fine with you just implementing g), h) and i), and generating a snapshot with no HEAD if neither of these yields a reference to a changeset.

Incorporate latest suggestions

Build has FAILED

Patch application report for D5816 (id=20853)

Rebasing onto f9fa0dd39d...

Current branch diff-target is up to date.
Changes applied before test
commit e447e0b8f23a4bb7425206805ae956e0f65d713a
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/252/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/252/console

Build has FAILED

Patch application report for D5816 (id=20859)

Rebasing onto f9fa0dd39d...

Current branch diff-target is up to date.
Changes applied before test
commit e447e0b8f23a4bb7425206805ae956e0f65d713a
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/253/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/253/console

Hey!

swh/loader/mercurial/from_disk.py
420–426

Multiple questions here:

  • Is the @ bookmark guaranteed to be the tip of a branch?
  • shouldn't we check that default_rev is indeed the tip of the branch that we're considering?
swh/loader/mercurial/from_disk.py
420–426

Multiple questions here:

  • Is the @ bookmark guaranteed to be the tip of a branch?

Nope, you can put it anywhere.

  • shouldn't we check that default_rev is indeed the tip of the branch that we're considering?

Sure, if that differs, should we just not register HEAD?

In D5816#148875, @olasd wrote:

Hey!

Obviously this got submitted too fast :-)

I've just tried rebasing this on top of master and the mypy issues are gone.

There's one last pending question on the logic for the HEAD alias branch, which you've replied to already, so I'll go there :D

swh/loader/mercurial/from_disk.py
420–426

Instead of going through a rev (and losing which heuristic we've used to find the default rev) I think the other function should just return the name of the branch or bookmark that we've used to determine the default.

swh/loader/mercurial/from_disk.py
420–426

I'm not sure what you're suggesting. Should hgutil.branching_info return the name of the branch (or bookmark) on which the default revision is found if and only if said revision is a branch tip, allowing us to add an alias to it?

swh/loader/mercurial/from_disk.py
420–426

Yeah, that's what I would suggest (rather than applying a heuristic in hgutil.branching_info, then having to reverse engineer it when generating the snapshot)

Update default branch mechanism

This revision is now accepted and ready to land.Jun 15 2021, 11:24 AM

Thank you!!

swh/loader/mercurial/from_disk.py
109

I guess it's not required anymore !

Build is green

Patch application report for D5816 (id=21010)

Rebasing onto 2b3dbb0ab5...

Current branch diff-target is up to date.
Changes applied before test
commit 318c7ff7c95ef01b68340d0ca15476b09dca8e6b
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/254/ for more details.

HEAD is not required anymore

This revision was landed with ongoing or failed builds.Jun 15 2021, 11:30 AM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5816 (id=21013)

Rebasing onto 2b3dbb0ab5...

Current branch diff-target is up to date.
Changes applied before test
commit 2877eb3c4ec3af60d716b266624d41009e0bce26
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Jun 2 15:51:54 2021 +0200

    loader: add an hg-specific mapping for branching
    
    As discussed in T3352, the branching mechanism of Mercurial is more
    featureful than that of Git's. The Snapshot model was not designed
    with multiple heads, closed heads, bookmarks, etc. in mind, but with
    only branches being "pointers" to (mostly) revisions.
    As a workaround for a possible re-design of the Snapshot model (though
    nothing of the sort is planned for now), we define a mapping that better
    represents Mercurial's branching system.
    
    This allows for handling multiple heads per branch and closed branches,
    whose revisions (if not already covered by another branch) would
    previously have been lost to the ether. Additionally, bookmarks are now
    saved to get a better representation of the projects that do use them.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/255/ for more details.