Page MenuHomeSoftware Heritage

Don't accidentally do a full run on incremental
ClosedPublic

Authored by Alphare on Dec 1 2021, 6:18 PM.

Details

Summary

Filtering against the storage then asking for new revisions only makes
sense if there are revisions not in storage and not ancestors of heads.
If there are none, the current behavior lists all revisions, which is a
whole lot of wasted work.

Diff Detail

Repository
rDLDHG Mercurial loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6719 (id=24409)

Rebasing onto e0ce7f8e66...

Current branch diff-target is up to date.
Changes applied before test
commit 8ac42a9a22b72b8c2999656c31e9641e2c43aac2
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Dec 1 18:11:16 2021 +0100

    Fix context for the first revision in a run
    
    See inline comments: in essence, currently the first revision
    in a run (incremental or not) will get the null revision as a
    parent and no cached directory, which creates the wrong diff for that
    revision (i.e. lots of files become added etc).
    
    We fix this problem by doing a full listing of the first revision
    in a run.

commit 6f1ef81f688fa4f3d3f7ee90337d5569a7474f20
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Dec 1 18:05:28 2021 +0100

    Don't accidentally do a full run on incremental
    
    Filtering against the storage then asking for new revisions only makes
    sense if there are revisions not in storage and not ancestors of heads.
    If there are none, the current behavior lists all revisions, which is a
    whole lot of wasted work.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/321/ for more details.

So, arc diff does an awful squash: this is two commits and the second description is the more important of the two...

So, arc diff does an awful squash: this is two commits and the second description is
the more important of the two...

When you'll push, you'll still have the 2 commits. The diff will get unreadable as it
does a mess after pushing multi-commits diff (keeping only the first commit of the
diffs) but that just leave stuff unreadable when someone wants to browse back the
history through the forge (i do it sometimes and i rant then ;).

But, if you prefer, you can do 2 diffs insteads with:

# first commit to update the current diff you have
$ arc diff HEAD~2 --head HEAD~ --update D6719
# the 2nd commit as dedicated diff
$ arc diff HEAD~ --create  # or --update D<new-id> when you need to update stuff following review

You just have more as author to do. But that's simpler for the reviewer.

swh/loader/mercurial/loader.py
400

well, list everything and do noops at the end or something ;)

This revision is now accepted and ready to land.Dec 2 2021, 10:35 AM

Build is green

Patch application report for D6719 (id=24414)

Rebasing onto e0ce7f8e66...

Current branch diff-target is up to date.
Changes applied before test
commit 6f1ef81f688fa4f3d3f7ee90337d5569a7474f20
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Dec 1 18:05:28 2021 +0100

    Don't accidentally do a full run on incremental
    
    Filtering against the storage then asking for new revisions only makes
    sense if there are revisions not in storage and not ancestors of heads.
    If there are none, the current behavior lists all revisions, which is a
    whole lot of wasted work.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/322/ for more details.