HgLoaderFromDisk: Read and write ExtIDs to find revisions already loaded
Closed, Public

Authored by vlorentz on Mar 29 2021, 2:11 PM.

Details

Summary

For now, ExtIDs are used in addition to revision metadata.
But in the near future, we want to migrate nodeids from revision metadata
to the ExtID storage, and drop all revision metadata.
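As a rough illustration of the transitional behaviour described above, the following sketch looks up already-loaded revisions by Mercurial nodeid, consulting an ExtID index and a revision-metadata index. The plain dictionaries, the function names `find_loaded_revisions` and `record_extids`, and the "ExtID first, metadata as fallback" order are all assumptions made for this example; they are not the loader's or the storage's actual API.

```python
from typing import Dict, Iterable

# Hypothetical in-memory stand-ins for the two lookup sources.
# Keys are Mercurial nodeids, values are revision ids (both bytes).
ExtIDIndex = Dict[bytes, bytes]
MetadataIndex = Dict[bytes, bytes]


def find_loaded_revisions(
    nodeids: Iterable[bytes],
    extid_index: ExtIDIndex,
    metadata_index: MetadataIndex,
) -> Dict[bytes, bytes]:
    """Map each already-loaded nodeid to its revision id.

    Assumption: ExtIDs are consulted first and revision metadata is
    only used when no ExtID is known for the nodeid.
    """
    found: Dict[bytes, bytes] = {}
    for nodeid in nodeids:
        revision_id = extid_index.get(nodeid)
        if revision_id is None:
            revision_id = metadata_index.get(nodeid)
        if revision_id is not None:
            found[nodeid] = revision_id
    return found


def record_extids(new_mappings: Dict[bytes, bytes], extid_index: ExtIDIndex) -> None:
    """Write nodeid -> revision id mappings for newly loaded revisions."""
    extid_index.update(new_mappings)


extids = {b"node-1": b"rev-1"}    # known through ExtIDs
metadata = {b"node-2": b"rev-2"}  # known only through revision metadata
already_loaded = find_loaded_revisions(
    [b"node-1", b"node-2", b"node-3"], extids, metadata
)
```

Here `node-3` is absent from both indexes, so it would still have to be loaded; once loaded, `record_extids` would store its mapping so that later visits find it without relying on revision metadata.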

Depends on D5369 and D5370

Resolves T3140 and T3142 for this loader

Test Plan

Tests fail because they depend on D5363

Event Timeline

Build has FAILED

Patch application report for D5371 (id=19244)

Could not rebase; Attempt merge onto a62318d725...

Updating a62318d..c975733
Fast-forward
 swh/loader/mercurial/from_disk.py            |  89 +++++++++++++++---
 swh/loader/mercurial/tests/test_from_disk.py | 130 ++++++++++++++++++++++++++-
 2 files changed, 208 insertions(+), 11 deletions(-)
Changes applied before test
commit c975733d3c3f1974552c5f60300647a2b16237bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 14:08:05 2021 +0200

    HgLoaderFromDisk: Read and write ExtIDs to find revisions already loaded
    
    For now, ExtIDs are used in addition to revision metadata.
    But in the near future, we want to migrate nodeids from revision metadata
    to the ExtID storage, and drop all revision metadata.

commit 8618e381a56fa87842f09a7f64a27f36e097fdf6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 13:10:47 2021 +0200

    HgLoaderFromDisk: Fix type annotation

commit 69f4b023b3ecf21e23838a46935069245dcfd118
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 12:34:36 2021 +0200

    HgLoaderFromDisk: Don't query revision_get with release ids
    
    This is a minor performance optimization, removing items from the
    call to revision_get when we know their result will be None.
    
    Motivation: A future commit will refactor this function, and dealing only
    with revision ids makes it simpler.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/189/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/189/console
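
The optimization described in commit 69f4b02 above (not passing release ids to `revision_get`) can be sketched roughly as follows. The branch representation as `(target_type, target_id)` tuples and the helper name `revision_targets` are assumptions for illustration only; the actual loader code is not shown on this page.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical branch representation: (target_type, target_id).
Branch = Tuple[str, bytes]


def revision_targets(branches: Dict[bytes, Optional[Branch]]) -> List[bytes]:
    """Return only the target ids of branches that point at revisions.

    Release targets (and dangling branches) are skipped, because a
    revision lookup on them is known in advance to return nothing.
    """
    return [
        branch[1]
        for branch in branches.values()
        if branch is not None and branch[0] == "revision"
    ]


branches = {
    b"branch-tip/default": ("revision", b"rev-1"),
    b"tags/v1.0": ("release", b"rel-1"),
    b"branch-tip/stable": ("revision", b"rev-2"),
}
# Only these ids would be passed on to the revision lookup.
ids_to_query = revision_targets(branches)
```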

Harbormaster returned this revision to the author for changes because remote builds failed. (Mar 29 2021, 2:13 PM)
Harbormaster failed remote builds in B20315: Diff 19244!

Build has FAILED

Patch application report for D5371 (id=19246)

Could not rebase; Attempt merge onto a62318d725...

Updating a62318d..60f7d65
Fast-forward
 swh/loader/mercurial/from_disk.py            |  89 ++++++++++++++++--
 swh/loader/mercurial/tests/test_from_disk.py | 132 ++++++++++++++++++++++++++-
 2 files changed, 210 insertions(+), 11 deletions(-)
Changes applied before test
commit 60f7d6564532ad23aaceda62250eeeee459176ca
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 14:08:05 2021 +0200

    HgLoaderFromDisk: Read and write ExtIDs to find revisions already loaded
    
    For now, ExtIDs are used in addition to revision metadata.
    But in the near future, we want to migrate nodeids from revision metadata
    to the ExtID storage, and drop all revision metadata.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/190/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/190/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Mar 29 2021, 2:26 PM)
Harbormaster failed remote builds in B20317: Diff 19246!

Build is green

Patch application report for D5371 (id=19246)

Could not rebase; Attempt merge onto a62318d725...

Updating a62318d..60f7d65
Fast-forward
 swh/loader/mercurial/from_disk.py            |  89 ++++++++++++++++--
 swh/loader/mercurial/tests/test_from_disk.py | 132 ++++++++++++++++++++++++++-
 2 files changed, 210 insertions(+), 11 deletions(-)
Changes applied before test
commit 60f7d6564532ad23aaceda62250eeeee459176ca
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 14:08:05 2021 +0200

    HgLoaderFromDisk: Read and write ExtIDs to find revisions already loaded
    
    For now, ExtIDs are used in addition to revision metadata.
    But in the near future, we want to migrate nodeids from revision metadata
    to the ExtID storage, and drop all revision metadata.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/192/ for more details.

Copy revisions without metadata (MD) and fix the comparison in the loader

Build is green

Patch application report for D5371 (id=19249)

Could not rebase; Attempt merge onto a62318d725...

Updating a62318d..27e5d55
Fast-forward
 swh/loader/mercurial/from_disk.py            |  89 +++++++++++++++--
 swh/loader/mercurial/tests/test_from_disk.py | 142 ++++++++++++++++++++++++++-
 2 files changed, 220 insertions(+), 11 deletions(-)
Changes applied before test
commit 27e5d55cf3942862c145c763190d09f8a4d605a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Mar 29 14:08:05 2021 +0200

    HgLoaderFromDisk: Read and write ExtIDs to find revisions already loaded
    
    For now, ExtIDs are used in addition to revision metadata.
    But in the near future, we want to migrate nodeids from revision metadata
    to the ExtID storage, and drop all revision metadata.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/193/ for more details.

One small nit, but the logic (and test) looks good. Thanks!

swh/loader/mercurial/from_disk.py, line 241

While I don't expect a significant performance difference, why create an intermediate list instead of a generator, given that it is only used once, immediately afterwards?

This revision is now accepted and ready to land. (Mar 29 2021, 6:16 PM)

swh/loader/mercurial/from_disk.py, line 241

It's used twice. (And I prefer to avoid generators because they make code harder to debug.)
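
To illustrate why the "used twice" point matters: a generator can only be consumed once, so a second pass over it silently yields nothing, whereas a list can be iterated repeatedly. The snippet below is a generic Python example with made-up values; it does not reproduce the code at line 241.

```python
items = [1, 2, 3]

# A list is materialized once and can be iterated as often as needed.
as_list = [x * 2 for x in items]
# A generator produces its values lazily and is exhausted after one pass.
as_generator = (x * 2 for x in items)

first_pass_list = sum(as_list)        # 12
second_pass_list = sum(as_list)       # 12 again

first_pass_gen = sum(as_generator)    # 12
second_pass_gen = sum(as_generator)   # 0: the generator is already exhausted
```

The exhausted generator raises no error, which is one reason such a mistake can be hard to spot while debugging.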

This revision was landed with ongoing or failed builds. (Mar 30 2021, 9:46 AM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5371 (id=19263)

Rebasing onto f5e96a7e3f...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-196-D5371.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/196/ for more details.