git_bare: Add support for swh-graph when loading a snapshot
Closed, Public

Authored by vlorentz on Jul 16 2021, 3:26 PM.

Details

Summary

This should be a considerable performance improvement, as we
don't need to query swh-graph for every head (which includes
a lot of duplicates).
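
To illustrate why querying once per snapshot can beat querying once per head, here is a minimal sketch. It assumes a toy in-memory graph: `PARENTS`, `SNAPSHOT_HEADS`, `ancestors`, `per_head` and `per_snapshot` are hypothetical names made up for this example. They are not the swh-graph API and not the code in this diff.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy history: each node maps to its parents.
# This is an in-memory stand-in, NOT the swh-graph API.
PARENTS: Dict[str, List[str]] = {
    "rev3": ["rev2"],
    "rev2": ["rev1"],
    "rev1": [],
}

# Hypothetical snapshot: several branches, some sharing the same head.
SNAPSHOT_HEADS: List[str] = ["rev3", "rev3", "rev2", "rev3"]


def ancestors(start: str) -> Set[str]:
    """Return `start` and everything reachable from it (one "graph query")."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(PARENTS[node])
    return seen


def per_head(heads: List[str]) -> Tuple[Set[str], int]:
    """Issue one query per head, duplicates included."""
    nodes: Set[str] = set()
    queries = 0
    for head in heads:
        nodes |= ancestors(head)
        queries += 1
    return nodes, queries


def per_snapshot(heads: List[str]) -> Tuple[Set[str], int]:
    """Issue a single traversal seeded with all distinct heads."""
    seen: Set[str] = set()
    stack = sorted(set(heads))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(PARENTS[node])
    return seen, 1
```

Both functions find the same set of objects. The per-head variant simply repeats work for every duplicate or overlapping head.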

Depends on D6001.

Diff Detail

Repository
rDVAU Software Heritage Vault
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6002 (id=21667)

Could not rebase; Attempt merge onto 5951a5c63b...

Updating 5951a5c..2375849
Fast-forward
 swh/vault/cli.py                        |   9 +-
 swh/vault/cookers/__init__.py           |   1 +
 swh/vault/cookers/git_bare.py           | 128 ++++++++++-
 swh/vault/tests/test_cookers.py         | 378 +++++++++++++++++++++++++++-----
 swh/vault/tests/test_git_bare_cooker.py | 113 +++++++---
 5 files changed, 541 insertions(+), 88 deletions(-)
Changes applied before test
commit 2375849b888d096ed3a2d8696a6455ecc5338dda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 15:26:05 2021 +0200

    git_bare: Add support for swh-graph when loading a snapshot
    
    This should be a considerable performance improvement, as we
    don't need to query swh-graph for every head (which includes
    a lot of duplicates).

commit 7122ef5099c6a1e3be2c28b889122c9c1beb882a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 14:29:18 2021 +0200

    cli: Hide traceback when cookers fail
    
    There is nothing interesting about this traceback because it only
    shows frames in the objstorage and the CLI.
    The actual error is always printed before with its own traceback
    before this one is shown.

commit 73b2b13b9e06a8bdf5f6ea5d3b526d1e0c53fcf9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 14:25:35 2021 +0200

    git_bare: Add support for annotated tags pointing to commits

commit 4cb80723d8fbd094bd5b6439d71cfc2e2e091a86
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 15 19:46:54 2021 +0200

    git_bare: Add partial support for snapshots (no release or swh-graph support yet)

commit 70fc50f40b8bc13b022f96be0027e89175773fd7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 15 17:20:44 2021 +0200

    tests: Split RepoFixtures off TestRevisionCooker
    
    So the 'load' and 'check' parts of the tests can be reused by future
    snapshot tests.

commit a1f42284c683ec1d0ee169cda67f507529822473
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 15 16:58:55 2021 +0200

    tests: Make TestRevisionCooker more modular
    
    So its 'load' and 'check' parts can be reused for testing snapshots

See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/144/ for more details.

  • fix release testing
  • rename last_revision_in_graph to up_to_date_graph, because this boolean is now used for a bunch of objects instead of just rev2

Build is green

Patch application report for D6002 (id=21684)

Could not rebase; Attempt merge onto 70fc50f40b...

Updating 70fc50f..a505474
Fast-forward
 swh/vault/cli.py                        |   9 +-
 swh/vault/cookers/__init__.py           |   1 +
 swh/vault/cookers/git_bare.py           | 128 +++++++++++++--
 swh/vault/tests/test_cookers.py         | 269 ++++++++++++++++++++++++++++++--
 swh/vault/tests/test_git_bare_cooker.py | 124 +++++++++++----
 5 files changed, 475 insertions(+), 56 deletions(-)
Changes applied before test
commit a5054749dd102f9ec4f35d4f8cc8787482982ada
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 15:26:05 2021 +0200

    git_bare: Add support for swh-graph when loading a snapshot
    
    This should be a considerable performance improvement, as we
    don't need to query swh-graph for every head (which includes
    a lot of duplicates).

commit 7122ef5099c6a1e3be2c28b889122c9c1beb882a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 14:29:18 2021 +0200

    cli: Hide traceback when cookers fail
    
    There is nothing interesting about this traceback because it only
    shows frames in the objstorage and the CLI.
    The actual error is always printed before with its own traceback
    before this one is shown.

commit 73b2b13b9e06a8bdf5f6ea5d3b526d1e0c53fcf9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 16 14:25:35 2021 +0200

    git_bare: Add support for annotated tags pointing to commits

commit 4cb80723d8fbd094bd5b6439d71cfc2e2e091a86
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 15 19:46:54 2021 +0200

    git_bare: Add partial support for snapshots (no release or swh-graph support yet)

See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/145/ for more details.

douardda added a subscriber: douardda.

LGTM, but see my questions (not sure they really make sense, but who knows).

swh/vault/cookers/git_bare.py
326

Do we really want "just" an assert here? To me, that means "do not check this in production", since prod should/may run with -O.

330

Why a NotImplementedError here? Is there a possibility that this will make sense and be implemented some day?

This revision is now accepted and ready to land.
Jul 27 2021, 11:28 AM
swh/vault/cookers/git_bare.py
326

If we want to use -O, then there are many more asserts to replace all over the codebase.
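
For context on the -O concern, here is a minimal sketch of how an `assert` disappears under optimization while an explicit check does not. It uses the built-in `compile()` with its `optimize` argument to mimic `python -O` inside one process. The function names and the `"revision"` check are hypothetical and are not what line 326 of git_bare.py actually asserts.

```python
# Hypothetical check, compiled from source so that the optimization level
# can be chosen explicitly. optimize=1 corresponds to running with -O.
SOURCE = '''
def check_with_assert(obj_type):
    assert obj_type == "revision", obj_type
    return "ok"
'''


def load_check(optimize: int):
    """Compile SOURCE at the given optimization level and return the function."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check_with_assert"]


def check_explicit(obj_type: str) -> str:
    """Explicit check: still enforced regardless of -O."""
    if obj_type != "revision":
        raise ValueError(f"unexpected object type: {obj_type}")
    return "ok"


def raises(func, arg, exc_type) -> bool:
    """Return True if func(arg) raises exc_type, False if it returns normally."""
    try:
        func(arg)
    except exc_type:
        return True
    return False
```

With `optimize=0` the assert fires on a bad value. With `optimize=1` it is stripped and the function returns normally, whereas `check_explicit` raises in both cases.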

330

Both Git's and SWH's data models allow it, so we have to do something about it. Raising NotImplementedError just defers the decision, because it's really an edge case and I don't want to think about it for now. (It would also probably require changing the test framework, because the Git CLI probably does not allow it.)