
git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
Closed, Public

Authored by vlorentz on May 11 2021, 9:57 AM.

Details

Summary

This will allow batching large groups of objects, instead of being limited
to those given as argument from a parent.

Depends on D5708.

(note: I forgot to open this diff earlier, it's actually a parent of D5730)
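
For illustration only, the idea in the summary can be sketched as follows. This is not the code from this diff: the toy `GRAPH`, the `fetch_children_batch` helper, and the `descend` function are hypothetical names. With recursion, each call only knows the children of one parent; with an explicit stack, pending nodes queued by different parents can be fetched together in one batch.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical toy graph (node -> children), standing in for the
# revision/directory/content graph. Not the real swh data model.
GRAPH: Dict[str, List[str]] = {
    "rev1": ["dir1", "rev0"],
    "rev0": ["dir1"],
    "dir1": ["cnt1", "cnt2"],
    "cnt1": [],
    "cnt2": [],
}

# Records every batch requested, so the batching is observable.
CALLS: List[List[str]] = []


def fetch_children_batch(nodes: List[str]) -> Dict[str, List[str]]:
    """Hypothetical backend call: return the children of many nodes at once."""
    CALLS.append(list(nodes))
    return {node: GRAPH[node] for node in nodes}


def descend(
    roots: Iterable[str],
    fetch_batch: Callable[[List[str]], Dict[str, List[str]]],
    batch_size: int = 1000,
) -> List[str]:
    """Visit every node reachable from roots exactly once, in batches."""
    stack = list(dict.fromkeys(roots))  # explicit stack, roots deduplicated
    seen = set(stack)
    visited: List[str] = []
    while stack:
        # Pop up to batch_size pending nodes, whichever parent queued them.
        batch = [stack.pop() for _ in range(min(batch_size, len(stack)))]
        visited.extend(batch)
        for children in fetch_batch(batch).values():
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    return visited


if __name__ == "__main__":
    print(descend(["rev1"], fetch_children_batch, batch_size=2))
    print(CALLS)
```

A side effect of this shape is that traversal depth no longer depends on the interpreter's recursion limit; the `seen` set is what grows with the size of the graph.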

Diff Detail

Repository
rDVAU Software Heritage Vault
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5733 (id=20467)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..0c0ff01
Fast-forward
 requirements-swh.txt                    |   1 +
 swh/vault/cli.py                        |   6 +-
 swh/vault/cookers/__init__.py           |   6 +
 swh/vault/cookers/base.py               |  15 +-
 swh/vault/cookers/git_bare.py           | 281 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py          |   2 +-
 swh/vault/tests/test_cli.py             |   1 +
 swh/vault/tests/test_cookers.py         | 217 ++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 178 ++++++++++++++++++++
 9 files changed, 632 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py
Changes applied before test
commit 0c0ff0146058c2280f3cc935993af78c8be710eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
    
    This will allow batching large groups of objects, instead of being limited
    to those given as argument from a parent.

commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history
    
    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker
    
    1. It increases test coverage
    2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker
    
    It can cook directories (by adding a synthetic revision pointing to it)
    and revisions.
    
    Current limitations:
    
    * It does not deduplicate directories and files at all, and queries
      all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules
    
    Tests reuse existing tests of the DirectoryCooker and
    RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/92/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/92/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 11 2021, 9:57 AM
Harbormaster failed remote builds in B21441: Diff 20467!

Build has FAILED

Patch application report for D5733 (id=20469)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..43abcab
Fast-forward
 requirements-swh.txt                    |   1 +
 swh/vault/cli.py                        |   6 +-
 swh/vault/cookers/__init__.py           |   6 +
 swh/vault/cookers/base.py               |  15 +-
 swh/vault/cookers/git_bare.py           | 279 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py          |   2 +-
 swh/vault/tests/test_cli.py             |   1 +
 swh/vault/tests/test_cookers.py         | 217 +++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 178 ++++++++++++++++++++
 9 files changed, 630 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py
Changes applied before test
commit 43abcab0017eec8226835f47510846eb10bf9336
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
    
    This will allow batching large groups of objects, instead of being limited
    to those given as argument from a parent.

commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history
    
    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker
    
    1. It increases test coverage
    2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker
    
    It can cook directories (by adding a synthetic revision pointing to it)
    and revisions.
    
    Current limitations:
    
    * It does not deduplicate directories and files at all, and queries
      all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules
    
    Tests reuse existing tests of the DirectoryCooker and
    RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/94/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/94/console

Build has FAILED

Patch application report for D5733 (id=20470)

Rebasing onto 35c9f519cd...

Current branch diff-target is up to date.
Changes applied before test
commit 40e875b2eb499ef814f60d62ec83007b43593948
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
    
    This will allow batching large groups of objects, instead of being limited
    to those given as argument from a parent.

commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history
    
    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker
    
    1. It increases test coverage
    2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker
    
    It can cook directories (by adding a synthetic revision pointing to it)
    and revisions.
    
    Current limitations:
    
    * It does not deduplicate directories and files at all, and queries
      all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules
    
    Tests reuse existing tests of the DirectoryCooker and
    RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/95/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/95/console

Build has FAILED

Patch application report for D5733 (id=20471)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..43e735a
Fast-forward
 requirements-swh.txt                    |   1 +
 swh/vault/cli.py                        |   6 +-
 swh/vault/cookers/__init__.py           |   6 +
 swh/vault/cookers/base.py               |  15 +-
 swh/vault/cookers/git_bare.py           | 252 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py          |   2 +-
 swh/vault/tests/test_cli.py             |   1 +
 swh/vault/tests/test_cookers.py         | 217 ++++++++++++++++++---------
 swh/vault/tests/test_git_bare_cooker.py | 178 ++++++++++++++++++++++
 9 files changed, 603 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py
Changes applied before test
commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history
    
    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker
    
    1. It increases test coverage
    2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker
    
    It can cook directories (by adding a synthetic revision pointing to it)
    and revisions.
    
    Current limitations:
    
    * It does not deduplicate directories and files at all, and queries
      all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules
    
    Tests reuse existing tests of the DirectoryCooker and
    RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/96/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/96/console

Build has FAILED

Patch application report for D5733 (id=20472)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..40e875b
Fast-forward
 requirements-swh.txt                    |   1 +
 swh/vault/cli.py                        |   6 +-
 swh/vault/cookers/__init__.py           |   6 +
 swh/vault/cookers/base.py               |  15 +-
 swh/vault/cookers/git_bare.py           | 279 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py          |   2 +-
 swh/vault/tests/test_cli.py             |   1 +
 swh/vault/tests/test_cookers.py         | 217 +++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 178 ++++++++++++++++++++
 9 files changed, 630 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py
Changes applied before test
commit 40e875b2eb499ef814f60d62ec83007b43593948
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
    
    This will allow batching large groups of objects, instead of being limited
    to those given as argument from a parent.

commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history
    
    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker
    
    1. It increases test coverage
    2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker
    
    It can cook directories (by adding a synthetic revision pointing to it)
    and revisions.
    
    Current limitations:
    
    * It does not deduplicate directories and files at all, and queries
      all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules
    
    Tests reuse existing tests of the DirectoryCooker and
    RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/97/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/97/console

olasd added a subscriber: olasd.
olasd added inline comments.
swh/vault/cookers/git_bare.py:215

Unrelated to this diff, but this is the first time I read this code: Why do you restrict yourself to the revision history here? Wouldn't a full traversal down to the contents be more efficient?

This revision is now accepted and ready to land. May 11 2021, 11:28 AM
swh/vault/cookers/git_bare.py:215

Bah, I see D5708 is open on this. I'll go there.

swh/vault/cookers/git_bare.py:83

We'll need to make sure that this doesn't grow too much for large bundles (think linux.git or mozilla-central). I liked the idea of using the disk plus an LRU cache to avoid excessive memory usage.
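
A minimal sketch of what "disk plus an LRU cache" could look like, assuming the structure that grows is a set of already-seen object ids. The `DiskBackedSeenSet` class is hypothetical and not part of swh-vault; it uses SQLite from the standard library as the on-disk store and an `OrderedDict` as a bounded in-memory cache of recently seen ids.

```python
import sqlite3
from collections import OrderedDict


class DiskBackedSeenSet:
    """Hypothetical 'seen' set: ids live in SQLite, hot ids in an LRU cache."""

    def __init__(self, path: str = ":memory:", cache_size: int = 10_000):
        # ":memory:" keeps the demo self-contained; a real run would pass a
        # path to a temporary file so the set lives on disk.
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS seen (id BLOB PRIMARY KEY)")
        self._cache: "OrderedDict[bytes, None]" = OrderedDict()
        self._cache_size = cache_size

    def _remember(self, obj_id: bytes) -> None:
        """Mark obj_id as most recently used, evicting the oldest if full."""
        self._cache[obj_id] = None
        self._cache.move_to_end(obj_id)
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)

    def add(self, obj_id: bytes) -> bool:
        """Record obj_id; return True only if it had not been seen before."""
        if obj_id in self._cache:
            # Cache hit: known to be seen, no disk access needed.
            self._cache.move_to_end(obj_id)
            return False
        row = self._db.execute(
            "SELECT 1 FROM seen WHERE id = ?", (obj_id,)
        ).fetchone()
        is_new = row is None
        if is_new:
            self._db.execute("INSERT INTO seen (id) VALUES (?)", (obj_id,))
        self._remember(obj_id)
        return is_new


if __name__ == "__main__":
    seen = DiskBackedSeenSet(cache_size=2)
    for obj_id in (b"a", b"b", b"c", b"a"):
        print(obj_id, seen.add(obj_id))
```

Memory use is then bounded by `cache_size` rather than by the number of objects in the bundle, at the cost of a disk lookup on every cache miss.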

Build is green

Patch application report for D5733 (id=20512)

Rebasing onto 545246e9af...

Current branch diff-target is up to date.
Changes applied before test
commit e77069a5e1cab7630e912ac59b0aa1242346a95f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
    
    This will allow batching large groups of objects, instead of being limited
    to those given as argument from a parent.

See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/113/ for more details.