Differential D5730
git_bare: Use batched content_get() instead of content_find()
Authored by vlorentz on May 10 2021, 9:53 PM.
Details
It is considerably faster (30% less run time on an average repo).
Tests will fail because they depend on D5729.
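The gain from batching can be illustrated with a minimal sketch. It does not use the real swh-storage API: `FakeStorage`, `find_one`, `get_many`, `batches` and `fetch_batched` are all hypothetical names, and the round-trip counter only simulates the cost of a remote call. It shows the general pattern of replacing one lookup per object with one lookup per batch.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


class FakeStorage:
    """Hypothetical in-memory stand-in for a remote object storage."""

    def __init__(self, contents: Dict[bytes, bytes]) -> None:
        self._contents = contents
        self.round_trips = 0  # number of simulated remote calls

    def find_one(self, sha1: bytes) -> Optional[bytes]:
        # One round trip per object (the slow pattern).
        self.round_trips += 1
        return self._contents.get(sha1)

    def get_many(self, sha1s: List[bytes]) -> List[Optional[bytes]]:
        # One round trip for a whole batch; results keep the input order.
        self.round_trips += 1
        return [self._contents.get(sha1) for sha1 in sha1s]


def batches(items: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def fetch_batched(
    storage: FakeStorage, sha1s: Iterable[bytes], size: int = 100
) -> Dict[bytes, Optional[bytes]]:
    """Fetch all objects using one call per batch instead of one per object."""
    results: Dict[bytes, Optional[bytes]] = {}
    for batch in batches(sha1s, size):
        for sha1, data in zip(batch, storage.get_many(batch)):
            results[sha1] = data
    return results
```

With 250 identifiers and a batch size of 100, this sketch issues 3 calls to `get_many` where the one-by-one pattern would issue 250 calls to `find_one`.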
Event Timeline

Comment Actions
Build has FAILED
Patch application report for D5730 (id=20463)
Could not rebase; Attempt merge onto 35c9f519cd...
Updating 35c9f51..57760c2
Fast-forward
 requirements-swh.txt                    |   1 +
 swh/vault/cli.py                        |   6 +-
 swh/vault/cookers/__init__.py           |   6 +
 swh/vault/cookers/base.py               |  15 +-
 swh/vault/cookers/git_bare.py           | 285 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py          |   2 +-
 swh/vault/tests/test_cli.py             |   1 +
 swh/vault/tests/test_cookers.py         | 217 ++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 178 ++++++++++++++++++++
 9 files changed, 636 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py
Changes applied before test
commit 57760c2d2333eb16ce569bdce2e090d28b2b60e4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:52:37 2021 +0200
git_bare: Use batched content_get() instead of content_find()
It is considerably faster (30% less run time on an average repo)
commit 7a04e787128212aab3bc0aa55f399ec83b40e2f6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:04:33 2021 +0200
git_bare: Use directory_get_entries instead of directory_ls, it should be faster
As it does not need to join with the content table.
On small repositories with a warm cache, it doesn't seem to matter much, though.
But it's also closer to a feature swh-graph will provide in the future,
so it's a win anyway.
commit 0c0ff0146058c2280f3cc935993af78c8be710eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 20:09:30 2021 +0200
git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
This will allow batching large groups of objects, instead of being limited
to those given as argument from a parent.
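A minimal sketch of the idea in this commit, not the cooker's actual code: the pending nodes live in an explicit list used as a stack, so each iteration can take several of them at once and resolve them with a single batched lookup. The function name `walk_with_stack`, the `get_children_batch` callback and the `batch_size` parameter are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


def walk_with_stack(
    roots: Iterable[str],
    get_children_batch: Callable[[List[str]], Dict[str, List[str]]],
    batch_size: int = 3,
) -> List[str]:
    """Visit every node reachable from `roots` without recursion.

    `get_children_batch` is a hypothetical callback that takes a list of
    nodes and returns a mapping from each node to its children.
    """
    stack: List[str] = list(roots)
    seen: Set[str] = set(stack)
    order: List[str] = []
    while stack:
        # Take up to `batch_size` pending nodes, whatever their parent is.
        batch = [stack.pop() for _ in range(min(batch_size, len(stack)))]
        order.extend(batch)
        # One batched lookup replaces one recursive call per node.
        for children in get_children_batch(batch).values():
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    return order
```

With a recursive descent, each call only knows the children of its own parent, which caps the batch size at the fan-out of a single node. Here siblings and cousins end up in the same stack, so they can share a batch.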
commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri May 7 11:30:10 2021 +0200
git_bare: When possible, use swh-graph instead of swh-storage to query revision history
We expect it to be more efficient eventually; but run time is equivalent so far.
commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu May 6 14:43:37 2021 +0200
Run all directory tests on the gitfast cooker
1. It increases test coverage
2. test_revision_bogus_perms is now redundant (there is test_directory_bogus_perms)
commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 22:22:17 2021 +0200
git_bare: Deduplicate object downloads and writes
commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 19:30:34 2021 +0200
Add a naive git bare cooker
It can cook directories (by adding a synthetic revision pointing to it)
and revisions.
Current limitations:
* It does not deduplicate directories and files at all, and queries
all objects one by one.
* No support for missing/absent contents
* No support for missing submodules
Tests reuse existing tests of the DirectoryCooker and
RevisionGitfastCooker using parametrized pytest fixtures.
Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/90/

Comment Actions
(maybe the generation of the content git object should be moved to swh.model for consistency?)

Comment Actions
Build is green
Patch application report for D5730 (id=20514)
Could not rebase; Attempt merge onto 545246e9af...
Updating 545246e..bea488d
Fast-forward
 requirements-swh.txt          |   2 +-
 swh/vault/cookers/git_bare.py | 127 ++++++++++++++++++++++++++----------------
 2 files changed, 80 insertions(+), 49 deletions(-)
Changes applied before test
commit bea488d2eb3c26e77dc38ee4410b0820165409d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:52:37 2021 +0200
git_bare: Use batched content_get() instead of content_find()
It is considerably faster (30% less run time on an average repo)
commit 6fb358d6aaeb3969c41c497a9ec9d24847220c51
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:04:33 2021 +0200
git_bare: Use directory_get_entries instead of directory_ls, it should be faster
As it does not need to join with the content table.
On small repositories with a warm cache, it doesn't seem to matter much, though.
But it's also closer to a feature swh-graph will provide in the future,
so it's a win anyway.
commit e77069a5e1cab7630e912ac59b0aa1242346a95f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 20:09:30 2021 +0200
git_bare: Refactor the graph descent using explicit stacks instead of the call stack.
This will allow batching large groups of objects, instead of being limited
to those given as argument from a parent.
See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/115/ for more details.