Differential D5731
git_bare: Optionally access the objstorage directly
Authored by vlorentz on May 10 2021, 10:24 PM.
Details
instead of going through swh-storage. This also allows batching queries, so it should be more efficient overall. Depends on D5730.
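To illustrate why batching helps, here is a minimal sketch and not the code from this diff. `fetch_batch` is a hypothetical callable standing in for whatever batched read the backend offers; it is assumed to return one result per requested id, in the same order.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence


def batched(ids: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_all(
    ids: Iterable[bytes],
    fetch_batch: Callable[[Sequence[bytes]], Sequence[bytes]],  # hypothetical backend call
    batch_size: int = 100,
) -> Dict[bytes, bytes]:
    """Fetch every object with one backend call per batch instead of one per object."""
    result: Dict[bytes, bytes] = {}
    for chunk in batched(ids, batch_size):
        # Assumption: results come back in the same order as the requested ids.
        for obj_id, data in zip(chunk, fetch_batch(chunk)):
            result[obj_id] = data
    return result
```

With N objects and a batch size of B, this issues about N/B backend calls rather than N, which is where the expected gain comes from.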
Diff Detail
Event Timeline

Build has FAILED

Patch application report for D5731 (id=20464)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..b4b60b4
Fast-forward
 requirements-swh.txt | 1 +
 swh/vault/cli.py | 15 +-
 swh/vault/cookers/__init__.py | 6 +
 swh/vault/cookers/base.py | 17 +-
 swh/vault/cookers/git_bare.py | 291 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py | 2 +-
 swh/vault/tests/test_cli.py | 1 +
 swh/vault/tests/test_cookers.py | 272 +++++++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 178 +++++++++++++++++++
 9 files changed, 708 insertions(+), 75 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py

Changes applied before test

commit b4b60b4a678b48f6c86f0360a2c44e2a91630e7a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 22:23:39 2021 +0200

    git_bare: Optionally access the objstorage directly

    instead of going through swh-storage.

    This also allows batching queries, so it should be more efficient overall.

commit 57760c2d2333eb16ce569bdce2e090d28b2b60e4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:52:37 2021 +0200

    git_bare: Use batched content_get() instead of content_find()

    It is considerably faster (30% less run time on an average repo)

commit 7a04e787128212aab3bc0aa55f399ec83b40e2f6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:04:33 2021 +0200

    git_bare: Use directory_get_entries instead of directory_ls, it should be faster

    As it does not need to join with the content table.

    On small repositories with a warm cache, it doesn't seem to matter much, though.
    But it's also closer to a feature swh-graph will provide in the future, so it's a win anyway.

commit 0c0ff0146058c2280f3cc935993af78c8be710eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.

    This will allow batching large groups of objects, instead of being limited to those given as argument from a parent.

commit 43e735a7a5fc8c7f89275df8a03124358c0c3cc3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history

    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker

    1. It increases test coverage
    2. test_revision_bogus_perms it now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker

    It can cook directories (by adding a synthetic revision pointing to it) and revisions.

    Current limitations:

    * It does not deduplicate directories and files at all, and queries all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules

    Tests reuse existing tests of the DirectoryCooker and RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/91/

I suspect some gains could come from parallelizing objstorage accesses. It's probably worth doing directly in the objstorage get_batch method for the backends that na(t)ively support parallelism, e.g. azure_prefixed.
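For illustration only, a minimal sketch of what parallelizing a batched get could look like using a thread pool. `get_one` is a hypothetical single-object getter; nothing here reflects the actual swh-objstorage signatures, and mapping missing objects to `None` is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def parallel_get_batch(
    obj_ids: Sequence[bytes],
    get_one: Callable[[bytes], bytes],  # hypothetical single-object getter
    max_workers: int = 8,
) -> List[Optional[bytes]]:
    """Fetch objects concurrently, returning results in the order of `obj_ids`."""

    def safe_get(obj_id: bytes) -> Optional[bytes]:
        # Assumption: a missing object raises KeyError; it is reported as None.
        try:
            return get_one(obj_id)
        except KeyError:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even when calls finish out of order.
        return list(pool.map(safe_get, obj_ids))
```

Threads are a reasonable fit when the getter is I/O-bound (network round trips to a remote store); a backend with native async or bulk support would likely do better with its own mechanism.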
apply comments:
Build has FAILED

Patch application report for D5731 (id=20499)

Could not rebase; Attempt merge onto 35c9f519cd...

Updating 35c9f51..e2e9244
Fast-forward
 requirements-swh.txt | 1 +
 swh/vault/cli.py | 15 +-
 swh/vault/cookers/__init__.py | 6 +
 swh/vault/cookers/base.py | 17 +-
 swh/vault/cookers/git_bare.py | 289 ++++++++++++++++++++++++++++++++
 swh/vault/in_memory_backend.py | 2 +-
 swh/vault/tests/test_cli.py | 1 +
 swh/vault/tests/test_cookers.py | 280 +++++++++++++++++++++++--------
 swh/vault/tests/test_git_bare_cooker.py | 181 ++++++++++++++++++++
 9 files changed, 715 insertions(+), 77 deletions(-)
 create mode 100644 swh/vault/cookers/git_bare.py
 create mode 100644 swh/vault/tests/test_git_bare_cooker.py

Changes applied before test

commit e2e924430a835b30151ff78fc1904fc8d67ac5b8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 22:23:39 2021 +0200

    git_bare: Optionally access the objstorage directly

    instead of going through swh-storage.

    This also allows batching queries, so it should be more efficient overall.

commit 66b54b5cdcc21e1512cd5959023827c43aac1b1d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:52:37 2021 +0200

    git_bare: Use batched content_get() instead of content_find()

    It is considerably faster (30% less run time on an average repo)

commit b00c56677bc046418b8be5b731845ca07af60a30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:04:33 2021 +0200

    git_bare: Use directory_get_entries instead of directory_ls, it should be faster

    As it does not need to join with the content table.

    On small repositories with a warm cache, it doesn't seem to matter much, though.
    But it's also closer to a feature swh-graph will provide in the future, so it's a win anyway.

commit bfcb0af5dc20ab7cfd84e0b7621d0978aece6de8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.

    This will allow batching large groups of objects, instead of being limited to those given as argument from a parent.

commit 2ec60e27c75775f7073dd51947648a999748be35
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri May 7 11:30:10 2021 +0200

    git_bare: When possible, use swh-graph instead of swh-storage to query revision history

    We expect it to be more efficient eventually; but run time is equivalent so far.

commit 8007936a8aff0b29eefdd93bbe037b996d6b743d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu May 6 14:43:37 2021 +0200

    Run all directory tests on the gitfast cooker

    1. It increases test coverage
    2. test_revision_bogus_perms it now redundant (there is test_directory_bogus_perms)

commit 3e76bc5656d0aa1eb510dcfdaa3b6196f6ee5976
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 22:22:17 2021 +0200

    git_bare: Deduplicate object downloads and writes

commit 4052f53698454ac47a01d26d470b8ab4b0f77a6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Apr 30 19:30:34 2021 +0200

    Add a naive git bare cooker

    It can cook directories (by adding a synthetic revision pointing to it) and revisions.

    Current limitations:

    * It does not deduplicate directories and files at all, and queries all objects one by one.
    * No support for missing/absent contents
    * No support for missing submodules

    Tests reuse existing tests of the DirectoryCooker and RevisionGitfastCooker using parametrized pytest fixtures.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/101/

Build has FAILED

Patch application report for D5731 (id=20515)

Could not rebase; Attempt merge onto 545246e9af...

Updating 545246e..3bf5cbc
Fast-forward
 requirements-swh.txt | 2 +-
 swh/vault/cli.py | 11 +++-
 swh/vault/cookers/base.py | 2 +
 swh/vault/cookers/git_bare.py | 131 ++++++++++++++++++++++++++--------------
 swh/vault/tests/test_cookers.py | 67 ++++++++++++++++++--
 5 files changed, 160 insertions(+), 53 deletions(-)

Changes applied before test

commit 3bf5cbc4c41bee7dcf0d047471f56b4b8a524ac5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 22:23:39 2021 +0200

    git_bare: Optionally access the objstorage directly

    instead of going through swh-storage.

    This also allows batching queries, so it should be more efficient overall.

commit bea488d2eb3c26e77dc38ee4410b0820165409d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:52:37 2021 +0200

    git_bare: Use batched content_get() instead of content_find()

    It is considerably faster (30% less run time on an average repo)

commit 6fb358d6aaeb3969c41c497a9ec9d24847220c51
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 21:04:33 2021 +0200

    git_bare: Use directory_get_entries instead of directory_ls, it should be faster

    As it does not need to join with the content table.

    On small repositories with a warm cache, it doesn't seem to matter much, though.
    But it's also closer to a feature swh-graph will provide in the future, so it's a win anyway.

commit e77069a5e1cab7630e912ac59b0aa1242346a95f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 20:09:30 2021 +0200

    git_bare: Refactor the graph descent using explicit stacks instead of the call stack.

    This will allow batching large groups of objects, instead of being limited to those given as argument from a parent.

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/116/

Build is green

Patch application report for D5731 (id=20519)

Rebasing onto bea488d2eb...

Current branch diff-target is up to date.

Changes applied before test

commit 15a16d9da01d34550363eef2bf1735d9f39b4032
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon May 10 22:23:39 2021 +0200

    git_bare: Optionally access the objstorage directly

    instead of going through swh-storage.

    This also allows batching queries, so it should be more efficient overall.

See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/117/ for more details.
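For readers unfamiliar with the "explicit stacks instead of the call stack" refactoring mentioned in the commit messages, the general idea can be sketched as follows. This is an illustrative sketch, not the cooker's code: `get_children_batch` is a hypothetical callable that returns the children of each node in a batch, in order.

```python
from typing import Callable, Iterable, List, Sequence, Set


def walk(
    root_ids: Iterable[str],
    get_children_batch: Callable[[Sequence[str]], Sequence[Sequence[str]]],  # hypothetical
    batch_size: int = 2,
) -> List[str]:
    """Visit every reachable node exactly once, querying children in batches."""
    stack: List[str] = list(dict.fromkeys(root_ids))  # drop duplicate roots
    seen: Set[str] = set(stack)
    visited: List[str] = []
    while stack:
        # Pop up to `batch_size` pending nodes, regardless of which parent added them.
        batch = [stack.pop() for _ in range(min(batch_size, len(stack)))]
        visited.extend(batch)
        for children in get_children_batch(batch):
            for child in children:
                if child not in seen:  # deduplicate shared subtrees
                    seen.add(child)
                    stack.append(child)
    return visited
```

Because pending nodes live in one shared list, a batch can mix nodes discovered under different parents, and deep hierarchies do not hit Python's recursion limit.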