[WIP] git bare revision cooker
Abandoned · Public

Authored by vlorentz on Dec 18 2020, 10:59 AM.

Details

Reviewers
tenma
Group Reviewers
Reviewers
Maniphest Tasks
T843: Vault: Add a "git bare" tarball cooker
Summary

This cooker handles a single revision history up to a revision id.
It outputs a Git bare repository of the revision, directory and file objects, packed into a tarball.

This implementation uses the dulwich library to handle git objects.
It progressively writes a disk-backed repository to a temporary location, then outputs the tarball into the destination in-memory file.
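
A minimal sketch of the "temporary on-disk repository, then in-memory tarball" step, using only the standard library. The names `pack_directory` and `cook_placeholder_repo` and the placeholder repository layout are assumptions for illustration, not code from this diff; the real cooker fills the repository through dulwich.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack_directory(src_dir: Path, arcname: str) -> io.BytesIO:
    """Pack src_dir into an in-memory tarball whose root entry is arcname."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(str(src_dir), arcname=arcname)
    buf.seek(0)  # rewind so the caller can read the tarball from the start
    return buf


def cook_placeholder_repo() -> io.BytesIO:
    """Write a skeleton bare repository to a temp dir, then tar it in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo.git"  # hypothetical repository name
        (repo / "objects").mkdir(parents=True)
        (repo / "refs" / "heads").mkdir(parents=True)
        (repo / "HEAD").write_bytes(b"ref: refs/heads/master\n")
        # The tarball is built before the temporary directory is removed.
        return pack_directory(repo, "repo.git")
```

The temporary directory only needs to live until the tarball has been written; after that the in-memory file is the sole copy.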

As dulwich tree objects can only be created from the leaves (files) to the root, the tree-processing algorithm iterates in a depth-first fashion.
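
To illustrate why the traversal has to be leaves-first, here is a sketch that computes git tree ids with `hashlib` alone rather than dulwich. The `Node` type, `git_object_id` and `tree_id` are hypothetical names, and the sketch assumes every file is a regular non-executable file (mode 100644). A tree entry embeds the id of its child, so each subtree must be hashed before its parent.

```python
import hashlib
from typing import Dict, Union

# Hypothetical in-memory directory: name -> file bytes or nested directory.
Node = Union[bytes, Dict[str, "Node"]]


def git_object_id(kind: bytes, payload: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 that git assigns to an object."""
    header = kind + b" " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def tree_id(directory: Dict[str, Node]) -> bytes:
    """Hash a directory post-order: children first, then the tree itself."""
    entries = []
    for name, node in directory.items():
        raw_name = name.encode()
        if isinstance(node, dict):
            # Recurse first: the subtree id must exist before the parent entry.
            entries.append((raw_name + b"/", b"40000", raw_name, tree_id(node)))
        else:
            blob_id = git_object_id(b"blob", node)
            entries.append((raw_name, b"100644", raw_name, blob_id))
    # Git sorts entries by name, comparing directories as if they ended in "/".
    entries.sort(key=lambda entry: entry[0])
    payload = b"".join(
        mode + b" " + raw_name + b"\0" + oid for _, mode, raw_name, oid in entries
    )
    return git_object_id(b"tree", payload)
```

Changing any leaf changes the id of every tree above it, which is why the root can only be produced last.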

Further work includes:

  • making it ready for production
  • outputting a git bundle, which is easier to work with as a git user

Related to T843

Test Plan

Ran manual tests in the Docker environment.
Tested by running the cooker module directly with the vault-worker configuration found in Docker.

TODO: add a pytest test suite inspired by the other cookers.

Event Timeline

Build has FAILED

Patch application report for D4766 (id=16864)

Rebasing onto 1b6a10fded...

Current branch diff-target is up to date.
Changes applied before test
commit 54019acfb25ccc1f9260419c79814e05f1049058
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 18 10:57:27 2020 +0100

    WIP git bare revision cooker

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/61/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/61/console

Build has FAILED

Patch application report for D4766 (id=16865)

Rebasing onto 1b6a10fded...

Current branch diff-target is up to date.
Changes applied before test
commit 1ef446b3d5269b9b4a7ad2c9e873dd5a7f11d1d5
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 18 10:57:27 2020 +0100

    WIP git bare revision cooker

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/62/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/62/console

tenma edited the test plan for this revision.
douardda added inline comments.
swh/vault/cookers/revision_bare.py

Line 1: Note that the copyright dates are wrong.

Line 29: Why are these imports here (rather than at the top of the module)?

tenma edited the test plan for this revision.
  • use directory_ls rather than dir_iterator to lower overhead and control iteration
  • use a set to test early for known objects
  • update the master head only once, with the given revision id
  • do not chunk file ids, because in the end the storage requests are not chunked
  • partition new and old files to adapt processing
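
A rough sketch of how the known-object set and the new/old partition from the list above could fit together. `partition_known` is a hypothetical helper, not a function from this diff; it assumes object ids are `bytes` and that ids seen for the first time should be recorded immediately.

```python
from typing import Iterable, List, Set, Tuple


def partition_known(
    ids: Iterable[bytes], known_ids: Set[bytes]
) -> Tuple[List[bytes], List[bytes]]:
    """Split ids into (new, old) and record the new ones in known_ids."""
    new: List[bytes] = []
    old: List[bytes] = []
    for obj_id in ids:
        if obj_id in known_ids:
            # Already seen: no need to fetch or write this object again.
            old.append(obj_id)
        else:
            new.append(obj_id)
            known_ids.add(obj_id)
    return new, old
```

Set membership is a constant-time check on average, so the early test costs little even for large histories.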

Build has FAILED

Patch application report for D4766 (id=17066)

Rebasing onto b2e17a95fe...

Current branch diff-target is up to date.
Changes applied before test
commit 0210ed0b79c531b4769ad8803160441777b708b6
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 18 10:57:27 2020 +0100

    WIP git bare revision cooker

Link to build: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/63/
See console output for more information: https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/63/console

tenma marked 2 inline comments as done. Jan 6 2021, 8:50 PM
  • move known_ids to an instance attribute
  • split file processing loop into processing new files, then all files
  • lru cache to avoid duplicate requests
  • typing
  • logging and asserting
  • notifying progress to vault backend
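
The LRU cache item could look roughly like the sketch below, using `functools.lru_cache` from the standard library. `fetch_from_storage`, `get_object` and the `calls` list are hypothetical stand-ins for the storage request and are not taken from this diff; the cache size is an arbitrary assumption.

```python
from functools import lru_cache
from typing import List

# Records every request that actually reaches the (fake) storage.
calls: List[bytes] = []


def fetch_from_storage(obj_id: bytes) -> bytes:
    """Hypothetical stand-in for a storage request."""
    calls.append(obj_id)
    return b"content of " + obj_id


@lru_cache(maxsize=1024)  # assumed size, to be tuned against memory use
def get_object(obj_id: bytes) -> bytes:
    """Return the object, hitting storage only on a cache miss."""
    return fetch_from_storage(obj_id)
```

Repeated lookups of the same id are served from memory; `get_object.cache_info()` reports hits and misses if the hit rate needs checking.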

Build is green

Patch application report for D4766 (id=17840)

Rebasing onto b2e17a95fe...

Current branch diff-target is up to date.
Changes applied before test
commit eceaf67d857e8aa37cc649a531d6a9b4db89dba0
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 18 10:57:27 2020 +0100

    Add a revision cooker into bare Git repository
    
    This cooker handles a single revision history up to a revision id.
    It outputs a Git bare repository of the revision, directory and file
    objects, packed into a tarball.
    
    This implementation uses the dulwich library to handle git objects.
    It writes progressively a disk-backed repository in a temporary
    location before outputting the tarball info the destination in-memory
    file.
    As dulwich tree objects can only be created from the leaves (files)
    to the root, the tree-processing algorithm iterates in a depth-first
    fashion.

commit e734b4594f3b40782a382de8eb1f9ab44cd3b78b
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Feb 3 00:14:43 2021 +0100

    cookers: Update cooker register with revision_bare one

commit ab21eced5ddb1c8eddd98625202e5b03956d3366
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Feb 2 19:39:21 2021 +0100

    to_disk: fix typing to actually accepted type

See https://jenkins.softwareheritage.org/job/DVAU/job/tests-on-diff/64/ for more details.

tenma retitled this revision from WIP git bare revision cooker to [WIP] git bare revision cooker. Feb 3 2021, 12:30 PM
tenma edited the summary of this revision.
tenma edited the test plan for this revision.
tenma added a project: Vault.
tenma edited the summary of this revision.
vlorentz added a reviewer: tenma.
vlorentz added a subscriber: vlorentz.

I'm going to close this, because we decided to use a different approach for the git bare cooker.