Page MenuHomeSoftware Heritage

Fix revision cooking errors with the vault for large revision log
ClosedPublic

Authored by anlambert on Jul 26 2019, 5:26 PM.

Details

Summary

This fixes a vault issue similar to a one that happened when cooking a directory (T1177).

When cooking a gitfast archive for a given revision, fetching the associated revision log
must be performed client-side in a paginated way to avoid storage timeouts when the total
number of revisions to fetch is pretty large.

Related T1934

Diff Detail

Repository
rDVAU Software Heritage Vault
Branch
fix-revision-cooking-timeout
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7056
Build 9938: tox-on-jenkinsJenkins
Build 9937: arc lint + arc unit

Event Timeline

anlambert retitled this revision from Fix revision cooking errors in the vault for large revision log to Fix revision cooking errors with the vault for large revision log.Jul 26 2019, 5:28 PM

Update: Add missing Yields section in docstring

ardumont added inline comments.
swh/vault/cookers/utils.py
28

Wouldn't that bubble up (since you increment it below) when the revisions are numerous?

I'd call that function with max_revs=per_page.

What did i miss?

For the sake of my question ;)

Other than that sounds good.

This revision now requires changes to proceed.Jul 26 2019, 6:19 PM
swh/vault/cookers/utils.py
28

Nope, it will work as expected.

get_revisions_walker instantiates a revision iterator, the important part here is the state parameter enabling to continue the iteration from where you left it the last time you used it.

As I can't guess the number of revisions to walk, I am forced to continually increment the max_revs parameter. But the number of revisions returned at each iteration step will be per_page or less if we hit the initial revision in the log.

swh/vault/cookers/utils.py
28

But the number of revisions returned at each iteration step will be per_page or less if we hit the initial revision in the log.

But the number of revisions returned by each instantiated iterator will be per_page or less if we hit the initial revision in the log.

This is clearer ;-)

ardumont added inline comments.
swh/vault/cookers/utils.py
28

Ok, thanks for the clarification.

(I remembered the state was important ;)

This revision is now accepted and ready to land.Jul 26 2019, 8:37 PM
douardda added a subscriber: douardda.

I'm not against this diff as quick solution for the problem, but I'm disappointed that this pagination stuff is not properly managed in the (client part) storage itself. What the point of having a client implementation of the storage if we do not abstract these kind of 'basic' behaviors?

swh/vault/cookers/utils.py
33

It would be nice to add a small comment explaining this "stop condition".

Update: Add comments in revision_log implementation

This revision was automatically updated to reflect the committed changes.