Page MenuHomeSoftware Heritage

Fix revision cooking errors with the vault for large revision log
ClosedPublic

Authored by anlambert on Fri, Jul 26, 5:26 PM.

Details

Summary

This fixes a vault issue similar to a one that happened when cooking a directory (T1177).

When cooking a gitfast archive for a given revision, fetching the associated revision log
must be performed client-side in a paginated way to avoid storage timeouts when the total
number of revisions to fetch is pretty large.

Related T1934

Diff Detail

Repository
rDVAU Software Heritage Vault
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision.Fri, Jul 26, 5:26 PM
anlambert retitled this revision from Fix revision cooking errors in the vault for large revision log to Fix revision cooking errors with the vault for large revision log.Fri, Jul 26, 5:28 PM
anlambert updated this revision to Diff 6006.Fri, Jul 26, 5:33 PM

Update: Add missing Yields section in docstring

ardumont added inline comments.
swh/vault/cookers/utils.py
29

Wouldn't that bubble up (since you increment it below) when the revisions are numerous?

I'd call that function with max_revs=per_page.

What did i miss?

ardumont requested changes to this revision.Fri, Jul 26, 6:19 PM

For the sake of my question ;)

Other than that sounds good.

This revision now requires changes to proceed.Fri, Jul 26, 6:19 PM
anlambert added inline comments.Fri, Jul 26, 7:01 PM
swh/vault/cookers/utils.py
29

Nope, it will work as expected.

get_revisions_walker instantiates a revision iterator, the important part here is the state parameter enabling to continue the iteration from where you left it the last time you used it.

As I can't guess the number of revisions to walk, I am forced to continually increment the max_revs parameter. But the number of revisions returned at each iteration step will be per_page or less if we hit the initial revision in the log.

anlambert added inline comments.Fri, Jul 26, 7:05 PM
swh/vault/cookers/utils.py
29

But the number of revisions returned at each iteration step will be per_page or less if we hit the initial revision in the log.

But the number of revisions returned by each instantiated iterator will be per_page or less if we hit the initial revision in the log.

This is clearer ;-)

ardumont accepted this revision.Fri, Jul 26, 8:37 PM
ardumont added inline comments.
swh/vault/cookers/utils.py
29

Ok, thanks for the clarification.

(I remembered the state was important ;)

This revision is now accepted and ready to land.Fri, Jul 26, 8:37 PM
douardda accepted this revision.Mon, Jul 29, 10:17 AM
douardda added a subscriber: douardda.

I'm not against this diff as quick solution for the problem, but I'm disappointed that this pagination stuff is not properly managed in the (client part) storage itself. What the point of having a client implementation of the storage if we do not abstract these kind of 'basic' behaviors?

swh/vault/cookers/utils.py
34

It would be nice to add a small comment explaining this "stop condition".

anlambert updated this revision to Diff 6017.Mon, Jul 29, 11:14 AM

Update: Add comments in revision_log implementation

This revision was automatically updated to reflect the committed changes.