Page MenuHomeSoftware Heritage

package.loader: Reference a snapshot on partial visit
ClosedPublic

Authored by ardumont on Fri, Mar 20, 11:59 AM.

Details

Summary

Prior to this commit, partial visit would not reference a partial snapshot.
Thus further visits on the same origin would not benefit from the previous snapshot's
already seen artifacts. This now improves on such behavior to reference such
partial snapshot.

For the deposit's case though, we want to fail immediately when a deposit id is
unknown. This avoids creating noises in the storage.

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Fri, Mar 20, 11:59 AM
ardumont added inline comments.Fri, Mar 20, 12:00 PM
swh/loader/package/deposit/loader.py
102

This is not clear (it's a property), how to do better?

swh/loader/package/deposit/tests/test_deposit.py
55

No more noise in the storage! \m/

swh/loader/package/pypi/tests/test_pypi.py
239

I had to change the raising error because default version is now called when building the snapshot.

ardumont added inline comments.
swh/loader/package/loader.py
309

^ D2769 makes me wonder...

vlorentz added inline comments.
swh/loader/package/deposit/loader.py
102

Make metadata should probably a regular method instead of a property if it does non-trivial stuff.

And why doesn't the constructor load metadata itself, as the attribute is always used?

ardumont edited the summary of this revision. (Show Details)Fri, Mar 20, 1:32 PM
ardumont added inline comments.
swh/loader/package/deposit/loader.py
102

Make metadata should probably a regular method instead of a property if it does non-trivial stuff.

yes, that'd make sense to do that now.

And why doesn't the constructor load metadata itself, as the attribute is always used?

avoiding side-effect in the constructor.
keep initialization simple and sane

ardumont updated this revision to Diff 10178.Fri, Mar 20, 1:41 PM

loader-deposit: Clarify intent about the deposit metadata retrieval step

ardumont updated this revision to Diff 10179.Fri, Mar 20, 1:53 PM

package.loader: Make the override methods being called in the main loop

Here, i'm specifically targetting self.extra_branches call

ardumont added inline comments.Fri, Mar 20, 2:15 PM
swh/loader/package/loader.py
409

It'd be great the types (revisions. extra_branches) were a tad unified here.
But let's keep that for another time.

lewo added inline comments.Fri, Mar 20, 2:46 PM
swh/loader/package/loader.py
308

I think we could be in trouble if this method raises an exception. So, maybe we could instead get extra_branches at the beginning of the load method, before loading any revisions: we are then sure to have extra_branches and we are able to catch exceptions of this method.

ardumont added inline comments.Fri, Mar 20, 3:29 PM
swh/loader/package/loader.py
308

Let's make the functional loader resilient to this and not the loader one.
Since there is only this one which does implement it.

(i think try except within its own implementation).

Same goes for default version by the way.

ardumont updated this revision to Diff 10185.Fri, Mar 20, 3:37 PM

Explicit default_version and extra_branches implementations must not raise
exceptions.

swh/loader/package/loader.py
309

nvm that comment now, it's no longer relevant.

olasd added a subscriber: olasd.Fri, Mar 20, 6:08 PM

This feels like a step in the right direction, but I suggest we go even further.

I don't think the This method *must* not raise exception. comments are very useful. What happens if they do?

The finally block is here so we maximize the chances of updating the origin_visit after loading; In effect, I guess we want to try to flush the storage no matter what too.

So, instead of bailing out on the first exception, then trying to do the snapshot stuff, I suggest that we move the exception handling inside the loop, and try to load as much as we can, bailing out only when we don't have a choice, rather than on the first issue.

I'm submitting a diff to that effect. Tests will be a little bit broken because the behavior change is a bit more drastic, but I think this is the better way out.

swh/loader/package/loader.py
414–415

Isn't there some edge cases where generating an empty snapshot is the right thing to do?

ardumont added a comment.EditedFri, Mar 20, 6:24 PM

I don't think the This method *must* not raise exception. comments are very useful. What happens if they do?

It should not pass review in the first place.
In my mind, that would have mean the new loader with a get_default_version and extra_branches must have a try: catch stanza in their implementation.
Then again, i'm trying to be incremental in my approach.

Check whatever happens in staging area and adapt after that.

The finally block is here so we maximize the chances of updating the origin_visit after loading; In effect, I guess we want to try to flush the storage no matter what too.

yes, we do.
And same for the snapshot, thus the initial change here.

So, instead of bailing out on the first exception, then trying to do the snapshot stuff, I suggest that we move the exception handling inside the loop, and try to load as much as we can, bailing out only when we don't have a choice, rather than on the first issue.

That's what lewo is trying to reach with the functional loader.
(one diff at a time, e.g D2859)

I'm submitting a diff to that effect. Tests will be a little bit broken because the behavior change is a bit more drastic, but I think this is the better way out.

ack, thx.

Related to D2862

swh/loader/package/loader.py
414–415

no idea.

That'd simplify but i don't know what's better.

ardumont updated this revision to Diff 10195.Mon, Mar 23, 9:23 AM
  • Make empty snapshot
  • Remove *must* not docstrings
ardumont added inline comments.Mon, Mar 23, 11:18 AM
swh/loader/package/pypi/tests/test_pypi.py
258

This goes away in D2862

olasd accepted this revision.Mon, Mar 23, 11:27 AM
This revision is now accepted and ready to land.Mon, Mar 23, 11:27 AM