Page MenuHomeSoftware Heritage

package.loader: Reference a snapshot on partial visit
ClosedPublic

Authored by ardumont on Mar 20 2020, 11:59 AM.

Details

Summary

Prior to this commit, partial visit would not reference a partial snapshot.
Thus further visits on the same origin would not benefit from the previous snapshot's
already seen artifacts. This now improves on such behavior to reference such
partial snapshot.

For the deposit's case though, we want to fail immediately when a deposit id is
unknown. This avoids creating noises in the storage.

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 11247
Build 16989: tox-on-jenkinsJenkins
Build 16988: arc lint + arc unit

Event Timeline

swh/loader/package/deposit/loader.py
101

This is not clear (it's a property), how to do better?

swh/loader/package/deposit/tests/test_deposit.py
55

No more noise in the storage! \m/

swh/loader/package/pypi/tests/test_pypi.py
238

I had to change the raising error because default version is now called when building the snapshot.

ardumont added inline comments.
swh/loader/package/loader.py
308

^ D2769 makes me wonder...

vlorentz added inline comments.
swh/loader/package/deposit/loader.py
101

Make metadata should probably a regular method instead of a property if it does non-trivial stuff.

And why doesn't the constructor load metadata itself, as the attribute is always used?

ardumont added inline comments.
swh/loader/package/deposit/loader.py
101

Make metadata should probably a regular method instead of a property if it does non-trivial stuff.

yes, that'd make sense to do that now.

And why doesn't the constructor load metadata itself, as the attribute is always used?

avoiding side-effect in the constructor.
keep initialization simple and sane

loader-deposit: Clarify intent about the deposit metadata retrieval step

package.loader: Make the override methods being called in the main loop

Here, i'm specifically targetting self.extra_branches call

swh/loader/package/loader.py
404

It'd be great the types (revisions. extra_branches) were a tad unified here.
But let's keep that for another time.

swh/loader/package/loader.py
307

I think we could be in trouble if this method raises an exception. So, maybe we could instead get extra_branches at the beginning of the load method, before loading any revisions: we are then sure to have extra_branches and we are able to catch exceptions of this method.

swh/loader/package/loader.py
307

Let's make the functional loader resilient to this and not the loader one.
Since there is only this one which does implement it.

(i think try except within its own implementation).

Same goes for default version by the way.

Explicit default_version and extra_branches implementations must not raise
exceptions.

swh/loader/package/loader.py
308

nvm that comment now, it's no longer relevant.

This feels like a step in the right direction, but I suggest we go even further.

I don't think the This method *must* not raise exception. comments are very useful. What happens if they do?

The finally block is here so we maximize the chances of updating the origin_visit after loading; In effect, I guess we want to try to flush the storage no matter what too.

So, instead of bailing out on the first exception, then trying to do the snapshot stuff, I suggest that we move the exception handling inside the loop, and try to load as much as we can, bailing out only when we don't have a choice, rather than on the first issue.

I'm submitting a diff to that effect. Tests will be a little bit broken because the behavior change is a bit more drastic, but I think this is the better way out.

swh/loader/package/loader.py
405–406

Isn't there some edge cases where generating an empty snapshot is the right thing to do?

I don't think the This method *must* not raise exception. comments are very useful. What happens if they do?

It should not pass review in the first place.
In my mind, that would have mean the new loader with a get_default_version and extra_branches must have a try: catch stanza in their implementation.
Then again, i'm trying to be incremental in my approach.

Check whatever happens in staging area and adapt after that.

The finally block is here so we maximize the chances of updating the origin_visit after loading; In effect, I guess we want to try to flush the storage no matter what too.

yes, we do.
And same for the snapshot, thus the initial change here.

So, instead of bailing out on the first exception, then trying to do the snapshot stuff, I suggest that we move the exception handling inside the loop, and try to load as much as we can, bailing out only when we don't have a choice, rather than on the first issue.

That's what lewo is trying to reach with the functional loader.
(one diff at a time, e.g D2859)

I'm submitting a diff to that effect. Tests will be a little bit broken because the behavior change is a bit more drastic, but I think this is the better way out.

ack, thx.

Related to D2862

swh/loader/package/loader.py
405–406

no idea.

That'd simplify but i don't know what's better.

  • Make empty snapshot
  • Remove *must* not docstrings
swh/loader/package/pypi/tests/test_pypi.py
256

This goes away in D2862

This revision is now accepted and ready to land.Mar 23 2020, 11:27 AM