Page MenuHomeSoftware Heritage

Add a new origin_visit_get_latest endpoint.
ClosedPublic

Authored by vlorentz on Jun 13 2019, 4:33 PM.

Details

Summary

snapshot_get_latest did two things: find the latest origin, and get its snapshot.
Now, it only calls origin_visit_get_latest, then snapshot_get.

snapshot_get_latest should also be deprecated, because it only fetches some of
the branches (for large snapshots).

Finally, with the new data model (types are associated to visits, not origins),
origin_visit_get_latest is required to know the type of the loader that
got the snapshot (needed in swh-indexer).

Also, note that origin_visit_get_latest only supports URLs as argument, not
origin ids. Since we are migrating away from origin ids, I don't think
it makes sense to add support for origin ids in a new endpoint.

Diff Detail

Repository
rDSTO Storage manager
Branch
origin_visit_get_latest
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 6207
Build 8567: tox-on-jenkinsJenkins
Build 8566: arc lint + arc unit

Event Timeline

douardda added a subscriber: douardda.

LGTM but please fix the abstract class's docstring as stated in the comment above.

swh/storage/db.py
434–438

This doc string is unclear and misleading IMHO.

This is not what this method does. If I understand correctly the code, it retrieve the most recent origin visit for a given origin, potentially limited to visits with a related snapshot.

Written as it is (and was before this diff, which is a good moment to fix this), I understand it looks for visits for a given snapshot.

Just noticed docstrings in subclasses seem more accurate in this regard.

We really should not have to duplicate those docstrings everywhere, it's a pain in the neck to manage...

462

I'm a noob in sql, but isn't there a JOIN variant with which this condition is not needed?

This revision now requires changes to proceed.Jun 14 2019, 10:41 AM
swh/storage/db.py
462

that's indeed not needed for an inner join. I'm not sure why it was there.

swh/storage/db.py
462

It turns out that the inner join shouldn't be there. I'll remove it in another diff (because it changes the behavior of the function)

This revision is now accepted and ready to land.Jun 14 2019, 1:27 PM
This revision was landed with ongoing or failed builds.Jun 14 2019, 1:29 PM
This revision was automatically updated to reflect the committed changes.