Page MenuHomeSoftware Heritage

Add a new origin_visit_get_latest endpoint.
ClosedPublic

Authored by vlorentz on Jun 13 2019, 4:33 PM.

Details

Summary

snapshot_get_latest did two things: find the latest origin, and get its snapshot.
Now, it only calls origin_visit_get_latest, then snapshot_get.

snapshot_get_latest should also be deprecated, because it only fetches some of
the branches (for large snapshots).

Finally, with the new data model (types are associated to visits, not origins),
origin_visit_get_latest is required to know the type of the loader that
got the snapshot (needed in swh-indexer).

Also, note that origin_visit_get_latest only supports URLs as argument, not
origin ids. Since we are migrating away from origin ids, I don't think
it makes sense to add support for origin ids in a new endpoint.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision.Jun 13 2019, 4:33 PM
douardda requested changes to this revision.Jun 14 2019, 10:41 AM
douardda added a subscriber: douardda.

LGTM but please fix the abstract class's docstring as stated in the comment above.

swh/storage/db.py
434–438

This doc string is unclear and misleading IMHO.

This is not what this method does. If I understand correctly the code, it retrieve the most recent origin visit for a given origin, potentially limited to visits with a related snapshot.

Written as it is (and was before this diff, which is a good moment to fix this), I understand it looks for visits for a given snapshot.

Just noticed docstrings in subclasses seem more accurate in this regard.

We really should not have to duplicate those docstrings everywhere, it's a pain in the neck to manage...

463

I'm a noob in sql, but isn't there a JOIN variant with which this condition is not needed?

This revision now requires changes to proceed.Jun 14 2019, 10:41 AM
vlorentz added inline comments.Jun 14 2019, 10:44 AM
swh/storage/db.py
463

that's indeed not needed for an inner join. I'm not sure why it was there.

vlorentz added inline comments.Jun 14 2019, 10:53 AM
swh/storage/db.py
463

It turns out that the inner join shouldn't be there. I'll remove it in another diff (because it changes the behavior of the function)

vlorentz updated this revision to Diff 5233.Jun 14 2019, 10:53 AM

fix docstring

douardda accepted this revision.Jun 14 2019, 1:27 PM
This revision is now accepted and ready to land.Jun 14 2019, 1:27 PM
This revision was landed with ongoing or failed builds.Jun 14 2019, 1:29 PM
This revision was automatically updated to reflect the committed changes.