Page MenuHomeSoftware Heritage

storage: origin_visit_add: Type and make origin_visit_add return an OriginVisit object
ClosedPublic

Authored by ardumont on Mar 12 2020, 2:21 AM.

Details

Summary

origin_visit_add: Adapt endpoint signature to return OriginVisit

Prior to this commit, there was:

  • no signature in the method
  • discrepancy between checks on the different backend

origin_visit_add endpoint is now typed

def origin_visit_add(
    self, origin_url: str,
    date: Union[str, datetime.datetime], type: str) -> OriginVisit:

This also:

  • renames appropriately the origin_url parameter (removing 1 FIXME)
  • align backend implementations' check which were different

Impacts:

  • D2828: swh-loader-core (runtime)
  • D2829: swh-loader-svn (runtime)
  • D2830: swh-web (tests)
  • D2831: swh-indexer (tests)
Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Simplify the diff, i'll keep only the input change to base model

I'll do other diffs to simplify the signature so the diffs remain small

vlorentz added a subscriber: vlorentz.

Early comments on the premise of your diff; I didn't fully review the diff yet:

swh/storage/interface.py
773

If you're willing to break compatibility, you might as well remove support for str.

773–775

I don't think origin should be an Origin object. Here it's used as ID, not as complete node.

779–785

needs an update

798

Same comment as above, it doesn't make sense to pass a complete Origin object, let alone a complete OriginVisit object.

802–803

needs an update.

This revision now requires changes to proceed.Mar 12 2020, 2:33 PM
swh/storage/interface.py
773

I'm doing this iteratively but yes later.

773–775

Except that when we pass other objects, they are complete.
I don't see why we don't do the same thing here.

798

I agree for origin, thus why it goes away in a diff i thought i submitted ;)

swh/storage/interface.py
773–775

But here, we're not passing an object but the id of an object.

It would be like passing an entire snapshot to origin_visit_update instead of the snapshot's id.

swh/storage/interface.py
798

i submitted it but i did not find it immediately for some reason D2822

ardumont retitled this revision from storage: origin_visit_add/update: Migrate to base model input to storage: origin_visit_add: Migrate to base model input.
ardumont edited the summary of this revision. (Show Details)

Plan changes as i need to remove the part which touches origin_visit_add endpoints :)

I'll probably merge D2821 here as well.

ardumont edited the summary of this revision. (Show Details)

Only modify origin_visit_add endpoint

  • origin_visit_add: Change return type to OriginVisit
ardumont retitled this revision from storage: origin_visit_add: Migrate to base model input to storage: origin_visit_add: Use swh.model as input and outputMigrate to base model input.Mar 12 2020, 8:20 PM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the summary of this revision. (Show Details)
ardumont retitled this revision from storage: origin_visit_add: Use swh.model as input and outputMigrate to base model input to storage: origin_visit_add: Use swh.model as input and output.Mar 13 2020, 10:03 AM
swh/storage/interface.py
773–775

It would be like passing an entire snapshot to origin_visit_update instead of the snapshot's id.

yes and that'd be more consistent with i do here indeed.


Overall, i think i sense what you mean.

But somehow i found it best we pass a compliant object Origin (when here, client created the origin properly, as returned from "origin_add*" mostly likely).

Then also, passing the object makes the implementation detail of the id hidden from the user.
That qualifies as opaque to me.

When we pass the string, it's from my point of view less so.

Also i understand you might be bothered by this because i recall you did the opposite job a while ago (when origin were dict, you changed it with its string url here...).

Hopefully, we'll converge on something.

From a test here point of view, it's annoying to create origin to call this method (some tests do...).
But i think most of our tests here are not that well written (because we need to make a refactoring pass at some point) so somehow that must be kept out of the reasoning loop here.

From a "real" client point of view (loader), i found it simpler to pass along the origin object just created.
Then let the storage deal with the pk itself.
This, instead of unwraping the origin when calling origin_visit....

swh/storage/interface.py
773–775

Apparently i'm the only one annoyed by this. So let's keep it then.

Maybe I'll propose an id alias on the origin at some point.
I don't find "url" clear that it's the id of the origin.

Type origin_visit_add to return origin visit

(origin_url: str, date: Union[str, datetime.datetime], type: str) -> OriginVisit

ardumont retitled this revision from storage: origin_visit_add: Use swh.model as input and output to storage: origin_visit_add: Type and make origin_visit_add return an OriginVisit object.Mar 13 2020, 1:33 PM
ardumont edited the summary of this revision. (Show Details)

good, just a few more comments

swh/storage/tests/test_storage.py
1569–1570

rename to origin_visit.

1573

rename to expected_origin_visit, and assert it's equal to what origin_visit_add returned.

2501–2508

Why create a complete object if you're just using the id?

This revision now requires changes to proceed.Mar 13 2020, 1:45 PM
ardumont edited the summary of this revision. (Show Details)

Adapt according to review

swh/storage/tests/test_storage.py
2501–2508

that's a left over from the gazillion rebases i did since the last 2 days ;)

Add missing assertion check

This revision is now accepted and ready to land.Mar 13 2020, 2:10 PM