Page MenuHomeSoftware Heritage

storage: origin_visit_add: Type and make origin_visit_add return an OriginVisit object
ClosedPublic

Authored by ardumont on Mar 12 2020, 2:21 AM.

Details

Summary

origin_visit_add: Adapt endpoint signature to return OriginVisit

Prior to this commit, there was:

  • no signature in the method
  • discrepancy between checks on the different backend

origin_visit_add endpoint is now typed

def origin_visit_add(
    self, origin_url: str,
    date: Union[str, datetime.datetime], type: str) -> OriginVisit:

This also:

  • renames appropriately the origin_url parameter (removing 1 FIXME)
  • align backend implementations' check which were different

Impacts:

  • D2828: swh-loader-core (runtime)
  • D2829: swh-loader-svn (runtime)
  • D2830: swh-web (tests)
  • D2831: swh-indexer (tests)
Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ardumont edited the summary of this revision. (Show Details)Mar 12 2020, 1:45 PM
ardumont updated this revision to Diff 10013.Mar 12 2020, 2:21 PM

Simplify the diff, i'll keep only the input change to base model

I'll do other diffs to simplify the signature so the diffs remain small

ardumont edited the summary of this revision. (Show Details)Mar 12 2020, 2:21 PM
ardumont edited the summary of this revision. (Show Details)Mar 12 2020, 2:25 PM
ardumont edited the summary of this revision. (Show Details)Mar 12 2020, 2:31 PM
vlorentz requested changes to this revision.Mar 12 2020, 2:33 PM
vlorentz added a subscriber: vlorentz.

Early comments on the premise of your diff; I didn't fully review the diff yet:

swh/storage/interface.py
773–775

I don't think origin should be an Origin object. Here it's used as ID, not as complete node.

774

If you're willing to break compatibility, you might as well remove support for str.

779–785

needs an update

798

Same comment as above, it doesn't make sense to pass a complete Origin object, let alone a complete OriginVisit object.

802–803

needs an update.

This revision now requires changes to proceed.Mar 12 2020, 2:33 PM
ardumont added inline comments.Mar 12 2020, 2:42 PM
swh/storage/interface.py
773–775

Except that when we pass other objects, they are complete.
I don't see why we don't do the same thing here.

774

I'm doing this iteratively but yes later.

798

I agree for origin, thus why it goes away in a diff i thought i submitted ;)

ardumont edited the summary of this revision. (Show Details)Mar 12 2020, 2:50 PM
vlorentz added inline comments.Mar 12 2020, 3:09 PM
swh/storage/interface.py
773–775

But here, we're not passing an object but the id of an object.

It would be like passing an entire snapshot to origin_visit_update instead of the snapshot's id.

ardumont added inline comments.Mar 12 2020, 3:12 PM
swh/storage/interface.py
798

i submitted it but i did not find it immediately for some reason D2822

ardumont planned changes to this revision.Mar 12 2020, 3:46 PM
ardumont retitled this revision from storage: origin_visit_add/update: Migrate to base model input to storage: origin_visit_add: Migrate to base model input.
ardumont edited the summary of this revision. (Show Details)

Plan changes as i need to remove the part which touches origin_visit_add endpoints :)

I'll probably merge D2821 here as well.

ardumont updated this revision to Diff 10039.Mar 12 2020, 8:13 PM
ardumont edited the summary of this revision. (Show Details)

Only modify origin_visit_add endpoint

ardumont updated this revision to Diff 10040.Mar 12 2020, 8:16 PM
  • origin_visit_add: Change return type to OriginVisit
ardumont retitled this revision from storage: origin_visit_add: Migrate to base model input to storage: origin_visit_add: Use swh.model as input and outputMigrate to base model input.Mar 12 2020, 8:20 PM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the summary of this revision. (Show Details)
ardumont retitled this revision from storage: origin_visit_add: Use swh.model as input and outputMigrate to base model input to storage: origin_visit_add: Use swh.model as input and output.Mar 13 2020, 10:03 AM
ardumont added inline comments.Mar 13 2020, 10:13 AM
swh/storage/interface.py
773–775

It would be like passing an entire snapshot to origin_visit_update instead of the snapshot's id.

yes and that'd be more consistent with i do here indeed.


Overall, i think i sense what you mean.

But somehow i found it best we pass a compliant object Origin (when here, client created the origin properly, as returned from "origin_add*" mostly likely).

Then also, passing the object makes the implementation detail of the id hidden from the user.
That qualifies as opaque to me.

When we pass the string, it's from my point of view less so.

Also i understand you might be bothered by this because i recall you did the opposite job a while ago (when origin were dict, you changed it with its string url here...).

Hopefully, we'll converge on something.

From a test here point of view, it's annoying to create origin to call this method (some tests do...).
But i think most of our tests here are not that well written (because we need to make a refactoring pass at some point) so somehow that must be kept out of the reasoning loop here.

From a "real" client point of view (loader), i found it simpler to pass along the origin object just created.
Then let the storage deal with the pk itself.
This, instead of unwraping the origin when calling origin_visit....

ardumont added inline comments.Mar 13 2020, 11:17 AM
swh/storage/interface.py
773–775

Apparently i'm the only one annoyed by this. So let's keep it then.

Maybe I'll propose an id alias on the origin at some point.
I don't find "url" clear that it's the id of the origin.

ardumont planned changes to this revision.Mar 13 2020, 11:17 AM
ardumont updated this revision to Diff 10044.Mar 13 2020, 1:30 PM

Type origin_visit_add to return origin visit

(origin_url: str, date: Union[str, datetime.datetime], type: str) -> OriginVisit

ardumont retitled this revision from storage: origin_visit_add: Use swh.model as input and output to storage: origin_visit_add: Type and make origin_visit_add return an OriginVisit object.Mar 13 2020, 1:33 PM
ardumont edited the summary of this revision. (Show Details)
vlorentz requested changes to this revision.Mar 13 2020, 1:45 PM

good, just a few more comments

swh/storage/tests/test_storage.py
1569–1570

rename to origin_visit.

1573

rename to expected_origin_visit, and assert it's equal to what origin_visit_add returned.

2502–2509

Why create a complete object if you're just using the id?

This revision now requires changes to proceed.Mar 13 2020, 1:45 PM
ardumont edited the summary of this revision. (Show Details)Mar 13 2020, 2:03 PM
ardumont updated this revision to Diff 10045.Mar 13 2020, 2:04 PM
ardumont edited the summary of this revision. (Show Details)

Adapt according to review

swh/storage/tests/test_storage.py
2502–2509

that's a left over from the gazillion rebases i did since the last 2 days ;)

ardumont updated this revision to Diff 10046.Mar 13 2020, 2:09 PM

Add missing assertion check

vlorentz accepted this revision.Mar 13 2020, 2:10 PM
This revision is now accepted and ready to land.Mar 13 2020, 2:10 PM
ardumont edited the summary of this revision. (Show Details)Mar 14 2020, 11:31 AM
ardumont edited the summary of this revision. (Show Details)Mar 14 2020, 11:38 AM
ardumont edited the summary of this revision. (Show Details)Mar 14 2020, 11:42 AM