Page MenuHomeSoftware Heritage

Split Storage.snapshot_add (add the snapshot + update the origin_visit)
ClosedPublic

Authored by vlorentz on Mar 28 2019, 1:48 PM.

Details

Summary

snapshot_add(origin, visit, snap) becomes snapshot_add(snap) +
origin_visit_update(origin, visit, snapshot_id=snap['id'])

Also, all arguments of origin_visit_update become optional; ie. the old value in the DB remains if their value is None. (Side-effect: you can't erase a value unless you provide a new non-None value.)

I added a temporary compatibility layer, so the API change can go painlessly
by updating the clients after the server is deployed.

Diff Detail

Repository
rDSTO Storage manager
Branch
split-snapshot_add
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4930
Build 6594: tox-on-jenkinsJenkins
Build 6593: arc lint + arc unit

Event Timeline

vlorentz edited the summary of this revision. (Show Details)
  • Fix doc of the in-mem storage.
olasd requested changes to this revision.Mar 28 2019, 2:13 PM
olasd added a subscriber: olasd.

Rather than jump through hoops and duplicate the visit update code in the legacy codepaths in snapshot_add, I think you could just have them call origin_visit_update after adding the snapshot.

(This changes the behavior a little as adding a snapshot to a non-existent visit will fail after adding the snapshot, but that's not really an issue, and it'll make for some easier to follow code IMO).

This revision now requires changes to proceed.Mar 28 2019, 2:13 PM
In D1314#28207, @olasd wrote:

Rather than jump through hoops and duplicate the visit update code in the legacy codepaths in snapshot_add, I think you could just have them call origin_visit_update after adding the snapshot.

That's already what I do

(This changes the behavior a little as adding a snapshot to a non-existent visit will fail after adding the snapshot, but that's not really an issue, and it'll make for some easier to follow code IMO).

Ok, I'll do that

  • Remove the compatibility code that checks for visit existence from snapshot_add.
olasd requested changes to this revision.Mar 28 2019, 2:55 PM

A few nits left but the rest of the code looks fine to me :)

swh/storage/storage.py
1081–1098

Looks like you're still fetching the visit twice (L1081 and L1091).

I think the first if statement can go away and we can just always fetch the full visit metadata (it's only one more join, that we're doing most of the time anyway).

That way we:

  • always get the full object we can send to the journal writer
  • can avoid sending an update if the fields haven't been updated (that'd be an "if updates:" before doing both the write_update and db.origin_visit_update)
swh/storage/tests/test_storage.py
1645

Missing a snapshot_id kwarg specification here?

1666–1667

AFAICT the code path is the same now?

This revision now requires changes to proceed.Mar 28 2019, 2:55 PM
vlorentz marked 3 inline comments as done.
  • s/snapshot_id/snapshot/ for consistency + improve call of origin_visit_update from snapshot_add.
olasd added inline comments.
swh/storage/storage.py
1097

I guess you could call the journal writer there rather than earlier but that's not terribly critical.

This revision is now accepted and ready to land.Mar 28 2019, 3:46 PM
  • Write to the journal only if there are updates.
vlorentz added inline comments.
swh/storage/tests/test_storage.py
1645

yup.

1666–1667

yup.

This revision was landed with ongoing or failed builds.Mar 28 2019, 4:08 PM
This revision was automatically updated to reflect the committed changes.