Page MenuHomeSoftware Heritage

storage.db: Drop db.origin_visit_upsert behavior
ClosedPublic

Authored by ardumont on Jul 3 2020, 4:34 PM.

Details

Summary

The initial desired behavior was to allow creation of origin-visit if they
already had their id set. This is the what's needed for the replayer to actually
work.

But somehow, this left the possibility to update the origin-visit...
This commit fixes it by dropping conflictual origin-visits if any.
In effect, we can no longer overwrite origin-visits (pg-storage wise).

Related to T2310

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3416 (id=12111)

Rebasing onto 248c277445...

First, rewinding head to replay your work on top of it...
Applying: storage.db: Drop db.origin_visit_upsert behavior
Changes applied before test
commit 0b1e32a45ce1eae6dfaa09c1185715b9b8b44299
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 3 16:31:22 2020 +0200

    storage.db: Drop db.origin_visit_upsert behavior
    
    The initial desired behavior was to allow creation of origin-visit if they
    already have their id set. This is what's needed for the replayer.
    
    So now, this will try to add origin-visit and drop conflictual origin-visits.
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/406/ for more details.

  • rebase
  • rework commit message
This revision is now accepted and ready to land.Jul 3 2020, 4:44 PM

Build is green

Patch application report for D3416 (id=12112)

Rebasing onto 248c277445...

Current branch diff-target is up to date.
Changes applied before test
commit 348bc7b2204f047ef01cc231aa34fbdbfba98472
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 3 16:31:22 2020 +0200

    storage.db: Drop db.origin_visit_upsert behavior
    
    The initial desired behavior was to allow creation of origin-visit if they
    already had their id set. This is the what's needed for the replayer to actually
    work.
    
    But somehow, this left the possibility to update the origin-visit...
    This commit fixes it by dropping conflictual origin-visits if any.
    In effect, we can no longer overwrite origin-visits (pg-storage wise).
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/407/ for more details.

The change by itself looks fine, but the cursor manipulation logic is not sparking joy :-)

swh/storage/db.py
487–488

This call should be using the cursor argument.

490

This needs to be moved above the self.origin_id_get_by_url call

swh/storage/storage.py
825

Missing cursor argument

swh/storage/db.py
487

Funny thing, if i'm calling the method with the "right" cur, i got the following error P715
(initial cur is passed from storage as well).

Without calling that method with cur, it's fine.

swh/storage/db.py
487

D3420 overcomes this by dropping this call altogether...