Page MenuHomeSoftware Heritage

storage.db: Drop db.origin_visit_upsert behavior
ClosedPublic

Authored by ardumont on Jul 3 2020, 4:34 PM.

Details

Summary

The initial desired behavior was to allow creation of origin-visit if they
already had their id set. This is the what's needed for the replayer to actually
work.

But somehow, this left the possibility to update the origin-visit...
This commit fixes it by dropping conflictual origin-visits if any.
In effect, we can no longer overwrite origin-visits (pg-storage wise).

Related to T2310

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 13332
Build 20371: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 20370: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D3416 (id=12111)

Rebasing onto 248c277445...

First, rewinding head to replay your work on top of it...
Applying: storage.db: Drop db.origin_visit_upsert behavior
Changes applied before test
commit 0b1e32a45ce1eae6dfaa09c1185715b9b8b44299
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 3 16:31:22 2020 +0200

    storage.db: Drop db.origin_visit_upsert behavior
    
    The initial desired behavior was to allow creation of origin-visit if they
    already have their id set. This is what's needed for the replayer.
    
    So now, this will try to add origin-visit and drop conflictual origin-visits.
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/406/ for more details.

  • rebase
  • rework commit message
This revision is now accepted and ready to land.Jul 3 2020, 4:44 PM

Build is green

Patch application report for D3416 (id=12112)

Rebasing onto 248c277445...

Current branch diff-target is up to date.
Changes applied before test
commit 348bc7b2204f047ef01cc231aa34fbdbfba98472
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 3 16:31:22 2020 +0200

    storage.db: Drop db.origin_visit_upsert behavior
    
    The initial desired behavior was to allow creation of origin-visit if they
    already had their id set. This is the what's needed for the replayer to actually
    work.
    
    But somehow, this left the possibility to update the origin-visit...
    This commit fixes it by dropping conflictual origin-visits if any.
    In effect, we can no longer overwrite origin-visits (pg-storage wise).
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/407/ for more details.

The change by itself looks fine, but the cursor manipulation logic is not sparking joy :-)

swh/storage/db.py
487–488

This call should be using the cursor argument.

490

This needs to be moved above the self.origin_id_get_by_url call

swh/storage/storage.py
825

Missing cursor argument

swh/storage/db.py
487–488

Funny thing, if i'm calling the method with the "right" cur, i got the following error P715
(initial cur is passed from storage as well).

Without calling that method with cur, it's fine.

swh/storage/db.py
487–488

D3420 overcomes this by dropping this call altogether...