
Ensure there is no duplicated origins in the insertion batches
ClosedPublic

Authored by vsellier on Dec 6 2021, 4:35 PM.

Details

Summary

When a lister tries to insert duplicate origins in the same batch,
the insertion fails because the "ON CONFLICT DO UPDATE" clause
cannot update the same row twice within a single statement.

This commit drops the duplicate entries before actually inserting them.
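For context, PostgreSQL rejects such a statement with `ON CONFLICT DO UPDATE command cannot affect row a second time`. Below is a minimal sketch of the deduplication step. It assumes origins are identified by `(lister_id, url, visit_type)`, which we take to be the upsert's conflict target, and that the last occurrence in a batch should win. `Origin` is a hypothetical stand-in for the scheduler's `ListedOrigin` model, and `dedup_origins` is a hypothetical name, not the function from this diff.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for the scheduler's ListedOrigin model."""

    lister_id: UUID
    url: str
    visit_type: str
    last_update: str = ""

    def unique_key(self) -> Tuple[UUID, str, str]:
        # Assumed conflict target of the ON CONFLICT clause.
        return (self.lister_id, self.url, self.visit_type)


def dedup_origins(origins: Iterable[Origin]) -> List[Origin]:
    """Drop origins sharing a unique key; the last occurrence wins.

    A dict holds one value per key and later assignments overwrite
    earlier ones, so each key ends up mapped to its last origin. Keys
    keep the position of their first occurrence.
    """
    return list({o.unique_key(): o for o in origins}.values())


# Usage: a batch where the same origin appears twice.
lister = uuid4()
batch = [
    Origin(lister, "https://example.org/a", "git", "v1"),
    Origin(lister, "https://example.org/b", "git"),
    Origin(lister, "https://example.org/a", "git", "v2"),
]
deduped = dedup_origins(batch)
```

Here `deduped` holds two origins, and the entry for `https://example.org/a` is the later one (`last_update="v2"`), so the batch can be sent in a single upsert without tripping the error.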

Related to T3769

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25403
Build 39711: Phabricator diff pipeline on Jenkins
Build 39710: arc lint + arc unit

Event Timeline

swh/scheduler/backend.py, line 267

Note to the reviewers: which version do you prefer?
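The two versions under discussion are not reproduced on this page; the later comments indicate that one of them is a dict comprehension. As a hedged illustration, assuming the other one is an explicit loop, both forms below keep the last record per assumed `(url, visit_type)` key and return identical results, so the choice is purely stylistic. The record layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical records: (url, visit_type, payload). The unique key is
# assumed to be (url, visit_type), mirroring the upsert's conflict target.
Record = Tuple[str, str, str]


def dedup_loop(records: List[Record]) -> List[Record]:
    """Explicit-loop version: later records overwrite earlier ones."""
    by_key: Dict[Tuple[str, str], Record] = {}
    for record in records:
        url, visit_type, _payload = record
        by_key[(url, visit_type)] = record
    return list(by_key.values())


def dedup_comprehension(records: List[Record]) -> List[Record]:
    """Dict-comprehension version: same semantics in a single expression."""
    return list({(r[0], r[1]): r for r in records}.values())


records = [
    ("https://example.org/a", "git", "first"),
    ("https://example.org/a", "hg", "other visit type"),
    ("https://example.org/a", "git", "second"),
]
# Both versions agree: the two "git" records collapse into the last one.
assert dedup_loop(records) == dedup_comprehension(records)
```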

Build is green

Patch application report for D6753 (id=24517)

Rebasing onto 2abb393684...

Current branch diff-target is up to date.
Changes applied before test
commit 15ee14b7fdc1d51953a9ee4ec53feef9f5448108
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Dec 6 16:23:49 2021 +0100

    Ensure there is no duplicated origins in the insertion batches
    
    when a lister try to insert duplicate origins in the same batch,
    the insertion is failing because the "on cascade do update" instruction
    cannot manage duplicates in the same transaction
    
    Related to T3769

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/496/ for more details.

ardumont edited the summary of this revision.
ardumont added subscribers: olasd, ardumont.

lgtm

@olasd, is this what you had in mind?

This revision is now accepted and ready to land. Dec 6 2021, 5:30 PM
swh/scheduler/backend.py, line 267

I do tend to prefer dict comprehensions (and list comprehensions, ...) when they're readable.
I find it readable enough here, but I suggested that one to you, so I'm biased ;)

LGTM, thanks!

swh/scheduler/backend.py, line 267

Yeah, I think the dict comprehension is fine.

I do wonder if we should be "merging" the objects instead of having them stomp on one another, but I guess the stomping is what the SQL does anyway, so that implementation is fine.
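To make the "stomping" point concrete, the sketch below simulates an in-memory table (a dict standing in for the origins table, an assumption for illustration) and compares two strategies: upserting each record in its own statement, where duplicates are legal and the last one wins, versus deduplicating with a last-wins dict comprehension and upserting the survivors once. It assumes the `DO UPDATE` clause overwrites every column; the column names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]      # assumed unique key: (url, visit_type)
Row = Dict[str, object]    # remaining columns of a hypothetical row


def upsert_one(table: Dict[Key, Row], key: Key, row: Row) -> None:
    """Simulate one INSERT ... ON CONFLICT DO UPDATE on a single row.

    Assumes DO UPDATE overwrites every column, so an existing row is
    simply replaced ("stomped").
    """
    table[key] = dict(row)


def sequential_upserts(batch: List[Tuple[Key, Row]]) -> Dict[Key, Row]:
    """One statement per record: the later duplicate overwrites the earlier."""
    table: Dict[Key, Row] = {}
    for key, row in batch:
        upsert_one(table, key, row)
    return table


def dedup_then_upsert(batch: List[Tuple[Key, Row]]) -> Dict[Key, Row]:
    """Deduplicate first (last wins), then upsert each survivor once."""
    deduped = {key: row for key, row in batch}
    table: Dict[Key, Row] = {}
    for key, row in deduped.items():
        upsert_one(table, key, row)
    return table


batch = [
    (("https://example.org/a", "git"), {"last_update": "v1", "enabled": True}),
    (("https://example.org/a", "git"), {"last_update": "v2", "enabled": False}),
]
# Same final state either way: dropping earlier duplicates loses nothing
# compared with applying them as separate statements.
assert sequential_upserts(batch) == dedup_then_upsert(batch)
```

A "merge" would instead combine fields from both duplicates, which neither this sketch nor an overwrite-all `DO UPDATE` does.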

swh/scheduler/backend.py, line 268

You can drop the brackets here.

keep the dict comprehension version

Build is green

Patch application report for D6753 (id=24529)

Rebasing onto 2abb393684...

Current branch diff-target is up to date.
Changes applied before test
commit 0a6aac583adff2c55069c9da676ad95670e35708
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Dec 6 16:23:49 2021 +0100

    Ensure there is no duplicated origins in the insertion batches
    
    when a lister try to insert duplicate origins in the same batch,
    the insertion is failing because the "on cascade do update" instruction
    cannot manage duplicates in the same transaction
    
    Related to T3769

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/497/ for more details.