
extid: remove unicity on (extid_type, extid) and (target_type, target)
Closed · Public

Authored by vlorentz on Mar 26 2021, 4:04 PM.

Details

Summary

The uniqueness constraints did not make sense, for multiple reasons:

  1. two extids can point to the same target (e.g. extids with type git and git-sha256, or two package managers with different checksums)
  2. inserting two objects with the same target or extid in a single call actually wrote both, but then crashed on read
  3. inserting two conflicting objects, extid1 then extid2, would write both to Kafka, but only extid1 would be inserted in the DB. When replaying from Kafka on a new DB, extid2 may be inserted and extid1 ignored

Points 2 and 3 are simple, fixable bugs, but point 1 is an issue by design,
and this commit fixes all of them at once.
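
To illustrate points 1 and 3, here is a minimal in-memory sketch in Python. It is not the swh.storage code: `ExtID`, `OldStore`, `NewStore` and `replay` are hypothetical names. The old behaviour is modelled as silently skipping a conflicting row, and the new behaviour assumes the full row remains the only deduplication key.

```python
from typing import Iterable, List, NamedTuple, Set


class ExtID(NamedTuple):
    """Hypothetical stand-in for an extid row (not the swh.model class)."""

    extid_type: str
    extid: bytes
    target_type: str
    target: bytes


class OldStore:
    """Models the old constraints: (extid_type, extid) and
    (target_type, target) are each unique; a conflicting row is skipped."""

    def __init__(self) -> None:
        self.rows: List[ExtID] = []

    def add(self, row: ExtID) -> None:
        for existing in self.rows:
            same_extid = existing[:2] == row[:2]
            same_target = existing[2:] == row[2:]
            if same_extid or same_target:
                return  # assumption: conflicts are ignored, not raised
        self.rows.append(row)


class NewStore:
    """Models the new behaviour: only exact duplicate rows are merged."""

    def __init__(self) -> None:
        self.rows: Set[ExtID] = set()

    def add(self, row: ExtID) -> None:
        self.rows.add(row)

    def get_from_target(self, target_type: str, target: bytes) -> List[ExtID]:
        # A target may now map to several extids, so this returns a list.
        return sorted(r for r in self.rows if r[2:] == (target_type, target))


def replay(store, rows: Iterable[ExtID]) -> Set[ExtID]:
    """Insert rows in the given order and return the final content."""
    for row in rows:
        store.add(row)
    return set(store.rows)


# Point 1: two extids of different types pointing to the same revision.
target = b"\xaa" * 20
git_sha1 = ExtID("git", b"\x01" * 20, "revision", target)
git_sha256 = ExtID("git-sha256", b"\x02" * 32, "revision", target)

# Point 3: with the old constraints, the final state depends on the order.
old_forward = replay(OldStore(), [git_sha1, git_sha256])
old_reverse = replay(OldStore(), [git_sha256, git_sha1])

# Without them, any replay order converges to the same content.
new_store = NewStore()
new_forward = replay(new_store, [git_sha1, git_sha256])
new_reverse = replay(NewStore(), [git_sha256, git_sha1])
```

In this model, `old_forward` and `old_reverse` each keep only the row that arrived first, whereas `new_forward` and `new_reverse` both contain the two rows, and `new_store.get_from_target("revision", target)` returns both extids.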

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20293
Build 31501: Phabricator diff pipeline on jenkins
Build 31500: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5363 (id=19212)

Could not rebase; Attempt merge onto eff2383717...

Updating eff23837..c006c201
Fast-forward
 sql/upgrades/173.sql               | 14 ++++++++
 swh/storage/cassandra/storage.py   | 74 ++++++++++++++++----------------------
 swh/storage/interface.py           | 10 +++---
 swh/storage/postgresql/db.py       |  2 +-
 swh/storage/postgresql/storage.py  | 24 ++++++-------
 swh/storage/sql/30-schema.sql      |  2 +-
 swh/storage/sql/60-indexes.sql     |  6 ++--
 swh/storage/tests/storage_tests.py | 72 +++++++++++++++++++++++++------------
 8 files changed, 114 insertions(+), 90 deletions(-)
 create mode 100644 sql/upgrades/173.sql
Changes applied before test
commit c006c201526e7d26b17853d8334c5bdce04ad614
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 16:03:27 2021 +0100

    extid: remove unicity on (extid_type, extid) and (target_type, target)
    
    It did not make sense for multiple reasons:
    
    1. two extids can point to the same target (eg. extids with type git and git-sha256;
       or two package managers with different checksums)
    2. inserting two objects with the same target or extid in a single call actually
       wrote both, but would crash when reading
    3. inserting extid1 then extid2 would write both to Kafka, but only extid1
       would be inserted. When replaying on a new DB, extid2 may be inserted
       and extid1 ignored
    
    Points 2 and 3 are simply fixable bugs, but 1 is an issue by design,
    and this commit fixes all of them at once.

commit ac6f642372dd30acaf723447f202f78344255d0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 15:17:12 2021 +0100

    origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1236/ for more details.

Build is green

Patch application report for D5363 (id=19213)

Could not rebase; Attempt merge onto eff2383717...

Updating eff23837..cfb2417f
Fast-forward
 sql/upgrades/173.sql               | 14 ++++++++
 swh/storage/cassandra/storage.py   | 74 ++++++++++++++++----------------------
 swh/storage/interface.py           | 10 +++---
 swh/storage/postgresql/db.py       |  2 +-
 swh/storage/postgresql/storage.py  | 24 ++++++-------
 swh/storage/sql/30-schema.sql      |  2 +-
 swh/storage/sql/60-indexes.sql     |  6 ++--
 swh/storage/tests/storage_tests.py | 66 +++++++++++++++++++++++-----------
 8 files changed, 111 insertions(+), 87 deletions(-)
 create mode 100644 sql/upgrades/173.sql
Changes applied before test
commit cfb2417fecfff890ec19fb60405df0c709b89fa6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 16:03:27 2021 +0100

    extid: remove unicity on (extid_type, extid) and (target_type, target)
    
    It did not make sense for multiple reasons:
    
    1. two extids can point to the same target (eg. extids with type git and git-sha256;
       or two package managers with different checksums)
    2. inserting two objects with the same target or extid in a single call actually
       wrote both, but would crash when reading
    3. inserting extid1 then extid2 would write both to Kafka, but only extid1
       would be inserted. When replaying on a new DB, extid2 may be inserted
       and extid1 ignored
    
    Points 2 and 3 are simply fixable bugs, but 1 is an issue by design,
    and this commit fixes all of them at once.

commit ac6f642372dd30acaf723447f202f78344255d0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 15:17:12 2021 +0100

    origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1237/ for more details.

Looks reasonable indeed (both point 1 and the code), thanks.

This revision is now accepted and ready to land. Mar 29 2021, 11:33 AM