Software Heritage

extid: remove unicity on (extid_type, extid) and (target_type, target)
Closed (Public)

Authored by vlorentz on Mar 26 2021, 4:04 PM.

Details

Summary

These uniqueness constraints did not make sense, for several reasons:

  1. two extids can point to the same target (e.g., extids of type git and git-sha256, or two package managers using different checksums);
  2. inserting two objects with the same target or extid in a single call actually wrote both, but then crashed when reading them back;
  3. inserting extid1 then a conflicting extid2 would write both to Kafka, but only extid1 would be inserted into the database. When replaying on a new DB, extid2 might be inserted and extid1 ignored instead.
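Point 3 means that the database content depended on the order in which Kafka messages were replayed. The sketch below illustrates this with a hypothetical in-memory table that is unique on (extid_type, extid) and silently skips conflicting rows (first write wins). It is not swh.storage code; the tuple layout and the sample values are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified row: (extid_type, extid, target_type, target).
Row = Tuple[str, bytes, str, bytes]


def replay_with_unique_extid(messages: List[Row]) -> Dict[Tuple[str, bytes], Row]:
    """Replay messages into a table unique on (extid_type, extid).

    A conflicting row is silently ignored, so the first write wins.
    """
    table: Dict[Tuple[str, bytes], Row] = {}
    for row in messages:
        key = (row[0], row[1])
        if key not in table:  # conflict: later rows are dropped
            table[key] = row
    return table


# Two messages with the same (extid_type, extid) but different targets.
extid1: Row = ("git-sha256", b"\x01" * 32, "revision", b"\xaa" * 20)
extid2: Row = ("git-sha256", b"\x01" * 32, "revision", b"\xbb" * 20)

# Both messages are in the journal, but the replayed DB depends on order.
db_a = replay_with_unique_extid([extid1, extid2])
db_b = replay_with_unique_extid([extid2, extid1])
print(db_a == db_b)  # False: db_a kept extid1, db_b kept extid2
```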

Points 2 and 3 are fixable bugs, but point 1 is a design issue;
this commit fixes all three at once.
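Once neither pair is unique, an extid row is naturally identified by the whole (extid_type, extid, target_type, target) tuple, and lookups by extid or by target return lists. The following is a minimal in-memory sketch of those semantics under that assumption; the `ExtID` tuple and the `ExtIDStore` class with its `add`, `by_extid` and `by_target` methods are hypothetical and do not reproduce the actual interface in swh/storage/interface.py.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, NamedTuple, Set, Tuple


class ExtID(NamedTuple):
    """Hypothetical, simplified ExtID row (the real model may carry more fields)."""

    extid_type: str
    extid: bytes
    target_type: str
    target: bytes


class ExtIDStore:
    """In-memory sketch: rows are deduplicated on the whole tuple only,
    so neither (extid_type, extid) nor (target_type, target) is unique."""

    def __init__(self) -> None:
        self._rows: Set[ExtID] = set()
        self._by_extid: DefaultDict[Tuple[str, bytes], Set[ExtID]] = defaultdict(set)
        self._by_target: DefaultDict[Tuple[str, bytes], Set[ExtID]] = defaultdict(set)

    def add(self, extids: Iterable[ExtID]) -> int:
        """Insert rows; exact duplicates are no-ops. Returns the number added."""
        added = 0
        for row in extids:
            if row in self._rows:
                continue
            self._rows.add(row)
            self._by_extid[(row.extid_type, row.extid)].add(row)
            self._by_target[(row.target_type, row.target)].add(row)
            added += 1
        return added

    def by_extid(self, extid_type: str, extid: bytes) -> List[ExtID]:
        """All rows for this external id (possibly several targets)."""
        return sorted(self._by_extid.get((extid_type, extid), set()))

    def by_target(self, target_type: str, target: bytes) -> List[ExtID]:
        """All rows pointing at this target (possibly several extids)."""
        return sorted(self._by_target.get((target_type, target), set()))

    def rows(self) -> Set[ExtID]:
        return set(self._rows)


# Point 1: a git extid and a git-sha256 extid pointing at the same revision.
rev = b"\xaa" * 20
git_id = ExtID("git", b"\x01" * 20, "revision", rev)
sha256_id = ExtID("git-sha256", b"\x02" * 32, "revision", rev)

store = ExtIDStore()
print(store.add([git_id, sha256_id]))  # 2
print(len(store.by_target("revision", rev)))  # 2
print(store.add([git_id]))  # 0: exact duplicate is ignored

# Replay order no longer matters: the final state is the same set of rows.
replayed = ExtIDStore()
replayed.add([sha256_id, git_id])
print(replayed.rows() == store.rows())  # True
```

Because insertion is a set union over whole rows, it is idempotent and commutative, which is what makes journal replay deterministic in this sketch.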

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5363 (id=19212)

Could not rebase; Attempt merge onto eff2383717...

Updating eff23837..c006c201
Fast-forward
 sql/upgrades/173.sql               | 14 ++++++++
 swh/storage/cassandra/storage.py   | 74 ++++++++++++++++----------------------
 swh/storage/interface.py           | 10 +++---
 swh/storage/postgresql/db.py       |  2 +-
 swh/storage/postgresql/storage.py  | 24 ++++++-------
 swh/storage/sql/30-schema.sql      |  2 +-
 swh/storage/sql/60-indexes.sql     |  6 ++--
 swh/storage/tests/storage_tests.py | 72 +++++++++++++++++++++++++------------
 8 files changed, 114 insertions(+), 90 deletions(-)
 create mode 100644 sql/upgrades/173.sql
Changes applied before test
commit c006c201526e7d26b17853d8334c5bdce04ad614
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 16:03:27 2021 +0100

    extid: remove unicity on (extid_type, extid) and (target_type, target)
    
    It did not make sense for multiple reasons:
    
    1. two extids can point to the same target (eg. extids with type git and git-sha256;
       or two package managers with different checksums)
    2. inserting two objects with the same target or extid in a single call actually
       wrote both, but would crash when reading
    3. inserting extid1 then extid2 would write both to Kafka, but only extid1
       would be inserted. When replaying on a new DB, extid2 may be inserted
       and extid1 ignored
    
    Points 2 and 3 are simply fixable bugs, but 1 is an issue by design,
    and this commit fixes all of them at once.

commit ac6f642372dd30acaf723447f202f78344255d0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 15:17:12 2021 +0100

    origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1236/ for more details.

Build is green

Patch application report for D5363 (id=19213)

Could not rebase; Attempt merge onto eff2383717...

Updating eff23837..cfb2417f
Fast-forward
 sql/upgrades/173.sql               | 14 ++++++++
 swh/storage/cassandra/storage.py   | 74 ++++++++++++++++----------------------
 swh/storage/interface.py           | 10 +++---
 swh/storage/postgresql/db.py       |  2 +-
 swh/storage/postgresql/storage.py  | 24 ++++++-------
 swh/storage/sql/30-schema.sql      |  2 +-
 swh/storage/sql/60-indexes.sql     |  6 ++--
 swh/storage/tests/storage_tests.py | 66 +++++++++++++++++++++++-----------
 8 files changed, 111 insertions(+), 87 deletions(-)
 create mode 100644 sql/upgrades/173.sql
Changes applied before test
commit cfb2417fecfff890ec19fb60405df0c709b89fa6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 16:03:27 2021 +0100

    extid: remove unicity on (extid_type, extid) and (target_type, target)
    
    It did not make sense for multiple reasons:
    
    1. two extids can point to the same target (eg. extids with type git and git-sha256;
       or two package managers with different checksums)
    2. inserting two objects with the same target or extid in a single call actually
       wrote both, but would crash when reading
    3. inserting extid1 then extid2 would write both to Kafka, but only extid1
       would be inserted. When replaying on a new DB, extid2 may be inserted
       and extid1 ignored
    
    Points 2 and 3 are simply fixable bugs, but 1 is an issue by design,
    and this commit fixes all of them at once.

commit ac6f642372dd30acaf723447f202f78344255d0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 26 15:17:12 2021 +0100

    origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1237/ for more details.
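Both build reports list a new migration, sql/upgrades/173.sql, and a small change to swh/storage/sql/60-indexes.sql; their contents are not shown on this page. The following is a minimal sketch of the kind of index change the commit title describes, using Python's built-in sqlite3 rather than PostgreSQL. The table layout, the index names and the use of `INSERT OR IGNORE` are assumptions, not the actual migration.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE extid (
        extid_type  TEXT NOT NULL,
        extid       BLOB NOT NULL,
        target_type TEXT NOT NULL,
        target      BLOB NOT NULL
    );
    -- Before: one unique index per pair (hypothetical index names).
    CREATE UNIQUE INDEX extid_extid_uniq  ON extid (extid_type, extid);
    CREATE UNIQUE INDEX extid_target_uniq ON extid (target_type, target);
    """
)


def migrate(conn: sqlite3.Connection) -> None:
    """Drop per-pair uniqueness, keep lookups indexed, dedupe on the full row."""
    conn.executescript(
        """
        DROP INDEX extid_extid_uniq;
        DROP INDEX extid_target_uniq;
        CREATE INDEX extid_extid_idx  ON extid (extid_type, extid);
        CREATE INDEX extid_target_idx ON extid (target_type, target);
        CREATE UNIQUE INDEX extid_row_uniq
            ON extid (extid_type, extid, target_type, target);
        """
    )


def insert(conn: sqlite3.Connection, rows: List[Tuple[str, bytes, str, bytes]]) -> int:
    """Insert rows, ignoring unique-constraint conflicts; return the row count."""
    conn.executemany("INSERT OR IGNORE INTO extid VALUES (?, ?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM extid").fetchone()[0]


target = b"\xaa" * 20
rows = [
    ("git", b"\x01" * 20, "revision", target),
    ("git-sha256", b"\x02" * 32, "revision", target),
]

print(insert(conn, rows))  # 1: second row dropped by extid_target_uniq
migrate(conn)
print(insert(conn, rows))  # 2
print(insert(conn, rows))  # 2: exact duplicates are still ignored
```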

Looks reasonable indeed (both point 1 and the code), thanks.

This revision is now accepted and ready to land. (Mar 29 2021, 11:33 AM)