Page MenuHomeSoftware Heritage

Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
ClosedPublic

Authored by vlorentz on Feb 19 2021, 2:21 PM.

Details

Summary

Following the discussion on T3034, we decided to replace SWHID with
two or three classes:

  • QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
  • CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
  • ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)

This commit adds the last one.

It also removes "ori" as a valid object type for CoreSWHID and
QualifiedSWHID, as it now only belongs in ExtendedSWHID.

Depends on D5117.

Diff Detail

Repository
rDMOD Data model
Branch
extended-swhid
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 19391
Build 30073: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 30072: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D5118 (id=18305)

Could not rebase; Attempt merge onto 758eb885d3...

Updating 758eb88..76d0ef7
Fast-forward
 swh/model/identifiers.py            | 457 ++++++++++++++++++++++++++++++--
 swh/model/tests/test_identifiers.py | 508 ++++++++++++++++++++++++++++++++++--
 tox.ini                             |   1 +
 3 files changed, 928 insertions(+), 38 deletions(-)
Changes applied before test
commit 76d0ef7437ef277344f3ff897d49fed36bf01e54
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 14:18:45 2021 +0100

    Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the last one.
    
    It also removes "ori" as a valid object type for CoreSWHID and
    QualifiedSWHID, as it now only belongs in ExtendedSWHID.

commit 3668e2e928d0452af9116c5b56da73afd5720737
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 13:58:03 2021 +0100

    Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID.
    
    It is cleaner, avoids warnings, and will be needed when introducing
    ExtendedSWHID in a future commit.

commit 8e917597dbb22054ada06387a0501452d8862d34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 10:56:29 2021 +0100

    QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes
    
    And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.

commit eba8d84de660e2b3d7df304fa32da7404fb9f6bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 18 13:09:21 2021 +0100

    Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the second one

commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/249/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/249/console

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 19 2021, 2:23 PM
Harbormaster failed remote builds in B19391: Diff 18305!

Build is green

Patch application report for D5118 (id=18309)

Could not rebase; Attempt merge onto 758eb885d3...

Updating 758eb88..36bfa78
Fast-forward
 swh/model/identifiers.py            | 457 ++++++++++++++++++++++++++++++--
 swh/model/tests/test_identifiers.py | 508 ++++++++++++++++++++++++++++++++++--
 tox.ini                             |   1 +
 3 files changed, 928 insertions(+), 38 deletions(-)
Changes applied before test
commit 36bfa7835736a9e3352657435e83234ef0ed6387
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 14:18:45 2021 +0100

    Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the last one.
    
    It also removes "ori" as a valid object type for CoreSWHID and
    QualifiedSWHID, as it now only belongs in ExtendedSWHID.

commit 992376597d6f44dfe78c4eef40bf651fe355f0fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 13:58:03 2021 +0100

    Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID.
    
    It is cleaner, avoids warnings, and will be needed when introducing
    ExtendedSWHID in a future commit.

commit 8e917597dbb22054ada06387a0501452d8862d34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 10:56:29 2021 +0100

    QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes
    
    And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.

commit eba8d84de660e2b3d7df304fa32da7404fb9f6bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 18 13:09:21 2021 +0100

    Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the second one

commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/250/ for more details.

zack added inline comments.
swh/model/identifiers.py
48

"RAW_EXTRINSIC_METADATA" is a horribly long mouthful.
(But I don't have better suggestions at the moment...)

swh/model/identifiers.py
48

I agree, but we don't have a better name; and that one is already used in various places

olasd added a subscriber: olasd.

This looks fine.

Please consider adding a test that ExtendedObjectType is a superset of ObjectType?

I see you've extended the CoreSWHID parser test to check that it rejects the extended object types. Would it make sense to extend this test (and the ExtendedSWHID counterpart) to check *which* ValidationError triggered (in another diff)? It's probably just a matter of adding a substring of the ValidationError to the pytest.mark.parametrize args. (Maybe it's overkill? I don't know.)

swh/model/identifiers.py
64–66

Could we use the enum values instead of duplicating these vars?

717–735

These two dicts look like fairly easy candidates for deletion (in a further diff)?

swh/model/tests/test_identifiers.py
1513

This could probably be a pytest.mark.parametrized test instead of a for loop.

This revision is now accepted and ready to land.Feb 23 2021, 11:16 AM
swh/model/tests/test_identifiers.py
1513

Ah, I see you're doing that in D5121. Carry on.

In D5118#129656, @olasd wrote:

This looks fine.

Please consider adding a test that ExtendedObjectType is a superset of ObjectType?

Sure, done.

I see you've extended the CoreSWHID parser test to check that it rejects the extended object types. Would it make sense to extend this test (and the ExtendedSWHID counterpart) to check *which* ValidationError triggered (in another diff)? It's probably just a matter of adding a substring of the ValidationError to the pytest.mark.parametrize args. (Maybe it's overkill? I don't know.)

No, @zack doesn't want us to test error strings. We could add exception subclasses instead, but I don't see the point

Build is green

Patch application report for D5118 (id=18341)

Could not rebase; Attempt merge onto 758eb885d3...

Updating 758eb88..d4b20dc
Fast-forward
 swh/model/identifiers.py            | 457 ++++++++++++++++++++++++++++++--
 swh/model/tests/test_identifiers.py | 514 ++++++++++++++++++++++++++++++++++--
 tox.ini                             |   1 +
 3 files changed, 934 insertions(+), 38 deletions(-)
Changes applied before test
commit d4b20dcdc2795d544461ea469e4156968f9e2cda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 14:18:45 2021 +0100

    Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the last one.
    
    It also removes "ori" as a valid object type for CoreSWHID and
    QualifiedSWHID, as it now only belongs in ExtendedSWHID.

commit 992376597d6f44dfe78c4eef40bf651fe355f0fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 13:58:03 2021 +0100

    Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID.
    
    It is cleaner, avoids warnings, and will be needed when introducing
    ExtendedSWHID in a future commit.

commit 8e917597dbb22054ada06387a0501452d8862d34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 10:56:29 2021 +0100

    QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes
    
    And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.

commit eba8d84de660e2b3d7df304fa32da7404fb9f6bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 18 13:09:21 2021 +0100

    Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the second one

commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/259/ for more details.