Page MenuHomeSoftware Heritage

Deduplicate parsing/unparsing tests of the new SWHID classes
ClosedPublic

Authored by vlorentz on Feb 19 2021, 4:38 PM.

Details

Summary

They were all very similar and only differ in what 'edge' cases they accept

Depends on D5120

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5121 (id=18312)

Could not rebase; Attempt merge onto 758eb885d3...

Updating 758eb88..b55a60a
Fast-forward
 swh/model/identifiers.py            | 410 ++++++++++++++++++++++++++++++++++--
 swh/model/tests/test_identifiers.py | 391 ++++++++++++++++++++++++++++++++--
 tox.ini                             |   1 +
 3 files changed, 763 insertions(+), 39 deletions(-)
Changes applied before test
commit b55a60a830be4ab0b0fbc72d88dac978ce124739
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 16:37:44 2021 +0100

    Deduplicate parsing/unparsing tests of the new SWHID classes
    
    They were all very similar and only differ in what 'edge' cases they accept

commit bd84ec2860e2c8a16f804259c3fc9daa9cc6bd09
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 16:06:47 2021 +0100

    Deduplicate code between CoreSWHID, QualifiedSWHID, and ExtendedSWHID
    
    by making them all derive from an abstract class.

commit 36bfa7835736a9e3352657435e83234ef0ed6387
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 14:18:45 2021 +0100

    Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the last one.
    
    It also removes "ori" as a valid object type for CoreSWHID and
    QualifiedSWHID, as it now only belongs in ExtendedSWHID.

commit 992376597d6f44dfe78c4eef40bf651fe355f0fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 13:58:03 2021 +0100

    Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID.
    
    It is cleaner, avoids warnings, and will be needed when introducing
    ExtendedSWHID in a future commit.

commit 8e917597dbb22054ada06387a0501452d8862d34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 10:56:29 2021 +0100

    QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes
    
    And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.

commit eba8d84de660e2b3d7df304fa32da7404fb9f6bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 18 13:09:21 2021 +0100

    Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the second one

commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/253/ for more details.

olasd added a subscriber: olasd.

Very nice.

I'm not sure about the itertools.product usage. I'd consider splitting the test in three (one for each class), or at least I'd swap the arguments so the test runs are grouped by SWHID class instead of by SWHID value.

I guess my comment on checking which parse error is triggering applies to this diff instead of D5118 then.

Quoting it:

I see you've extended the CoreSWHID parser test to check that it rejects the extended object types. Would it make sense to extend this test (and the ExtendedSWHID counterpart) to check *which* ValidationError triggered (in another diff)? It's probably just a matter of adding a substring of the ValidationError to the pytest.mark.parametrize args. (Maybe it's overkill? I don't know.)

swh/model/tests/test_identifiers.py
1346

Could you consider making the VALID_SWHIDS list a list of pytest.params with an id corresponding to the swhid argument? The automatic test id generation is a bit meh: swh/model/tests/test_identifiers.py::test_parse_unparse_swhids[swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2-core0-qualified0-extended0]

This revision is now accepted and ready to land.Feb 23 2021, 11:41 AM

Build is green

Patch application report for D5121 (id=18343)

Could not rebase; Attempt merge onto 758eb885d3...

Updating 758eb88..172eadb
Fast-forward
 swh/model/identifiers.py            | 410 ++++++++++++++++++++++++++++++++++--
 swh/model/tests/test_identifiers.py | 403 +++++++++++++++++++++++++++++++++--
 tox.ini                             |   1 +
 3 files changed, 775 insertions(+), 39 deletions(-)
Changes applied before test
commit 172eadb86cae99ccd65a98f4953ad5b422d8be27
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 16:37:44 2021 +0100

    Deduplicate parsing/unparsing tests of the new SWHID classes
    
    They were all very similar and only differ in what 'edge' cases they accept

commit 9bcc88469c1dd3080014dc4d05177724c2c9332d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 16:06:47 2021 +0100

    Deduplicate code between CoreSWHID, QualifiedSWHID, and ExtendedSWHID
    
    by making them all derive from an abstract class.

commit d4b20dcdc2795d544461ea469e4156968f9e2cda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 14:18:45 2021 +0100

    Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the last one.
    
    It also removes "ori" as a valid object type for CoreSWHID and
    QualifiedSWHID, as it now only belongs in ExtendedSWHID.

commit 992376597d6f44dfe78c4eef40bf651fe355f0fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 13:58:03 2021 +0100

    Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID.
    
    It is cleaner, avoids warnings, and will be needed when introducing
    ExtendedSWHID in a future commit.

commit 8e917597dbb22054ada06387a0501452d8862d34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 19 10:56:29 2021 +0100

    QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes
    
    And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.

commit eba8d84de660e2b3d7df304fa32da7404fb9f6bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 18 13:09:21 2021 +0100

    Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    This commit adds the second one

commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/261/ for more details.

In D5121#129696, @olasd wrote:

I'm not sure about the itertools.product usage. I'd consider splitting the test in three (one for each class), or at least I'd swap the arguments so the test runs are grouped by SWHID class instead of by SWHID value.

Why? If a string should be rejected by all three classes, it makes sense to show the three errors next to each other.

swh/model/tests/test_identifiers.py
1346

Sure.

I couldn't do it exactly that way because the id can't be passed to the test, but done.