Page MenuHomeSoftware Heritage

Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
ClosedPublic

Authored by vlorentz on Feb 16 2021, 1:34 PM.

Details

Summary

Following the discussion on T3034, we decided to replace SWHID with
two or three classes:

  • QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
  • CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
  • ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)

Since migrating from SWHID will break existing code, this commit uses
the opportunity to modernize it a little, ie.:

  • keyword-only constructor, to get rid of the hacky default values for object_type and object_id
  • enum instead of strings for the object type
  • bytes instead of an hex string for the object id
  • rename metadata to qualifiers

Diff Detail

Repository
rDMOD Data model
Branch
qualified-swhid
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 19377
Build 30049: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 30048: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build is green

Patch application report for D5081 (id=18141)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit 6af9d198ad3a4a44681b5f66d801a4ef53b3a1f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/223/ for more details.

vlorentz edited the summary of this revision. (Show Details)

Build is green

Patch application report for D5081 (id=18142)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit 166a856916aefd99ace8ebb934b821e05a0f6b43
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/224/ for more details.

zack requested changes to this revision.Feb 16 2021, 2:26 PM
zack added a subscriber: zack.
zack added inline comments.
swh/model/identifiers.py
25

Thanks gawd yes.

  • minor (1): can you add a docstring for this enum, even a oneliner would do
  • minor (2): I think it's more customary to from enum import Enum
740–751

can you separate attributes with empty lines here?
without them it is hard to visually associate docstrings to the right field

753–754

I've a grudge with a series of separate validators here, considering most of the syntax (except qualifiers) can be easily validated with a regex, which has also been shown in the past to be faster (cf.: 574685052348bfc6ed28570f06b9cc4302dfde27). Can we use a single regex and validator instead (for core unqualified SWHIDs)?

771

leftover print?

798–800

This is quadratic on the number of qualifiers.
Can we use something-str.join-something instead?

830

should we log a deprecation message here? or is it too soon to do that?

This revision now requires changes to proceed.Feb 16 2021, 2:26 PM
swh/model/identifiers.py
753–754

No, because using a single regex is only possible when parsing a string. Validators are also used when building the object from parts.

798–800

I copy-pasted from the existing code, so it's out of scope for this diff.

830

oops, it got lost before I committed.

vlorentz edited the summary of this revision. (Show Details)

apply all of @zack's comments

Build has FAILED

Patch application report for D5081 (id=18144)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit 62bcc5fb93c424b2b013347595e958fb93b5ada5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/225/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/225/console

Build is green

Patch application report for D5081 (id=18145)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit 7505edc8e026713fafec933475d300b097fdbf5a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/226/ for more details.

zack resigned from this revision.EditedFeb 16 2021, 3:07 PM

LGTM.

I'm not approving it (and only resigning as reviewer instead) just to allow others to have a look, given it's something quite foundational in the data model.

Did not read the tests yet.

swh/model/identifiers.py
784

Missing scenarios it seems.

809

from where does _object_type_map come from?

(and missing test)

(as mentioned on IRC): Since we are doing big works, I see no reason to keep the 'qualifiers' a schema-less dict-like attribute. I'd rather have them as attributes.

swh/model/identifiers.py
809

it's one of the many constants at the toplevel, I'll clean it up when removing SWHID

Build is green

Patch application report for D5081 (id=18232)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit dbd8b903f1789e7c15445966e9577fc5f3d3a9bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/227/ for more details.

add serialization tests (for both SWHID and QualifiedSWHID)

Build is green

Patch application report for D5081 (id=18234)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit ad5c39e895c83e38c04e65748b6758fb34478375
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/228/ for more details.

Build is green

Patch application report for D5081 (id=18235)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit e8c334c8da5fc342e1a902c454c6b00782bd5eb6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/229/ for more details.

(as mentioned on IRC): Since we are doing big works, I see no reason to keep the 'qualifiers' a schema-less dict-like attribute. I'd rather have them as attributes.

I'll do this in a future diff

Build is green

Patch application report for D5081 (id=18236)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit a69394b8c174bc4a80bbacf5c933ac19b1a3ebf3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    * maybe CoreSWHID, for "core SWHID" only (standard types + qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/230/ for more details.

Build is green

Patch application report for D5081 (id=18237)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit bc91ea91b3c38b96fc461097b0d55e889bcf893a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/231/ for more details.

improve QualifiedSWHID's docstring:

  • use doctest
  • fix outdated syntax
  • show example of from_string

Build has FAILED

Patch application report for D5081 (id=18239)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit d5f996bfbb3885a4612978bce7d03ad0e49df393
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/233/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/233/console

Build has FAILED

Patch application report for D5081 (id=18239)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit d5f996bfbb3885a4612978bce7d03ad0e49df393
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/234/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/234/console

Build is green

Patch application report for D5081 (id=18240)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit 335a91e4697999fbe63180a530ee7d031f994b9f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/235/ for more details.

Build is green

Patch application report for D5081 (id=18241)

Rebasing onto 0c16581283...

Current branch diff-target is up to date.
Changes applied before test
commit c7161592c48c6853c99a453ac079170fed7ee451
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/236/ for more details.

Build is green

Patch application report for D5081 (id=18291)

Rebasing onto 758eb885d3...

Current branch diff-target is up to date.
Changes applied before test
commit 50a4eaff5b29439d900ba128d93f0cb075ecc60c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/241/ for more details.

Build is green

Patch application report for D5081 (id=18300)

Rebasing onto 758eb885d3...

Current branch diff-target is up to date.
Changes applied before test
commit 690b7f824f55bc65fbec21b29714926476038abb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Feb 16 13:33:13 2021 +0100

    Add new class QualifiedSWHID to replace SWHID, and deprecate the latter.
    
    Following the discussion on T3034, we decided to replace SWHID with
    two or three classes:
    
    * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
    * CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
    * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
    
    Since migrating from SWHID will break existing code, this commit uses
    the opportunity to modernize it a little, ie.:
    
    * `keyword`-only constructor, to get rid of the hacky default values for
      `object_type` and `object_id`
    * enum instead of strings for the object type
    * `bytes` instead of an hex string for the object id
    * rename `metadata` to `qualifiers`

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/244/ for more details.

ardumont added inline comments.
swh/model/tests/test_identifiers.py
1203

what's the _x ? (it's not immediatly apparent to me).

This revision is now accepted and ready to land.Feb 22 2021, 3:20 PM

Since migrating from SWHID will break existing code

Do you also plan on migrating the code which will break following that release?

Since migrating from SWHID will break existing code

Do you also plan on migrating the code which will break following that release?

yes

swh/model/tests/test_identifiers.py
1203

hash_to_bytes. it's used everywhere in this module so it makes sense to have a short alias

swh/model/tests/test_identifiers.py
1203

lol, i noticed it after writing the comment and removed the comment (or so i thought)...
sorry for the noise.