model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash.
Accepted, Public

Authored by vlorentz on Mon, Feb 8, 3:40 PM.

Reviewers: olasd
Group Reviewers: Reviewers

Event Timeline

Build is green

Patch application report for D5038 (id=17974)

Could not rebase; Attempt merge onto 0c16581283...

Updating 0c16581..1f168e4
Fast-forward
 swh/model/hashutil.py               |   9 +-
 swh/model/identifiers.py            |  94 ++++++++++++++
 swh/model/model.py                  |  19 ++-
 swh/model/tests/test_identifiers.py | 245 ++++++++++++++++++++++++++++++++++++
 swh/model/tests/test_model.py       |  11 +-
 5 files changed, 364 insertions(+), 14 deletions(-)
Changes applied before test
commit 1f168e42de1a3a54ff0e2d2c8f467a18b4ac8dbb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 8 15:40:21 2021 +0100

    model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash.

commit e3383172e94518217be3905ed5e3f7acd0dd80fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 4 12:24:46 2021 +0100

    identifiers: Properly define the behavior of raw_extrinsic_metadata on negative timestamps.

commit 03d282b9af891b6d48e83bd5301e57c0472d5b03
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 4 11:02:39 2021 +0100

    identifiers: Change the manifest format of raw_extrinsic_metadata to use integer instead of ISO8601
    
    Serializing as ISO8601 makes the hash brittle, because the database may
    change the timezone silently and/or lose precision in the microseconds.
    
    As we do not need a precise timestamp, using an integer is good enough,
    and is consistent with the git format.
    
    The manifest also does not need to contain a timezone, as it only
    represents the timezone of the system that fetched this metadata,
    which is useless data.

commit 266b88dcaaa0cab48c67e62ebca51f0a4599c435
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 29 15:08:49 2021 +0100

    model: Add 'id' field to RawExtrinsicMetadata
    
    So that they can be properly deduplicated and referenced.

commit 272468f3b5a96c8854a26efe333c32cba4504aff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jan 25 12:31:12 2021 +0100

    identifiers: Add raw_extrinsic_metadata_identifier
    
    This will be used to compute an intrinsic identifier for RawExtrinsicMetadata,
    which can be used for deduplication and for referring to it like any other sha1_git
    instead of needing to use a tuple of its fields.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/219/ for more details.
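
The rationale given in commit 03d282b (integer timestamps instead of ISO8601 in the manifest) can be illustrated with a small sketch. This is not the code from swh/model/identifiers.py: the helper names `iso_field` and `int_field` are hypothetical, and rounding down with `math.floor` is an assumption about how sub-second and negative values might be handled.

```python
import math
from datetime import datetime, timedelta, timezone


def iso_field(dt: datetime) -> bytes:
    """Hypothetical manifest field: the date serialized as ISO8601."""
    return dt.isoformat().encode("ascii")


def int_field(dt: datetime) -> bytes:
    """Hypothetical manifest field: whole seconds since the epoch, rounded down."""
    return str(math.floor(dt.timestamp())).encode("ascii")


# The date as first computed, in UTC and with microseconds.
utc = datetime(2021, 2, 4, 10, 2, 39, 123456, tzinfo=timezone.utc)

# The same date as a database might return it: another timezone offset,
# and the microseconds dropped.
paris = utc.astimezone(timezone(timedelta(hours=1))).replace(microsecond=0)

# The ISO8601 bytes differ, so a hash computed over them would differ too.
print(iso_field(utc), iso_field(paris))

# The integer bytes are identical, so the hash would stay stable.
print(int_field(utc), int_field(paris))
```

Rounding down rather than truncating with `int()` only matters before the epoch: `int(-0.5)` is `0`, so the seconds on either side of the epoch would collapse onto the same value.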

This revision is now accepted and ready to land. (Mon, Feb 8, 3:58 PM)

Build is green

Patch application report for D5038 (id=17978)

Could not rebase; Attempt merge onto 0c16581283...

Updating 0c16581..ff9011f
Fast-forward
 swh/model/hashutil.py               |   9 +-
 swh/model/identifiers.py            |  99 ++++++++++++++
 swh/model/model.py                  |  19 +--
 swh/model/tests/test_identifiers.py | 265 ++++++++++++++++++++++++++++++++++++
 swh/model/tests/test_model.py       |  11 +-
 5 files changed, 389 insertions(+), 14 deletions(-)
Changes applied before test
commit ff9011ffb7ce433780a99ab4d7609463c59781d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 8 15:40:21 2021 +0100

    model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash.

commit 35aab964f65f25ba6c3acf161a49682d100aa450
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 4 12:24:46 2021 +0100

    identifiers: Properly define the behavior of raw_extrinsic_metadata on negative timestamps.

commit 03d282b9af891b6d48e83bd5301e57c0472d5b03
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Feb 4 11:02:39 2021 +0100

    identifiers: Change the manifest format of raw_extrinsic_metadata to use integer instead of ISO8601
    
    Serializing as ISO8601 makes the hash brittle, because the database may
    change the timezone silently and/or lose precision in the microseconds.
    
    As we do not need a precise timestamp, using an integer is good enough,
    and is consistent with the git format.
    
    The manifest also does not need to contain a timezone, as it only
    represents the timezone of the system that fetched this metadata,
    which is useless data.

commit 266b88dcaaa0cab48c67e62ebca51f0a4599c435
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 29 15:08:49 2021 +0100

    model: Add 'id' field to RawExtrinsicMetadata
    
    So that they can be properly deduplicated and referenced.

commit 272468f3b5a96c8854a26efe333c32cba4504aff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jan 25 12:31:12 2021 +0100

    identifiers: Add raw_extrinsic_metadata_identifier
    
    This will be used to compute an intrinsic identifier for RawExtrinsicMetadata,
    which can be used for deduplication and for referring to it like any other sha1_git
    instead of needing to use a tuple of its fields.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/221/ for more details.
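
Commit 272468f describes an intrinsic identifier that can be used like any other sha1_git. As a rough sketch of that idea, and not the actual manifest format from swh/model/identifiers.py, the snippet below hashes a git-style object: a `<type> <length>\0` header followed by the body. The function names, the object type string, and the field layout in `sketch_metadata_id` are assumptions made for illustration only.

```python
import hashlib


def git_object_id(object_type: str, body: bytes) -> str:
    """SHA1 of a git-style object: '<type> <length>\\0' header, then the body."""
    header = f"{object_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def sketch_metadata_id(
    target: str, discovery_ts: int, authority: str, fetcher: str, payload: bytes
) -> str:
    """Hypothetical manifest: one 'key value' line per field, a blank line,
    then the raw metadata bytes. The real field names and order may differ."""
    lines = [
        f"target {target}",
        f"discovery_date {discovery_ts}",
        f"authority {authority}",
        f"fetcher {fetcher}",
    ]
    body = "\n".join(lines).encode("utf-8") + b"\n\n" + payload
    return git_object_id("raw_extrinsic_metadata", body)


# Identical fields always give the same identifier, which is what makes
# deduplication possible; changing any field changes the identifier.
first = sketch_metadata_id("example-target", 1612432959, "authority-a", "fetcher-a", b"{}")
again = sketch_metadata_id("example-target", 1612432959, "authority-a", "fetcher-a", b"{}")
other = sketch_metadata_id("example-target", 1612432960, "authority-a", "fetcher-a", b"{}")
print(first == again, first == other)
```

With a single hash as the key, an object can be referenced by that one value instead of by a tuple of its fields, which matches the motivation stated in the commit message.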