Add a swhid() method to all hashable objects.
Closed, Public

Authored by vlorentz on Feb 26 2021, 3:02 PM.

Details

Summary

It can be handy as a shortcut to build SWHID objects.
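
As a rough illustration of the shortcut described in the summary, here is a minimal sketch. The `Swhid` and `Directory` classes below are hypothetical stand-ins written for this example, not the actual swh.model classes; only the `swh:1:<type>:<hex id>` string layout is taken from the SWHID documentation.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a SWHID value object (not the swh.model class).
@dataclass(frozen=True)
class Swhid:
    object_type: str  # short type tag, e.g. "cnt", "dir", "rev"
    object_id: bytes  # 20-byte sha1_git

    def __str__(self) -> str:
        return f"swh:1:{self.object_type}:{self.object_id.hex()}"


# Hypothetical hashable model object carrying its intrinsic id.
@dataclass(frozen=True)
class Directory:
    id: bytes

    def swhid(self) -> Swhid:
        """Shortcut: build the SWHID from the object's own type and id."""
        return Swhid(object_type="dir", object_id=self.id)


directory = Directory(id=bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904"))

# Without the shortcut, the caller has to spell out the type and id by hand.
manual = Swhid(object_type="dir", object_id=directory.id)
assert directory.swhid() == manual
print(directory.swhid())
```

With such a method, callers no longer need to know which type tag goes with which model class.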

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5157 (id=18445)

Rebasing onto 24b653e4c0...

Current branch diff-target is up to date.
Changes applied before test
commit a934fd55318e9b70c4091780081796bc551566c6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 26 10:49:10 2021 +0100

    Add a swhid() method to all hashable objects.
    
    It can be handy as a shortcut to build SWHID objects.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/265/ for more details.

ardumont added inline comments.
swh/model/model.py
Line 430

why not self.unique_key() ?

Line 737

why not self.unique_key() here?

And also, why is it not sha1, as in the actual unique_key implementation?

Why should I use self.unique_key() instead of self.id? it's longer

Plus, self.unique_key() is only specified as a unique key, while self.id is the sha1_git.

swh/model/model.py
Line 737

"for contents, the intrinsic identifier is the sha1_git hash returned by swh.model.identifiers.content_identifier()" https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers

swh/model/model.py
Line 737

thanks for the link, I need to re-read that.

(note that I'm still confused as to why unique_key is not aligned with that sentence, then)

> Why should I use self.unique_key() instead of self.id? it's longer

what about the risk of discrepancy?

I guess I'm confused as to why we have 2 different APIs for something which seems to be the same thing.

(IIRC, unique_key has to do with serialization)

And we already have a discrepancy, if I'm reading the content's id and unique_key implementations right.

> Plus, self.unique_key() is only specified as a unique key, while self.id is the sha1_git.

I don't get it. I'll check the SWHID spec again.

> I guess I'm confused as to why we have 2 different APIs for something which seems to be the same thing.

They are not the same thing.

SWHIDs are public identifiers, unique_key is for uniqueness inside databases.

A SWHID is always a sha1_git. unique_key can be either a sha1_git, a sha1, or a dict (for origins and raw extrinsic metadata (REM), because they didn't have a SWHID so far).

And we also want to use a dict as unique_key for contents too.

lgtm

Note that there is a repeated typo to fix prior to landing it (please).

swh/model/model.py
Line 479

representing

careful, that typo is repeated in every swhid() docstring.

This revision is now accepted and ready to land.
Mar 1 2021, 9:58 AM

Build is green

Patch application report for D5157 (id=18471)

Rebasing onto 24b653e4c0...

Current branch diff-target is up to date.
Changes applied before test
commit 256bca2cbafa5bc9d2a254ae2fb33ae5e091060b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Feb 26 10:49:10 2021 +0100

    Add a swhid() method to all hashable objects.
    
    It can be handy as a shortcut to build SWHID objects.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/270/ for more details.