Page MenuHomeSoftware Heritage

Move manifest computation functions from identifiers.py to git_objects.py
ClosedPublic

Authored by vlorentz on Sep 22 2021, 5:48 PM.

Details

Summary

Since they are used by the vault for non-identifier-related stuff,
I think it makes sense to move them to a new module.

identifiers.py is now an empty shell (as D6323 moved SWHID classes
out of the module), as all its features were moved to
other modules and it only contains reexports and backward-compat
functions. Therefore, it should be considered deprecated from now on.

Depends on D6325.

Diff Detail

Event Timeline

Build is green

Patch application report for D6326 (id=22980)

Could not rebase; Attempt merge onto b6f5e30b53...

Updating b6f5e30..dee94d8
Fast-forward
 swh/model/git_objects.py            | 409 ++++++++++++++++
 swh/model/identifiers.py            | 951 +-----------------------------------
 swh/model/model.py                  | 140 ++++--
 swh/model/swhids.py                 | 448 +++++++++++++++++
 swh/model/tests/test_identifiers.py |  69 ++-
 5 files changed, 1012 insertions(+), 1005 deletions(-)
 create mode 100644 swh/model/git_objects.py
 create mode 100644 swh/model/swhids.py
Changes applied before test
commit dee94d8cb90b01e2a2086076410e45f2e9514c2a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:41:15 2021 +0200

    Move manifest computation functions from identifiers.py to git_objects.py
    
    Since they are used by the vault for non-identifier-related stuff,
    I think it makes sense to move them to a new module.
    
    identifiers.py is now an empty shell, as all its features were moved to
    other modules and it only contains reexports and backward-compat
    functions. Therefore, it should be considered deprecated from now on.

commit 54e6a7fbf8c0b9f77da02d32ddccf55b4f9a389c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:32:48 2021 +0200

    Refactor identifiers & model to make *_git_object() functions work on model classes instead of dicts
    
    Since we now use these classes everywhere, computing hashes required
    using to_dict() just to compute identifiers, which can be a performance
    bottleneck in code computing many checksums.

commit 209356fc6704070bbac0b5f09b9c2e0c9aa2f12c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:19:10 2021 +0200

    test_identifiers.py: Fix/update malformed data dicts
    
    A future commit will make identifier computation use the attrs classes,
    which are strict about what they accept.

commit d08b45c8766ec82ef9b47c30e2d574e279a0e37e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 16:36:30 2021 +0200

    Move SWHID classes and functions from identifiers.py to swhids.py
    
    identifiers.py initially worked only on bare sha1_git.
    
    I chose to add the SWHID classes in that module because
    of the name, but the SWHID code didn't actually interact
    with the other functions in the module, so it now feels
    out of place to me.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/375/ for more details.

Looks like Phabricator is doing a funny one when displaying this diff. Lemme check what's actually in the staging repo...

It's technically correct: when I split a file, it considers it to be a copy followed by deletions.

Well except both sides show deleted documentation, which can't be right :P

Nevermind, I was confused by all the red and it's much clearer what's left over in a text editor.

Rather than live in the deprecated *_identifier functions, what do you think of moving the description of the git manifest formats to the git_objects documentation? I'm not sure how many links to these docs we have.

This revision is now accepted and ready to land.Sep 23 2021, 6:03 PM
This revision was landed with ongoing or failed builds.Sep 23 2021, 7:55 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D6326 (id=23021)

Could not rebase; Attempt merge onto 0dd33cdf7d...

Updating 0dd33cd..9e8a547
Fast-forward
 swh/model/git_objects.py            | 403 +++++++++++++++
 swh/model/identifiers.py            | 950 +-----------------------------------
 swh/model/model.py                  | 129 +++--
 swh/model/swhids.py                 | 448 +++++++++++++++++
 swh/model/tests/test_identifiers.py |  81 ++-
 5 files changed, 1003 insertions(+), 1008 deletions(-)
 create mode 100644 swh/model/git_objects.py
 create mode 100644 swh/model/swhids.py
Changes applied before test
commit 9e8a547622239ebf949ddbacec42e113e525c245
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:41:15 2021 +0200

    Move manifest computation functions from identifiers.py to git_objects.py
    
    Since they are used by the vault for non-identifier-related stuff,
    I think it makes sense to move them to a new module.
    
    identifiers.py is now an empty shell, as all its features were moved to
    other modules and it only contains reexports and backward-compat
    functions. Therefore, it should be considered deprecated from now on.

commit 57ae405d312879bec19107d29a20c2c290d7861d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:32:48 2021 +0200

    Refactor identifiers & model to make *_git_object() functions work on model classes instead of dicts
    
    Since we now use these classes everywhere, computing hashes required
    using to_dict() just to compute identifiers, which can be a performance
    bottleneck in code computing many checksums.

commit 6a72f88c5687f5b6c05f9b25eb73bce2d772f97c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 17:19:10 2021 +0200

    test_identifiers.py: Fix/update malformed data dicts
    
    A future commit will make identifier computation use the attrs classes,
    which are strict about what they accept.

commit 9ec683264c415731286005dff823e1099ef358c3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 22 16:36:30 2021 +0200

    Move SWHID classes and functions from identifiers.py to swhids.py
    
    identifiers.py initially worked only on bare sha1_git.
    
    I chose to add the SWHID classes in that module because
    of the name, but the SWHID code didn't actually interact
    with the other functions in the module, so it now feels
    out of place to me.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/387/ for more details.