Page MenuHomeSoftware Heritage

Use attributes of *PackageInfo objects instead of untyped dicts.
ClosedPublic

Authored by vlorentz on Jul 23 2020, 2:58 PM.

Details

Summary

This commit does the following:

  • Move artifact_identity to BasePackageInfo, which uses a class attribute (and is overriden for ArchivePackageInfo, which needs a custom behavior to override keys). Also moved/improved its test
  • Add attributes to *PackageInfo classes, that can be accessed instead of the raw metadata.
  • Add a from_metadata class method to all *PackageInfo classes, to parse the raw metadata and build the object from it.
  • Pass the PackageInfo object to resolve_revision_from and build_revision instead of untyped dicts.

Depends on D3601 and D3602

Suggested reading order:

  • first, swh/loader/package/loader.py
  • then each loader but debian and archive (they have more extensive changes)
  • then archive and debian
Test Plan

Functional tests are not affected. Unit tests of functions formerly working on dicts are updated to work with the new attr classes

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D3603 (id=12689)

Could not rebase; Attempt merge onto 2699a1f629...

Updating 2699a1f..a8a33e5
Fast-forward
 swh/loader/cli.py                                |  10 +-
 swh/loader/package/archive/loader.py             |  89 +++++++----
 swh/loader/package/archive/tasks.py              |   6 +-
 swh/loader/package/archive/tests/test_archive.py |  36 ++++-
 swh/loader/package/cran/loader.py                |  59 ++++---
 swh/loader/package/debian/loader.py              | 194 +++++++++++++++--------
 swh/loader/package/debian/tests/test_debian.py   | 110 +++++++------
 swh/loader/package/deposit/loader.py             | 103 ++++++++----
 swh/loader/package/loader.py                     |  66 ++++++--
 swh/loader/package/nixguix/loader.py             |  40 +++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  13 +-
 swh/loader/package/npm/loader.py                 |  89 +++++++----
 swh/loader/package/npm/tests/test_npm.py         |  18 +--
 swh/loader/package/pypi/loader.py                |  56 ++++---
 swh/loader/package/pypi/tests/test_pypi.py       |  40 ++---
 swh/loader/package/tests/test_loader.py          |  33 +++-
 swh/loader/package/tests/test_utils.py           |  23 +--
 swh/loader/package/utils.py                      |  17 +-
 18 files changed, 639 insertions(+), 363 deletions(-)
Changes applied before test
commit a8a33e5aa22d09b6c8e23d6d173daeba650742b5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 14:56:28 2020 +0200

    Use attributes of *PackageInfo objects instead of untyped dicts.
    
    This commit does the following:
    
    * Move artifact_identity to BasePackageInfo, which uses a class attribute
      (and is overriden for ArchivePackageInfo, which needs a custom behavior
      to override keys). Also moved/improved its test
    * Add attributes to *PackageInfo classes, that can be accessed instead
      of the raw metadata.
    * Add a from_metadata class method to all *PackageInfo classes, to parse
      the raw metadata and build the object from it.
    * Pass the PackageInfo object to resolve_revision_from and build_revision
      instead of untyped dicts.

commit 68902739138ba4ce282572dd66e55c048eaa1310
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 22 11:58:32 2020 +0200

    Use an attr class for p_info instead of a dict.
    
    The benefits are minimal for now, as 'raw' still contains a lot of stuff;
    but further commits will move data out of 'raw' to a proper attribute.

commit 1fbbf14ed01e6fc9852f9b54f1b60b14f19a5515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 13:31:09 2020 +0200

    test_resolve_revision_from_edge_cases: fix argument names/order.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/208/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/208/console

Build is green

Patch application report for D3603 (id=12691)

Could not rebase; Attempt merge onto 2699a1f629...

Updating 2699a1f..90fd747
Fast-forward
 swh/loader/cli.py                                |  10 +-
 swh/loader/package/archive/loader.py             |  89 +++++++----
 swh/loader/package/archive/tasks.py              |   6 +-
 swh/loader/package/archive/tests/test_archive.py |  36 ++++-
 swh/loader/package/cran/loader.py                |  59 ++++---
 swh/loader/package/debian/loader.py              | 194 +++++++++++++++--------
 swh/loader/package/debian/tests/test_debian.py   | 110 +++++++------
 swh/loader/package/deposit/loader.py             | 103 ++++++++----
 swh/loader/package/loader.py                     |  66 ++++++--
 swh/loader/package/nixguix/loader.py             |  40 +++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  13 +-
 swh/loader/package/npm/loader.py                 |  89 +++++++----
 swh/loader/package/npm/tests/test_npm.py         |  18 +--
 swh/loader/package/pypi/loader.py                |  56 ++++---
 swh/loader/package/pypi/tests/test_pypi.py       |  40 ++---
 swh/loader/package/tests/test_loader.py          |  33 +++-
 swh/loader/package/tests/test_utils.py           |  23 +--
 swh/loader/package/utils.py                      |  17 +-
 18 files changed, 639 insertions(+), 363 deletions(-)
Changes applied before test
commit 90fd7477814ca93a3936c11f82c140c4bf494960
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 14:56:28 2020 +0200

    Use attributes of *PackageInfo objects instead of untyped dicts.
    
    This commit does the following:
    
    * Move artifact_identity to BasePackageInfo, which uses a class attribute
      (and is overriden for ArchivePackageInfo, which needs a custom behavior
      to override keys). Also moved/improved its test
    * Add attributes to *PackageInfo classes, that can be accessed instead
      of the raw metadata.
    * Add a from_metadata class method to all *PackageInfo classes, to parse
      the raw metadata and build the object from it.
    * Pass the PackageInfo object to resolve_revision_from and build_revision
      instead of untyped dicts.

commit 68902739138ba4ce282572dd66e55c048eaa1310
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 22 11:58:32 2020 +0200

    Use an attr class for p_info instead of a dict.
    
    The benefits are minimal for now, as 'raw' still contains a lot of stuff;
    but further commits will move data out of 'raw' to a proper attribute.

commit 1fbbf14ed01e6fc9852f9b54f1b60b14f19a5515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 13:31:09 2020 +0200

    test_resolve_revision_from_edge_cases: fix argument names/order.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/209/ for more details.

ardumont added inline comments.
swh/loader/package/cran/loader.py
99

I started reviewing as you suggested (that's only the first package loader I read after the the loader.package one).
So I did not read the rest yet...

Though i'm wondering if this can't go one level up now that you abstracted away the artifact identity computation...
Theoretically, in my mind at least ;), that should work, shouldn't it?

swh/loader/package/cran/loader.py
99

That crossed my mind, but Debian does it differently, and PyPI and NPM do it in another different way, so I'd rather keep it this way, at least for now

Nice ;)

Some comments mostly because of the conversions which happen to multiple locations... (mostly in part in build_revision and another part in the <Loader>PackageInfo).

Since we are typing now, i think moving the conversion to the <Loader>PackageInfo with the right type sounds sensible. Then build_revision simply read data out of the PackageInfo.
What do you think?

Aside from that, that looks pretty cool.

Thanks.

swh/loader/package/cran/loader.py
99

agreed ;)

swh/loader/package/debian/loader.py
251

I recall you can drop the extra enclosing brackets.

swh/loader/package/deposit/loader.py
138

Maybe centralize the conversion at the same location.

Here you are converting one part into the package info class (parent ids to bytes), the other in the build revision (timestamp stuff).
Might be convert everything in the package info class (same goes for other loaders).

swh/loader/package/pypi/tests/test_pypi.py
745

i don't think the noqa is needed anymore ;)

826

why does that go away?

Build is green

Patch application report for D3603 (id=12694)

Could not rebase; Attempt merge onto 2699a1f629...

Updating 2699a1f..41d9a2a
Fast-forward
 swh/loader/cli.py                                |  10 +-
 swh/loader/package/archive/loader.py             |  89 ++++++----
 swh/loader/package/archive/tasks.py              |   6 +-
 swh/loader/package/archive/tests/test_archive.py |  36 ++++-
 swh/loader/package/cran/loader.py                |  63 ++++----
 swh/loader/package/debian/loader.py              | 196 +++++++++++++++--------
 swh/loader/package/debian/tests/test_debian.py   | 110 +++++++------
 swh/loader/package/deposit/loader.py             | 103 ++++++++----
 swh/loader/package/loader.py                     |  68 ++++++--
 swh/loader/package/nixguix/loader.py             |  38 +++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  13 +-
 swh/loader/package/npm/loader.py                 |  89 ++++++----
 swh/loader/package/npm/tests/test_npm.py         |  18 +--
 swh/loader/package/pypi/loader.py                |  58 ++++---
 swh/loader/package/pypi/tests/test_pypi.py       |  40 ++---
 swh/loader/package/tests/test_loader.py          |  33 +++-
 swh/loader/package/tests/test_utils.py           |  23 +--
 swh/loader/package/utils.py                      |  17 +-
 18 files changed, 638 insertions(+), 372 deletions(-)
Changes applied before test
commit 41d9a2a42af457de81c87aca68369706a929d5a9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 14:56:28 2020 +0200

    Use attributes of *PackageInfo objects instead of untyped dicts.
    
    This commit does the following:
    
    * Move artifact_identity to BasePackageInfo, which uses a class attribute
      (and is overriden for ArchivePackageInfo, which needs a custom behavior
      to override keys). Also moved/improved its test
    * Add attributes to *PackageInfo classes, that can be accessed instead
      of the raw metadata.
    * Add a from_metadata class method to all *PackageInfo classes, to parse
      the raw metadata and build the object from it.
    * Pass the PackageInfo object to resolve_revision_from and build_revision
      instead of untyped dicts.

commit 1050d94d871d7f46deb18aa46cc07f95f3932e92
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 22 11:58:32 2020 +0200

    Use an attr class for p_info instead of a dict.
    
    The benefits are minimal for now, as 'raw' still contains a lot of stuff;
    but further commits will move data out of 'raw' to a proper attribute.

commit 1fbbf14ed01e6fc9852f9b54f1b60b14f19a5515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 13:31:09 2020 +0200

    test_resolve_revision_from_edge_cases: fix argument names/order.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/211/ for more details.

vlorentz added inline comments.
swh/loader/package/deposit/loader.py
138

That's what I did initially, but it makes more sense here, as I'm not using model objects in the *PackageInfo structures; and TimestampWithTimezone is only really useful for making the Revision.

swh/loader/package/pypi/tests/test_pypi.py
826

when the metadata is empty, PackageInfo.from_metadata crashes so artifact_to_revision_id can't even be called with empty metadata

Build is green

Patch application report for D3603 (id=12695)

Could not rebase; Attempt merge onto 2699a1f629...

Updating 2699a1f..14f700d
Fast-forward
 swh/loader/cli.py                                |  10 +-
 swh/loader/package/archive/loader.py             |  89 ++++++----
 swh/loader/package/archive/tasks.py              |   6 +-
 swh/loader/package/archive/tests/test_archive.py |  36 ++++-
 swh/loader/package/cran/loader.py                |  63 ++++----
 swh/loader/package/debian/loader.py              | 196 +++++++++++++++--------
 swh/loader/package/debian/tests/test_debian.py   | 110 +++++++------
 swh/loader/package/deposit/loader.py             | 103 ++++++++----
 swh/loader/package/loader.py                     |  68 ++++++--
 swh/loader/package/nixguix/loader.py             |  38 +++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  13 +-
 swh/loader/package/npm/loader.py                 |  89 ++++++----
 swh/loader/package/npm/tests/test_npm.py         |  18 +--
 swh/loader/package/pypi/loader.py                |  58 ++++---
 swh/loader/package/pypi/tests/test_pypi.py       |  38 ++---
 swh/loader/package/tests/test_loader.py          |  33 +++-
 swh/loader/package/tests/test_utils.py           |  23 +--
 swh/loader/package/utils.py                      |  17 +-
 18 files changed, 636 insertions(+), 372 deletions(-)
Changes applied before test
commit 14f700d0fa5b5b48da04d91a10d55cf16fa91030
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 14:56:28 2020 +0200

    Use attributes of *PackageInfo objects instead of untyped dicts.
    
    This commit does the following:
    
    * Move artifact_identity to BasePackageInfo, which uses a class attribute
      (and is overriden for ArchivePackageInfo, which needs a custom behavior
      to override keys). Also moved/improved its test
    * Add attributes to *PackageInfo classes, that can be accessed instead
      of the raw metadata.
    * Add a from_metadata class method to all *PackageInfo classes, to parse
      the raw metadata and build the object from it.
    * Pass the PackageInfo object to resolve_revision_from and build_revision
      instead of untyped dicts.

commit 1050d94d871d7f46deb18aa46cc07f95f3932e92
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 22 11:58:32 2020 +0200

    Use an attr class for p_info instead of a dict.
    
    The benefits are minimal for now, as 'raw' still contains a lot of stuff;
    but further commits will move data out of 'raw' to a proper attribute.

commit 1fbbf14ed01e6fc9852f9b54f1b60b14f19a5515
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 23 13:31:09 2020 +0200

    test_resolve_revision_from_edge_cases: fix argument names/order.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/212/ for more details.

First pass, i'm ok with it so accepted as me.
I'll make another pass on it tomorrow and if i'm still fine with it, i'll accept as reviewers.

In the mean time, if someone wants to take a stab at it, it's also fine ;)

Cheers,

This revision is now accepted and ready to land.Jul 23 2020, 5:57 PM