Software Heritage

Pubdev: Do not rely on intrinsic metadata
Closed (Public)

Authored by franckbret on Oct 7 2022, 12:01 PM.

Details

Summary

The loader gets enough information from extrinsic metadata to build a release object; checking intrinsic metadata was more error-prone than useful.

It should fix some errors reported by Sentry.

Remove 'information' and adapt the release message.

Adapt the loader specifications documentation.

Related T4465, T4530, T4583
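For illustration, here is a minimal sketch of building release data from the pub.dev API response alone, without opening the archive. It assumes the API response holds a "versions" list whose entries carry "version", "archive_url", "published" and a "pubspec" mapping. The ReleaseInfo class, the function name and the exact message wording are hypothetical and not necessarily what this diff implements.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ReleaseInfo:
    """Hypothetical container for the fields a release object needs."""

    name: str
    version: str
    archive_url: str
    published: datetime
    author: Optional[str]
    message: str


def release_info_from_api(api_json: Dict[str, Any], version: str) -> ReleaseInfo:
    """Build release data from the (extrinsic) API response only."""
    entry = next((v for v in api_json["versions"] if v["version"] == version), None)
    if entry is None:
        raise KeyError(f"version {version!r} not in API response")
    pubspec = entry["pubspec"]
    name = pubspec["name"]

    # Prefer the "authors" list; fall back to the single "author" field.
    authors = pubspec.get("authors") or []
    author = authors[0] if authors else pubspec.get("author")

    # Assumes an ISO 8601 timestamp ending in "Z"; rewrite it so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    published = datetime.fromisoformat(entry["published"].replace("Z", "+00:00"))

    # Hypothetical message format: a title line followed by the description.
    description = pubspec.get("description", "")
    message = (
        f"Synthetic release for pub.dev source package {name} "
        f"version {version}\n\n{description}\n"
    )
    return ReleaseInfo(name, version, entry["archive_url"], published, author, message)
```

Everything the release needs comes from one JSON document, so a malformed pubspec.yaml inside the tarball can no longer break the load.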


Event Timeline

Build is green

Patch application report for D8640 (id=31202)

Rebasing onto 8aa6dab72a...

Current branch diff-target is up to date.
Changes applied before test
commit deabeaf51feff0746fa00ecb2ba574b5fa02f1e6
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/972/ for more details.
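The Conda commit above relies on per-origin 'artifacts' data from the lister (versions, archive URL, checksum). A minimal sketch of consuming such data follows. The key names ("version", "url", "sha256") and both helpers are hypothetical, since the structure actually sent by the conda lister is not shown in this review. Because Conda can ship several builds of one release (as with the two lifetimes-0.11.1 test archives), the version string is assumed to already include the build.

```python
import hashlib
from typing import Dict, List


def index_artifacts(artifacts: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each artifact's version string to its entry, rejecting duplicates.

    Hypothetical keys: "version", "url" and "sha256".
    """
    indexed: Dict[str, Dict[str, str]] = {}
    for artifact in artifacts:
        version = artifact["version"]
        if version in indexed:
            raise ValueError(f"duplicate artifact version {version!r}")
        indexed[version] = artifact
    return indexed


def checksum_matches(data: bytes, expected_sha256: str) -> bool:
    """Compare downloaded archive bytes with the checksum sent by the lister."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

Rejecting duplicate keys makes a collision between two builds fail loudly instead of silently dropping one archive.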

Build is green

Patch application report for D8640 (id=31204)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..498ab61
Fast-forward
 docs/package-loader-specifications.rst             |  13 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 ++
 swh/loader/package/conda/loader.py                 | 168 +++++++++++++++
 swh/loader/package/conda/tasks.py                  |  14 ++
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 +++
 swh/loader/package/pubdev/loader.py                |  67 ++----
 swh/loader/package/pubdev/tests/test_pubdev.py     |  21 +-
 13 files changed, 621 insertions(+), 68 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
Changes applied before test
commit 498ab615f4ec305390b58f93f7894311042f332e
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/973/ for more details.

vlorentz added inline comments.
swh/loader/package/pubdev/loader.py
116–118

What is the purpose of any(authors)? Is it possible to have empty author names in the list?

swh/loader/package/pubdev/loader.py
116–118

Can't remember; maybe when working on Docker I came across a list containing an empty string.

swh/loader/package/pubdev/loader.py
116–118

Then you shouldn't unconditionally take the first string if it may be empty.

(And please add a regression test.)
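A minimal sketch of the safer selection being asked for here: take the first author that is actually non-empty instead of authors[0], and return None when there is none. The function name is hypothetical; a regression test could feed it a list such as ["", "Jane Doe"].

```python
from typing import Iterable, Optional


def first_non_empty_author(authors: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first author name that is not empty or whitespace-only.

    Unlike authors[0], this never returns "" when the list starts with an
    empty entry; unlike any(authors), it also says which name to use.
    """
    return next((a.strip() for a in authors if a and a.strip()), None)
```

Falling back to None lets the caller substitute an empty author explicitly rather than building a person from an empty string.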

anlambert added inline comments.
swh/loader/package/pubdev/loader.py
106

As we are querying the pubdev Web API to get package info, I think we should store the JSON data associated with a specific version as extrinsic metadata.
However, this is out of scope for this diff and should be handled in a new one.
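A minimal sketch of the suggestion: extract the API entry for one version and serialize it deterministically so it can be stored as raw extrinsic metadata. The helper name is hypothetical, and it assumes the same "versions" list shape as the API response above; how the bytes are then wrapped into a metadata object is left to the follow-up patch.

```python
import json
from typing import Any, Dict


def version_metadata_bytes(api_json: Dict[str, Any], version: str) -> bytes:
    """Serialize the API entry for one version as raw metadata bytes.

    Keys are sorted and separators fixed so that the same entry always
    produces the same bytes, whatever order the API returned them in.
    """
    for entry in api_json["versions"]:
        if entry["version"] == version:
            return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    raise KeyError(f"version {version!r} not in API response")
```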

franckbret marked an inline comment as done.

Remove useless check condition when getting 'author' data

Build is green

Patch application report for D8640 (id=31238)

Could not rebase; Attempt merge onto 028b7c04b9...

Updating 028b7c0..350f632
Fast-forward
Changes applied before test
commit 350f632e412dc1434bd85e96033220e16cd9bd43
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

commit 4e067fd726a124a85d7ab24a2dc584932a82c151
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/985/ for more details.

Remove useless check condition when getting 'author' data

You checked there is no empty string in the input data, right?

swh/loader/package/pubdev/loader.py
116–118

I ran a script over the whole data set and did not find a case where authors is not a list or is a list containing empty strings.

DEBUG:10/11/2022 07:05:37 AM:Found 34924 packages, 298689 versions, 82619 author, 11462 authors, 204608 empty author

So using any() here is definitely unnecessary.
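The script itself is not shown. A minimal sketch of such a check follows, assuming each package is a pub.dev-style API response with a "versions" list of entries holding a "pubspec" mapping; the function and counter names are hypothetical. Note that in the log above the three author counts add up to the version count (82619 + 11462 + 204608 = 298689), which suggests each version falls into exactly one bucket, as it does here.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_author_fields(packages: Iterable[Dict[str, Any]]) -> Counter:
    """Classify every version's pubspec by how it declares its author(s)."""
    stats: Counter = Counter()
    for package in packages:
        stats["packages"] += 1
        for entry in package["versions"]:
            stats["versions"] += 1
            pubspec = entry.get("pubspec", {})
            authors = pubspec.get("authors")
            if authors is not None:
                stats["authors"] += 1
                # The two anomalies the review is worried about.
                if not isinstance(authors, list):
                    stats["authors_not_list"] += 1
                elif any(not a for a in authors):
                    stats["authors_with_empty_string"] += 1
            elif pubspec.get("author"):
                stats["author"] += 1
            else:
                stats["empty author"] += 1
    return stats
```

On the real data set, authors_not_list and authors_with_empty_string staying at zero is what would justify dropping the any() guard.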

Remove useless check condition when getting 'author' data

You checked there is no empty string in the input data, right?

yes

This revision is now accepted and ready to land (Oct 11 2022, 10:25 AM).
franckbret added inline comments.
swh/loader/package/pubdev/loader.py
106

Yep, sure; I will add raw extrinsic metadata in another patch.

franckbret marked an inline comment as done.

rebase

Build is green

Patch application report for D8640 (id=31241)

Could not rebase; Attempt merge onto 028b7c04b9...

Updating 028b7c0..350f632
Fast-forward
Changes applied before test
commit 350f632e412dc1434bd85e96033220e16cd9bd43
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

commit 4e067fd726a124a85d7ab24a2dc584932a82c151
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/986/ for more details.

Build is green

Patch application report for D8640 (id=31243)

Could not rebase; Attempt merge onto 028b7c04b9...

Updating 028b7c0..99cec3e
Fast-forward
Changes applied before test
commit 99cec3e03357a5bd75a2b2012299749540f5cc7f
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

commit 4e067fd726a124a85d7ab24a2dc584932a82c151
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/987/ for more details.

This revision was landed with ongoing or failed builds (Oct 11 2022, 12:59 PM).
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8640 (id=31247)

Rebasing onto 028b7c04b9...

Current branch diff-target is up to date.
Changes applied before test
commit 4cb85e153e2ef7ed1fb51c16be324ea1568df30a
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Oct 7 11:51:35 2022 +0200

    Pubdev: Do not rely on intrinsic metadata
    
    The loader get enough information from extrinsic metadata to build a release object, checking intrinsic metadata was more error prone than useful.
    
    It should fix some Sentry reported errors.
    
    Remove 'information' and adapt release message
    
    Adapt loader specifications documentation
    
    Related T4465, T4530, T4583

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/988/ for more details.