Page MenuHomeSoftware Heritage

package: Mark visit as not_found when relevant
ClosedPublic

Authored by ardumont on Feb 5 2021, 5:49 PM.

Details

Summary

If any errors happen during the communication with the origin to retrieve package
information, the visit fails and its status is marked as not_found.

Only pypi, npm and nixguix package loaders are impacted. The other loaders do not read
anything from their url so there is no way to trigger such possibility.

Note that the nixguix loader got refactored to avoid the side-effect of reading data out
of the url within the constructor. It was necessary so the check fails and the visit
status is dealt with as described. Also, this unifies it with how pypi and npm loaders
deals with the url communication.

Related to T3030
Depends on D5026

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5035 (id=17946)

Could not rebase; Attempt merge onto 65a513578d...

Updating 65a5135..e4aced8
Fast-forward
 swh/loader/core/loader.py                        |  4 +-
 swh/loader/core/tests/test_loader.py             | 31 +++++++++-----
 swh/loader/exception.py                          |  8 ++++
 swh/loader/package/cran/tests/test_cran.py       | 53 +++++++++++++++++++++++-
 swh/loader/package/loader.py                     | 28 +++++++++----
 swh/loader/package/nixguix/loader.py             | 53 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  8 +++-
 swh/loader/package/npm/tests/test_npm.py         | 15 ++++++-
 swh/loader/package/pypi/loader.py                |  8 +++-
 swh/loader/package/pypi/tests/test_pypi.py       | 45 +++++++++++++++++++-
 11 files changed, 226 insertions(+), 59 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit e4aced836af1075078389d0642050ebd67624d6d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit e2ea9373b0f554f9eb2f25d5a9869aa98b447b1f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

commit 3b4a023468714a23f42eba4ef4e848ca209cc6b0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:25:12 2021 +0100

    package: Mark visit status as failed when relevant
    
    When:
    - failure to communicate internally with the storage
    - absolutely no revision got loaded during a visit
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/351/ for more details.

swh/loader/package/npm/loader.py
106

That behavior may need to be pushed to the api_info function directly to avoid repetition.

Build is green

Patch application report for D5035 (id=17949)

Could not rebase; Attempt merge onto 65a513578d...

Updating 65a5135..0d23586
Fast-forward
 swh/loader/core/loader.py                        |  4 +-
 swh/loader/core/tests/test_loader.py             | 31 +++++++++-----
 swh/loader/exception.py                          |  8 ++++
 swh/loader/package/cran/tests/test_cran.py       | 53 +++++++++++++++++++++++-
 swh/loader/package/loader.py                     | 24 ++++++++---
 swh/loader/package/nixguix/loader.py             | 53 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  8 +++-
 swh/loader/package/npm/tests/test_npm.py         | 15 ++++++-
 swh/loader/package/pypi/loader.py                |  8 +++-
 swh/loader/package/pypi/tests/test_pypi.py       | 45 +++++++++++++++++++-
 11 files changed, 224 insertions(+), 57 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 0d23586a3ea81415dd0d87fdc81b71ac5afde1c6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit 0673a2eb060c81a0293f8bc0548f5c8e3cde90c3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

commit 4500063a2fc5467fdd3ee2c0da25fa70a1fd919a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:25:12 2021 +0100

    package: Mark visit status as failed when relevant
    
    When:
    - failure to communicate internally with the storage
    - absolutely no revision got loaded during a visit
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/354/ for more details.

vlorentz added inline comments.
swh/loader/exception.py
7

docstring plz

swh/loader/package/loader.py
426–429

Why just ValueError?

swh/loader/package/nixguix/loader.py
221

again, why just ValueError?

swh/loader/package/nixguix/loader.py
221

no idea, because that's what's raised currently?

What do you imply? is Exception enough?

swh/loader/package/nixguix/loader.py
221

context: api_info raises only ValueError.

Build is green

Patch application report for D5035 (id=17960)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..c652257
Fast-forward
 swh/loader/core/loader.py                        |  4 +-
 swh/loader/core/tests/test_loader.py             | 31 +++++++++-----
 swh/loader/exception.py                          | 12 ++++++
 swh/loader/package/loader.py                     | 14 ++++++-
 swh/loader/package/nixguix/loader.py             | 53 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  8 +++-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  8 +++-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 10 files changed, 137 insertions(+), 47 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit c652257f1915c1d4275c4f91d2447c9757b837e3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit 0673a2eb060c81a0293f8bc0548f5c8e3cde90c3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/355/ for more details.

Adapt according to my own remark on updating the api_info implementation.
So now, this is the function raising the NotFound exception.

Build is green

Patch application report for D5035 (id=17961)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..a389dc2
Fast-forward
 swh/loader/core/loader.py                        |  4 +-
 swh/loader/core/tests/test_loader.py             | 31 +++++++++++-----
 swh/loader/exception.py                          | 12 ++++++
 swh/loader/package/loader.py                     | 14 ++++++-
 swh/loader/package/nixguix/loader.py             | 47 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 +++++++++++-----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 ++++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 ++++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 ++--
 12 files changed, 129 insertions(+), 49 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit a389dc20edaf500ff5d27516bc9b4a4556dd554d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit 0673a2eb060c81a0293f8bc0548f5c8e3cde90c3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/356/ for more details.

Build is green

Patch application report for D5035 (id=17962)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..adea924
Fast-forward
 swh/loader/core/loader.py                        |  4 +-
 swh/loader/core/tests/test_loader.py             | 31 +++++++++++-----
 swh/loader/exception.py                          | 13 +++++++
 swh/loader/package/loader.py                     | 14 ++++++-
 swh/loader/package/nixguix/loader.py             | 47 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 +++++++++++-----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 ++++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 ++++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 ++--
 12 files changed, 130 insertions(+), 49 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit adea9242fa951495070a7744b35f0d7da591bc60
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit 0673a2eb060c81a0293f8bc0548f5c8e3cde90c3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/357/ for more details.

context: api_info raises only ValueError.

I see. It should raise its own exception IMO; ValueError is way too generic.

swh/loader/package/nixguix/loader.py
221

yes, and i did that (following one my own comment below).

Build is green

Patch application report for D5035 (id=17969)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..1756bc0
Fast-forward
 swh/loader/core/loader.py                        |  8 ++--
 swh/loader/core/tests/test_loader.py             | 31 +++++++++++-----
 swh/loader/exception.py                          | 13 +++++++
 swh/loader/package/loader.py                     | 14 ++++++-
 swh/loader/package/nixguix/loader.py             | 47 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 +++++++++++-----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 ++++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 ++++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 ++--
 12 files changed, 133 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 1756bc0ba8ee2e9c08e172f11b7fc87a082c04ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/359/ for more details.

swh/loader/package/nixguix/loader.py
61–83

why this change?

swh/loader/package/nixguix/tests/test_nixguix.py
109–136

what's going on here? What do the mocks do, exactly?

swh/loader/package/nixguix/loader.py
61–83

(I'll take the question on the overall block change)

(The said change is explained in the diff description, but here goes with more details)

First, side-effect in constructor (http interaction here) is not a good idea. It complexifies maintenance, readability, etc...

Second, in the current state of affairs, if any failure happens, nothing is recorded in the archive.
The (celery) task fails prior to any archive interaction (since the instantiation fails due to the side-effect of failing the http interaction).

Now with this, the http interaction is delayed after the constructor so it is now recorded in the archive.

And third, it's now unified with how npm and the pypi loader does it.

swh/loader/package/nixguix/tests/test_nixguix.py
109–136

As before, the response returned a not json file which fails the json decode step.
Thus triggering a JSONDecodeError caught by the main loop which ends up in the failed visit.

That may be worth adding a docstring to explicit it.

swh/loader/package/nixguix/tests/test_nixguix.py
109–136

To clarify but that's the behavior for all tests really.

What do the mocks do, exactly?

requests_mock_datadir will intercept the requests to sources_url.
Returns a 200 with the requested content of the query, here file.txt.
As it's not a json file, that will fail do decode as explained in the prior comment.

111

What do the mocks do, exactly?

Here the sources url is not found, so requests_mock_datadir returns a 404.
Resulting in a NotFound within the base code from this diff.

Which results in the task with status failed and a visit_status with status "not_found".

  • Explicit one docstring about the NotFound exception
  • Add docstring to new tests
  • Rework commit message to even better explain the commit than before

Build has FAILED

Patch application report for D5035 (id=18040)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..a0c55d1
Fast-forward
 swh/loader/core/loader.py                        |  8 ++--
 swh/loader/core/tests/test_loader.py             | 31 +++++++++++-----
 swh/loader/exception.py                          | 13 +++++++
 swh/loader/package/loader.py                     | 18 ++++++++-
 swh/loader/package/nixguix/loader.py             | 47 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 37 ++++++++++++++-----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 ++++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 ++++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 ++--
 12 files changed, 142 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit a0c55d15400bc0348aa8f859e44a4b644c950317
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    If any errors happen during the communication with the origin to retrieve package
    information, the visit fails and its status is marked as not_found.
    
    Only pypi, npm and nixguix package loaders are impacted. The other loaders do not read
    anything from their url so there is no way to trigger such possibility.
    
    Note that the nixguix loader got refactored to avoid the side-effect of reading data out
    of the url within the constructor. It was necessary so the check fails and the visit
    status is dealt with as described. Also, this unifies it with how pypi and npm loaders
    deals with the url communication.
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/364/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/364/console

Fix typo (and test) when renaming source variable in test

Build is green

Patch application report for D5035 (id=18042)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..96b949c
Fast-forward
 swh/loader/core/loader.py                        |  8 ++--
 swh/loader/core/tests/test_loader.py             | 31 +++++++++++-----
 swh/loader/exception.py                          | 13 +++++++
 swh/loader/package/loader.py                     | 18 ++++++++-
 swh/loader/package/nixguix/loader.py             | 47 ++++++++++++++----------
 swh/loader/package/nixguix/tests/test_nixguix.py | 37 ++++++++++++++-----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 ++++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 ++++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 ++--
 12 files changed, 142 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 96b949cf5121948dbd4aefa1657b25264e8a5202
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    If any errors happen during the communication with the origin to retrieve package
    information, the visit fails and its status is marked as not_found.
    
    Only pypi, npm and nixguix package loaders are impacted. The other loaders do not read
    anything from their url so there is no way to trigger such possibility.
    
    Note that the nixguix loader got refactored to avoid the side-effect of reading data out
    of the url within the constructor. It was necessary so the check fails and the visit
    status is dealt with as described. Also, this unifies it with how pypi and npm loaders
    deals with the url communication.
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/366/ for more details.

douardda added a subscriber: douardda.

lgtm but I would wait for a green stamp from @vlorentz as well

This revision is now accepted and ready to land.Feb 11 2021, 10:10 AM
swh/loader/package/nixguix/tests/test_nixguix.py
109–136

could you add this in comments, so one doesn't need to jump to the datadir to understand what happens?

swh/loader/package/nixguix/tests/test_nixguix.py
109–136

yes, i'll do it in another diff.