Page MenuHomeSoftware Heritage

core.loader: Allow vcs loaders to deal with not_found status
ClosedPublic

Authored by ardumont on Feb 8 2021, 5:02 PM.

Details

Summary

This will allow git, mercurial or svn loader implementations to mark the visit status as
not_found when such event occurs (e.g failing to to remotely communicate with the remote
server, archive not found, etc...).

Technically, this opens a means to trap a NotFound exception from the main loop. Which
finalize the visit and mark its status as "not_found".

Depends on D5035
Related to T3030

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5039 (id=17975)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..30ed737
Fast-forward
 swh/loader/core/loader.py                        | 24 ++++++++--
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 14 +++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 174 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 30ed7379b512f141ced87dc0ec3ecf94e3dc47ae
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    Related to T3030

commit 1756bc0ba8ee2e9c08e172f11b7fc87a082c04ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/360/ for more details.

Mention in the prepare method's docstring the possibility of raising the NotFound
exception.

Build is green

Patch application report for D5039 (id=17976)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..30ed737
Fast-forward
 swh/loader/core/loader.py                        | 24 ++++++++--
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 14 +++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 174 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 30ed7379b512f141ced87dc0ec3ecf94e3dc47ae
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    Related to T3030

commit 1756bc0ba8ee2e9c08e172f11b7fc87a082c04ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/361/ for more details.

Actually, really mention in the prepare method's docstring the possibility of raising
the NotFound exception.

Build is green

Patch application report for D5039 (id=17979)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..ed1fec6
Fast-forward
 swh/loader/core/loader.py                        | 27 ++++++++++--
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 14 +++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 177 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit ed1fec6a464a805828f78b5eb98bde6304f2b400
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    Related to T3030

commit 1756bc0ba8ee2e9c08e172f11b7fc87a082c04ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/362/ for more details.

anlambert added inline comments.
swh/loader/core/loader.py
351

I think you should return {"status": "failed"} after that line. or maybe I am missing something here ?

swh/loader/core/tests/test_loader.py
227–230

You can remove that if you return the failed status after catching the NotFound exception.

swh/loader/core/loader.py
337–369

These two exception code blocks are quite similar, you could merge them into one and test on the exception type to modify status and log message.

swh/loader/core/loader.py
337–369

it crossed my mind.

I did not act on it as i'm always wondering whether that's within the scope of that diff or with another one about refactoring ¯\_(ツ)_/¯

351

I'm not totally sure about this. That status is about the task and not the visit.
Currently, it's rendering status "uneventful" which is also true.

The thing is, i think with status "failed", the scheduler will schedule it back and
decrement the retries_left counter?

swh/loader/core/loader.py
337–369

I think yo can do it in that diff, you just have to modify the original Exception block and change statuses if the exception type is NotFound,

351

Ah right, I got mistaken by the test that overrides load_status.

swh/loader/core/tests/test_loader.py
227–230

Yes, i mislead myself here ¯\_(ツ)_/¯
(and thus you as well... sorry)

It should have been uneventful to be consistent with the actual runtime behavior.

Adapt according to review:

  • Fix misleading test
  • Merge the try: catch (Exception, NotFound) instructions.

Thanks!

Build is green

Patch application report for D5039 (id=17982)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..3abaf74
Fast-forward
 swh/loader/core/loader.py                        | 22 +++++++---
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 14 +++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 32 ++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 170 insertions(+), 52 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 3abaf7443724f8c45d02281e9937920dc5861cf7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    Related to T3030

commit 1756bc0ba8ee2e9c08e172f11b7fc87a082c04ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/363/ for more details.

anlambert added inline comments.
swh/loader/core/tests/test_loader.py
227–230

Is the load_status override really needed here ? This should already return the uneventful status if I am not wrong.

This revision is now accepted and ready to land.Feb 8 2021, 6:22 PM
swh/loader/core/tests/test_loader.py
227–230

unsure, i think i added because otherwise, it'd be "eventful" (default implem ¯\_(ツ)_/¯).
I'll check ;)

Rebase and improve again the diff description

Build has FAILED

Patch application report for D5039 (id=18041)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..db02dfd
Fast-forward
 swh/loader/core/loader.py                        | 22 +++++++---
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 18 +++++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 37 ++++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 179 insertions(+), 52 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit db02dfd32e1436ee3156eb3abbc6b99e461a7c1b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    Related to T3030

commit a0c55d15400bc0348aa8f859e44a4b644c950317
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    If any errors happen during the communication with the origin to retrieve package
    information, the visit fails and its status is marked as not_found.
    
    Only pypi, npm and nixguix package loaders are impacted. The other loaders do not read
    anything from their url so there is no way to trigger such possibility.
    
    Note that the nixguix loader got refactored to avoid the side-effect of reading data out
    of the url within the constructor. It was necessary so the check fails and the visit
    status is dealt with as described. Also, this unifies it with how pypi and npm loaders
    deals with the url communication.
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/365/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/365/console

Build is green

Patch application report for D5039 (id=18043)

Could not rebase; Attempt merge onto 4500063a2f...

Updating 4500063..945ff24
Fast-forward
 swh/loader/core/loader.py                        | 22 +++++++---
 swh/loader/core/tests/test_loader.py             | 56 +++++++++++++++++++-----
 swh/loader/exception.py                          | 13 ++++++
 swh/loader/package/loader.py                     | 18 +++++++-
 swh/loader/package/nixguix/loader.py             | 47 +++++++++++---------
 swh/loader/package/nixguix/tests/test_nixguix.py | 37 ++++++++++++----
 swh/loader/package/npm/loader.py                 |  2 +-
 swh/loader/package/npm/tests/test_npm.py         | 11 +++++
 swh/loader/package/pypi/loader.py                |  2 +-
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 swh/loader/package/tests/test_utils.py           |  5 ++-
 swh/loader/package/utils.py                      |  7 +--
 12 files changed, 179 insertions(+), 52 deletions(-)
 create mode 100644 swh/loader/exception.py
Changes applied before test
commit 945ff242b5b7d038060f9382c682f2ed9c8f3b8a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 17:00:54 2021 +0100

    core.loader: Allow vcs loaders to deal with not_found status
    
    This will allow git, mercurial or svn loader implementations to mark the visit status as
    not_found when such event occurs (e.g failing to to remotely communicate with the remote
    server, archive not found, etc...).
    
    Technically, this opens a means to trap a NotFound exception from the main loop. Which
    finalize the visit and mark its status as "not_found".
    
    Related to T3030

commit 96b949cf5121948dbd4aefa1657b25264e8a5202
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 17:43:24 2021 +0100

    package: Mark visit as not_found when relevant
    
    If any errors happen during the communication with the origin to retrieve package
    information, the visit fails and its status is marked as not_found.
    
    Only pypi, npm and nixguix package loaders are impacted. The other loaders do not read
    anything from their url so there is no way to trigger such possibility.
    
    Note that the nixguix loader got refactored to avoid the side-effect of reading data out
    of the url within the constructor. It was necessary so the check fails and the visit
    status is dealt with as described. Also, this unifies it with how pypi and npm loaders
    deals with the url communication.
    
    Related to T3030

commit fb7ee4332b0dc7889d950309dec9a4ecf57453b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Feb 5 12:59:29 2021 +0100

    core: Mark visit status as failed when relevant
    
    When a visit ends up with no snapshot within the exception block, mark the visit as
    failed. Otherwise, let it be a partial visit.
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/367/ for more details.