
sourceforge: don't abort on error for project
ClosedPublic

Authored by Alphare on May 12 2021, 11:11 AM.

Details

Summary

It's suboptimal, to say the least, to stop the entire lister process
if a single project page is broken (most likely a 404). This change
logs the issue as a warning and carries on. It also includes some
minor logging changes and comment touch-ups.
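The behaviour described in the summary can be sketched as follows. This is a minimal, hypothetical model, not the lister's actual code: `list_projects` and `fake_fetch` are illustrative names, and `fetch_project` stands in for whatever HTTP request the SourceForge lister really makes. It assumes a failed page surfaces as a stdlib `urllib.error.HTTPError`.

```python
import logging
from typing import Callable, Dict, Iterable, Iterator
from urllib.error import HTTPError

logger = logging.getLogger(__name__)


def list_projects(
    project_names: Iterable[str],
    fetch_project: Callable[[str], Dict],  # hypothetical stand-in for the real request
) -> Iterator[Dict]:
    """Yield project info, skipping projects whose page cannot be fetched."""
    for name in project_names:
        try:
            info = fetch_project(name)
        except HTTPError as e:
            # A single broken page (most likely a 404) must not abort the
            # whole listing: log a warning and move on to the next project.
            logger.warning("Error fetching project %s: %s, skipping", name, e)
            continue
        yield info


# Usage with a hypothetical fetcher where one project page returns a 404.
def fake_fetch(name: str) -> Dict:
    if name == "broken":
        raise HTTPError(f"https://example.org/p/{name}", 404, "Not Found", None, None)
    return {"name": name}


listed = [p["name"] for p in list_projects(["a", "broken", "b"], fake_fetch)]
```

The broken project is logged and skipped, and `listed` still contains the projects on either side of it.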

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5740 (id=20526)

Rebasing onto 2ff549e125...

Current branch diff-target is up to date.
Changes applied before test
commit cc8a23e887d3f58673e07a88ea6595666d351e1b
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue May 11 10:03:04 2021 +0200

    sourceforge: don't abort on error for project
    
    It's suboptimal to say the least to stop the entire lister process
    if a single project page is somehow broken (404, most likely). This
    change logs the issue as a warning and carries on, as well as some
    minor logging changes and comments touch ups.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/282/ for more details.

When this happens, will the page be visited again by the incremental lister?

ardumont added inline comments.
swh/lister/sourceforge/tests/test_lister.py
371

It'd be clearer if the lister actually listed something here after encountering issues (at any point in the listing).
(Given the existing code, I don't know whether it's actually difficult to modify the test to do this.)

In this test, I'm surprised that nothing is listed at all, and I don't see how to tell it apart from the earlier code, which aborted the listing as soon as any error was encountered.

What do you think?

More thorough testing

Test that listing continues after encountering a non-OK page.
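The reviewer's point can be illustrated with a small, self-contained sketch. All names here (`PageError`, `list_aborting`, `list_skipping`, `fake_fetch`) are hypothetical and do not come from the actual test suite. With the failing page placed *before* valid ones, the old abort-on-error behaviour lists nothing, while the new behaviour still lists the remaining projects, so the two are distinguishable.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger(__name__)


class PageError(Exception):
    """Hypothetical error for a project page with a non-200 status."""


def list_aborting(names: Iterable[str], fetch: Callable[[str], Dict]) -> List[Dict]:
    # Old behaviour: the first error propagates and stops the whole listing.
    return [fetch(name) for name in names]


def list_skipping(names: Iterable[str], fetch: Callable[[str], Dict]) -> List[Dict]:
    # New behaviour: log a warning and carry on with the next project.
    results = []
    for name in names:
        try:
            results.append(fetch(name))
        except PageError as e:
            logger.warning("Skipping project %s: %s", name, e)
    return results


def fake_fetch(name: str) -> Dict:
    if name == "broken":
        raise PageError(f"{name}: 404")
    return {"name": name}


def test_listing_continues_after_error() -> None:
    names = ["broken", "ok-1", "ok-2"]  # error first, valid projects after it
    try:
        list_aborting(names, fake_fetch)
        aborted = False
    except PageError:
        aborted = True
    assert aborted  # old code: nothing gets listed
    listed = [p["name"] for p in list_skipping(names, fake_fetch)]
    assert listed == ["ok-1", "ok-2"]  # new code: the rest is still listed
```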

When this happens, will the page be visited again by the incremental lister?

Yes. This is probably a good idea: non-200 responses are few and far between, and we don't want to start skipping a project just because SourceForge happened to have trouble serving it at some point.
What do you think?
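Why a failed page gets revisited can be shown with a simplified model of incremental listing. This sketch assumes that the lister remembers a last-modified date per project and records it only after a successful fetch. `incremental_run`, `PageError` and the state layout are hypothetical and are not the real SourceForge lister state. Because a failure is never written to the state, the project still looks "new or changed" on the next run and gets fetched again.

```python
from typing import Callable, Dict, List


class PageError(Exception):
    """Hypothetical error for a project page that could not be served."""


def incremental_run(
    upstream: Dict[str, str],  # project -> last-modified date advertised upstream
    state: Dict[str, str],  # project -> last-modified date successfully listed
    fetch: Callable[[str], None],
) -> List[str]:
    """Visit projects that are new or changed since the last successful listing."""
    visited = []
    for name, modified in upstream.items():
        if state.get(name) == modified:
            continue  # unchanged since it was last listed successfully
        try:
            fetch(name)
        except PageError:
            continue  # failure is not recorded, so the next run retries it
        state[name] = modified
        visited.append(name)
    return visited


# Usage: project "b" fails on the first run and is picked up on the second.
failing = {"b"}


def flaky_fetch(name: str) -> None:
    if name in failing:
        raise PageError(f"{name}: 500")


state: Dict[str, str] = {}
upstream = {"a": "2021-05-01", "b": "2021-05-02"}
first_run = incremental_run(upstream, state, flaky_fetch)  # ["a"]
failing.clear()  # SourceForge serves "b" again
second_run = incremental_run(upstream, state, flaky_fetch)  # ["b"]
```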

swh/lister/sourceforge/tests/test_lister.py
371

Agreed, it's better to make sure that nothing breaks during the listing process. I've updated the code to reflect this.

Build is green

Patch application report for D5740 (id=20527)

Rebasing onto 2ff549e125...

Current branch diff-target is up to date.
Changes applied before test
commit 8f3bbacd5eee8933e817e2de1fa2019ca4a3b3c0
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue May 11 10:03:04 2021 +0200

    sourceforge: don't abort on error for project
    
    It's suboptimal to say the least to stop the entire lister process
    if a single project page is somehow broken (404, most likely). This
    change logs the issue as a warning and carries on, as well as some
    minor logging changes and comments touch ups.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/283/ for more details.

When this happens, will the page be visited again by the incremental lister?

Yes. This is probably a good idea: non-200 responses are few and far between, and we don't want to start skipping a project just because SourceForge happened to have trouble serving it at some point.
What do you think?

yes, sounds fair enough.

This revision is now accepted and ready to land. (May 12 2021, 7:05 PM)