Page MenuHomeSoftware Heritage

cgit: Don't stop the listing when a repository page is not available
ClosedPublic

Authored by vsellier on Jan 27 2021, 12:42 PM.

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18811
Build 29138: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 29137: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4954 (id=17655)

Could not rebase; Attempt merge onto bb0184c004...

Updating bb0184c..81b45e3
Fast-forward
 swh/lister/cgit/lister.py            |  57 ++++++++++++++++---
 swh/lister/cgit/tests/test_lister.py | 104 +++++++++++++++++++++++++++++++++--
 2 files changed, 148 insertions(+), 13 deletions(-)
Changes applied before test
commit 81b45e3c15b11f74efc8c05e6af46a2bd72e1831
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Wed Jan 27 12:22:47 2021 +0100

    cgit: Don't stop the listing when a repository page is not available
    
    Related to T2988

commit f1602ab87217f105e3245dc5058c281cb39a1b82
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/193/ for more details.

Build is green

Patch application report for D4954 (id=17668)

Could not rebase; Attempt merge onto bb0184c004...

Updating bb0184c..c2d4d30
Fast-forward
 swh/lister/cgit/lister.py            |  53 +++++++++++++++---
 swh/lister/cgit/tests/test_lister.py | 104 +++++++++++++++++++++++++++++++++--
 2 files changed, 144 insertions(+), 13 deletions(-)
Changes applied before test
commit c2d4d30e8af83b91a289837e2daaab55bb4244ab
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Wed Jan 27 12:22:47 2021 +0100

    cgit: Don't stop the listing when a repository page is not available
    
    Related to T2988

commit 743bff42190060c7522b8d00ead1f2bfcd10f5f6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/195/ for more details.

anlambert added a subscriber: anlambert.

You should rather catch the requests.exceptions.HTTPError originally raised by _get_and_parse method.

Simply modifying the get_origins_from_page method like this should do the trick.

def get_origins_from_page(
    self, repositories: Repositories
) -> Iterator[ListedOrigin]:
    """Convert a page of cgit repositories into a list of ListedOrigins."""
    assert self.lister_obj.id is not None

    for repository in repositories:
        try:
            origin_url = self._get_origin_from_repository_url(repository["url"])
        except HTTPError as e:
            logger.warning(
                "Unexpected HTTP status code %s on %s",
                e.response.status_code,
                e.response.url,
            )
            origin_url = None

        if not origin_url:
            continue

        yield ListedOrigin(
            lister_id=self.lister_obj.id,
            url=origin_url,
            visit_type="git",
            last_update=_parse_last_updated_date(repository),
        )
This revision now requires changes to proceed.Jan 27 2021, 2:24 PM

Build is green

Patch application report for D4954 (id=17670)

Could not rebase; Attempt merge onto bb0184c004...

Updating bb0184c..aa7688f
Fast-forward
 swh/lister/cgit/lister.py            |  58 ++++++++++++++++---
 swh/lister/cgit/tests/test_lister.py | 104 +++++++++++++++++++++++++++++++++--
 2 files changed, 149 insertions(+), 13 deletions(-)
Changes applied before test
commit aa7688f5be5c9efbe78dc2419e884696736d1dcf
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Wed Jan 27 12:22:47 2021 +0100

    cgit: Don't stop the listing when a repository page is not available
    
    Related to T2988

commit 91fcde83410d41daf0dbe4f431b55c33c69f8544
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/197/ for more details.

You should rather catch the requests.exceptions.HTTPError originally raised by _get_and_parse method.

picture me curious, why?

(it's missing the eventual logging.warning instruction but its use is debatable as well ;)

You should rather catch the requests.exceptions.HTTPError originally raised by _get_and_parse method.

picture me curious, why?

(it's missing the eventual logging.warning instruction but its use is debatable as well ;)

Less code to modify, easier to read and understand the intended behavior from my point of view.

Less code to modify and easier to read and understand the intended behavior from my point of view.

maybe so.

One pro I see for your implementation is if at some point the main loop (get_pages)
fails to fetch something (which we don't currently expect to see the assert added
here), we'll have more information available in the log instead of an AssertionError.
We'll indeed have an actual HttpError with the status code and everything in the
stacktrace.

All in all, i think you are right ;)

Use an exception to validate a repo page can be accessed

Less code to modify and easier to read and understand the intended behavior from my point of view.

maybe so.

One pro I see for your implementation is if at some point the main loop (get_pages)
fails to fetch something (which we don't currently expect to see the assert added
here), we'll have more information available in the log instead of an AssertionError.
We'll indeed have an actual HttpError with the status code and everything in the
stacktrace.

All in all, i think you are right ;)

Yes, it's better to have a method that simply performs an HTTP request and raise if errors.
HTTP errors can then be handled if needed in other methods and if not the lister
will end up with error with the stack trace displayed.

Build is green

Patch application report for D4954 (id=17672)

Rebasing onto 91fcde8341...

Current branch diff-target is up to date.
Changes applied before test
commit f6f9f1ca28a9161a2c3606243e202285030f7db6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Wed Jan 27 12:22:47 2021 +0100

    cgit: Don't stop the listing when a repository page is not available
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/198/ for more details.

This revision is now accepted and ready to land.Jan 27 2021, 3:03 PM