launchpad: Manage unhandled exception when listing
Closed Public

Authored by ardumont on Feb 17 2022, 9:57 AM.

Details

Summary

Prior to this commit, the listing could fail when either fetching a page or reading that
page's results (the Launchpad API raises RestfulError). It now retries when these kinds of
exceptions happen. If the error persists (after multiple attempts with exponential
backoff), the listing continues nonetheless (with warning logs).

Note that if the page ends up being empty, it's no longer accounted for.

This actually allows the listing to finish in case of issues.
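
The behaviour described in this summary can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the lister's actual code: `FlakyApiError` (standing in for `RestfulError`), `fetch_with_retry`, `list_pages` and the default attempt count and delays are all hypothetical.

```python
import logging
import time
from typing import Callable, Iterator, List

logger = logging.getLogger(__name__)


class FlakyApiError(Exception):
    """Hypothetical stand-in for lazr.restfulclient.errors.RestfulError."""


def fetch_with_retry(
    fetch_page: Callable[[], List[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch_page, retrying with exponential backoff on FlakyApiError.

    If every attempt fails, log a warning and return an empty page so the
    caller can carry on with the rest of the listing.
    """
    for attempt in range(max_attempts):
        try:
            return fetch_page()
        except FlakyApiError as exc:
            logger.warning(
                "Attempt %d/%d failed: %r", attempt + 1, max_attempts, exc
            )
            if attempt + 1 < max_attempts:
                # Wait base_delay, then twice that, then four times that, ...
                sleep(base_delay * 2 ** attempt)
    return []


def list_pages(
    fetchers: List[Callable[[], List[str]]], **retry_kwargs
) -> Iterator[List[str]]:
    """Yield only non-empty pages; empty pages are not accounted for."""
    for fetch_page in fetchers:
        page = fetch_with_retry(fetch_page, **retry_kwargs)
        if page:
            yield page
```

Passing `sleep` as a parameter is only a convenience for the sketch: it lets a test record the delays instead of actually waiting.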

Related to T3945
Depends on D7194

Test Plan

tox + docker

With only the existing code (or D7193), the actual listing could stop with an issue:

swh-lister_1                        | [2022-02-16 18:19:40,656: ERROR/ForkPoolWorker-1] Task swh.lister.launchpad.tasks.FullLaunchpadLister[f3e3f3aa-8f4a-4e2c-8821-facd4952e53e] raised unexpected: RestfulError()
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 450, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 55, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 731, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/launchpad/tasks.py", line 20, in list_launchpad_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 130, in run
swh-lister_1                        |     full_stats.origins += self.send_origins(origins)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 233, in send_origins
swh-lister_1                        |     for batch_origins in grouper(origins, n=1000):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/utils.py", line 47, in grouper
swh-lister_1                        |     for _data in itertools.zip_longest(*args, fillvalue=stop_value):
swh-lister_1                        |   File "/src/swh-lister/swh/lister/launchpad/lister.py", line 123, in get_origins_from_page
swh-lister_1                        |     vcs_type, repos = page
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/lazr/restfulclient/resource.py", line 819, in __iter__
swh-lister_1                        |     next_get = self._root._browser.get(URI(next_link))
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/lazr/restfulclient/_browser.py", line 439, in get
swh-lister_1                        |     response, content = self._request(url, extra_headers=headers)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/lazr/restfulclient/_browser.py", line 429, in _request
swh-lister_1                        |     raise error
swh-lister_1                        | lazr.restfulclient.errors.RestfulError

Now the listing finishes properly:

swh-lister_1                        | [2022-02-17 08:49:49,585: INFO/MainProcess] Task swh.lister.launchpad.tasks.IncrementalLaunchpadLister[76a42302-abc8-4c77-9890-57173475cfad] received
swh-lister_1                        | [2022-02-17 08:51:12,284: ERROR/ForkPoolWorker-1] Listing bzr origins raised HTTP Error 503: Service Unavailable
swh-lister_1                        | Response headers:
swh-lister_1                        | ---
swh-lister_1                        | -content-encoding: gzip
swh-lister_1                        | connection: close
swh-lister_1                        | content-length: 8837
swh-lister_1                        | content-type: text/html;charset=utf-8
swh-lister_1                        | date: Thu, 17 Feb 2022 08:51:12 GMT
swh-lister_1                        | retry-after: 900
swh-lister_1                        | server: gunicorn/19.8.1
swh-lister_1                        | status: 503
swh-lister_1                        | vary: Accept-Encoding
swh-lister_1                        | x-lazr-oopsid: OOPS-b6047b73198184adbafe5cf3660f596a
swh-lister_1                        | x-powered-by: Zope (www.zope.org), Python (www.python.org)
swh-lister_1                        | x-request-id: 8d2d648c-dcd8-4fa9-ae64-aaa6f01c05a9
swh-lister_1                        | x-vcs-revision: 131c1c72b6032652fb002ebff08e63a8deeb8d0a
swh-lister_1                        | ---
swh-lister_1                        | Response body:
swh-lister_1                        | ---
swh-lister_1                        | b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">\n  <head>\n    <meta charset="UTF-8" />\n    <title>Error: Timeout</title>\n    <link rel="shortcut icon" href="/@@/launchpad.png" />\n    \n    \n    \n\n    \n  \n  <link type="text/css" rel="stylesheet" media="screen, print" href="/+icing/rev131c1c72b6032652fb002ebff08e63a8deeb8d0a/combo.css" />\n\n\n    \n\n    \n      \n      \n    \n\n    \n    \n\n    \n\n    \n  \n\n  \n  \n  <script type="text/javascript">\n    var LP = {\n        cache: {},\n        links: {}\n    };\n  </script>\n\n  \n\n  <script type="text/javascript">var cookie_scope = \'; Path=/; Secure; Domain=.launchpad.net\';</script>\n\n   <script type="text/javascript" src="/+combo/rev131c1c72b6032652fb002ebff08e63a8deeb8d0a/?yui/yui/yui-min.js&amp;lp/meta.js&amp;yui/loader/loader-min.js"></script>\n   <script type="text/javascript">\n        var raw = null;\n        if (LP.devmode) {\n           raw = \'raw\';\n        }\n        YUI.GlobalConfig = {\n            combine: true,\n            comboBase: \'/+combo/rev131c1c72b6032652fb002ebff08e63a8deeb8d0a/?\',\n            root: \'yui/\',\n            filter: raw,\n            debug: false,\n            fetchCSS: false,\n            maxURLLength: 2000,\n            groups: {\n                lp: {\n                    combine: true,\n                    base: \'/+combo/rev131c1c72b6032652fb002ebff08e63a8deeb8d0a/?lp/\',\n                    comboBase: \'/+combo/rev131c1c72b6032652fb002ebff08e63a8deeb8d0a/?\',\n                    root: \'lp/\',\n                    // comes from including lp/meta.js\n                    modules: LP_MODULES,\n                    fetchCSS: false\n                }\n            }\n        }</script>\n\n  <script type="text/javascript">\n      // we need this to create a single YUI instance all events and code\n      // talks across. 
All instances of YUI().use should be based off of\n      // LPJS instead.\n      var LPJS = new YUI();\n  </script>\n\n\n\n    <script id="base-layout-load-scripts" type="text/javascript">\n        //<![CDATA[\n        LPJS.use(\'base\', \'node\', \'console\', \'event\',\n            \'oop\', \'lp\', \'lp.app.foldables\',\'lp.app.sorttable\',\n            \'lp.app.inlinehelp\', \'lp.app.links\',\n            \'lp.bugs.bugtask_index\', \'lp.bugs.subscribers\',\n            \'lp.app.ellipsis\', \'lp.code.branchmergeproposal.diff\',\n            \'lp.views.global\',\n             function(Y) {\n\n            Y.on("domready", function () {\n                var global_view = new Y.lp.views.Global();\n                global_view.render();\n\n                Y.lp.app.sorttable.SortTable.init();\n                Y.lp.app.inlinehelp.init_help();\n                Y.lp.activate_collapsibles();\n                Y.lp.app.foldables.activate();\n                Y.lp.app.links.check_valid_lp_links();\n            });\n\n            Y.on(\'lp:context:web_link:changed\', function(e) {\n                  window.location = e.new_value;\n            });\n        });\n        //]]>\n    </script>\n    <script id="base-helper-functions" type="text/javascript">\n         //<![CDATA[\n        // This code is pulled from lp.js that needs to be available on every\n        // request. 
Pulling here to get it outside the scope of the YUI block.\n        function setFocusByName(name) {\n            // Focus the first element matching the given name which can be focused.\n            var nodes = document.getElementsByName(name);\n            var i, node;\n            for (i = 0; i < nodes.length; i++) {\n                node = nodes[i];\n                if (node.focus) {\n                    try {\n                        // Trying to focus a hidden element throws an error in IE8.\n                        if (node.offsetHeight !== 0) {\n                            node.focus();\n                        }\n                    } catch (e) {\n                        LPJS.use(\'console\', function(Y) {\n                            Y.log(\'In setFocusByName(<\' +\n                                node.tagName + \' type=\' + node.type + \'>): \' + e);\n                        });\n                    }\n                    break;\n                }\n            }\n        }\n\n        function selectWidget(widget_name, event) {\n          if (event && (event.keyCode === 9 || event.keyCode === 13)) {\n              // Avoid firing if user is tabbing through or simply pressing\n              // enter to submit the form.\n              return;\n          }\n          document.getElementById(widget_name).checked = true;\n        }\n        //]]>\n    </script>\n\n    \n      \n    \n  </head>\n\n  <body id="document" itemscope="" itemtype="http://schema.org/WebPage" class="tab-unknown\n      main_only\n      public\n      yui3-skin-sam">\n          \n          \n    <div class="yui-d0">\n      <div id="locationbar" class="login-logout">\n        \n\n<div id="logincontrol"><a href="https://api.launchpad.net/devel/devel/branches/+login?modified_since_date=%222009-09-10T10%3A21%3A25%2B00%3A00%22&amp;order_by=most+neglected+first&amp;ws.op=getBranches">Log in / Register</a></div>\n\n\n\n      </div><!--id="locationbar"-->\n\n      <div id="watermark" 
class="watermark-apps-portlet">\n        <div>\n          <img alt="" width="64" height="64" src="/@@/launchpad-logo" />\n        </div>\n        <div class="wide">\n          <h2 id="watermark-heading"><span>Launchpad.net</span></h2>\n        </div>\n        \n  <!-- Application Menu -->\n  <ul class="facetmenu">\n  </ul>\n\n      </div>\n\n      \n        <div id="maincontent" class="yui-main">\n          <div class="yui-b" dir="ltr">\n            <div class="context-publication">\n              \n              \n\n              <div id="registration" class="registering">\n                \n              </div>\n            </div>\n\n            \n            <div id="request-notifications">\n              \n            </div>\n\n            \n              <div class="top-portlet">\n      <h1 class="exception">Timeout error</h1>\n      <p>\n        Sorry, something just went wrong in Launchpad.\n      </p>\n      <p>\n        We&#8217;ve recorded what happened,\n        and we&#8217;ll fix it as soon as possible.\n        Apologies for the inconvenience.\n      </p>\n      <p>\n        Trying again in a couple of minutes might work.\n      </p>\n      <p>\n        If you report this as a bug, please include the error ID below,\n        preferably by copying and pasting it rather than by taking a\n        screenshot.\n      </p>\n      <p>\n        (Error <abbr>ID</abbr>:\n        <code class="oopsid">OOPS-b6047b73198184adbafe5cf3660f596a</code>)\n      </p>\n      \n    </div>\n            \n            \n          </div><!-- yui-b -->\n        </div><!-- yui-main -->\n\n        \n          <!-- yui-b side -->\n        \n      <!-- yui-t4 -->\n\n      \n  <div id="footer" class="footer">\n    <div class="lp-arcana">\n        <div class="lp-branding">\n          <a href="https://launchpad.net/"><img src="/@@/launchpad-logo-and-name-hierarchy.png" alt="Launchpad" /></a>\n          &nbsp;&bull;&nbsp;\n          <a href="https://launchpad.net/+tour">Take the 
tour</a>\n          &nbsp;&bull;&nbsp;\n          <a href="https://help.launchpad.net/">Read the guide</a>\n          &nbsp;\n          <form id="globalsearch" method="get" accept-charset="UTF-8" action="https://launchpad.net/+search">\n            <input type="search" id="search-text" name="field.text" />\n            <input type="image" src="/@@/search" style="vertical-align:5%" alt="Search Launchpad" />\n          </form>\n        </div>\n        \n  \n\n    </div>\n\n    <div class="colophon">\n      &copy; 2004-2022\n      <a href="http://canonical.com/">Canonical&nbsp;Ltd.</a>\n      &nbsp;&bull;&nbsp;\n      <a href="https://launchpad.net/legal">Terms of use</a>\n      &nbsp;&bull;&nbsp;\n      <a href="https://www.ubuntu.com/legal/dataprivacy">Data privacy</a>\n      &nbsp;&bull;&nbsp;\n      <a href="/feedback">Contact Launchpad Support</a>\n      \n      &nbsp;&bull;&nbsp;\n      <a href="http://blog.launchpad.net/">Blog</a>\n      \n\t&nbsp;&bull;&nbsp;\n\t<a href="https://canonical.com/careers">Careers</a>\n      \n      &nbsp;&bull;&nbsp;\n      <a href="https://twitter.com/launchpadstatus">System status</a>\n      <span id="lp-version">\n      &nbsp;&bull;&nbsp;\n        r131c1c7\n        \n        \n        (<a href="https://dev.launchpad.net/">Get the code!</a>)\n      </span>\n    </div>\n  </div>\n\n    </div><!-- yui-d0-->\n\n    \n  \n  \n  <script id="json-cache-script">LP.cache = {"related_features": {}};</script>\n\n    \n  \n\n    \n  </body>\n\n\n  <!--\n    Facet name: unknown\n    Page type: main_only\n    Has global search: True\n    Has application tabs: True\n    Has side portlets: False\n\n    At least 2 queries/external actions issued in 0.05 seconds OOPS-b6047b73198184adbafe5cf3660f596a\n\n    Features: {\'js.yui_version\': None, \'app.maintenance_message\': None, \'baselayout.careers_link.disabled\': None, \'visible_render_time\': None}\n\n    r131c1c7\n\n    -->\n\n</html>\n\n'
swh-lister_1                        | ---
swh-lister_1                        |
swh-lister_1                        | [2022-02-17 08:51:12,295: INFO/ForkPoolWorker-1] Task swh.lister.launchpad.tasks.IncrementalLaunchpadLister[76a42302-abc8-4c77-9890-57173475cfad] succeeded in 82.70451728097396s: {'pages': 1, 'origins': 22}

Event Timeline

Build is green

Patch application report for D7194 (id=26072)

Could not rebase; Attempt merge onto 31b4429ced...

Updating 31b4429..fdbdec3
Fast-forward
 swh/lister/launchpad/lister.py                     | 142 ++++++++++++++++-----
 .../tests/data/launchpad_bzr_response.json         | 126 ++++++++++++++++++
 swh/lister/launchpad/tests/test_lister.py          |  92 +++++++++----
 3 files changed, 301 insertions(+), 59 deletions(-)
 create mode 100644 swh/lister/launchpad/tests/data/launchpad_bzr_response.json
Changes applied before test
commit fdbdec3df65d42cfbc6c7c9956d3973bbabe08fe
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Feb 17 09:52:19 2022 +0100

    lauchpad: Manage unhandled exception when listing
    
    Prior to this commit, the listing could fail when reading a page of data in lauchpad.
    This does 2 things:
    - trap exception (with retry policy) when reading a page
    - filter out page with no result (hence the changes in tests)
    
    This actually allows the listing to finish in case of issues.

commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing
    
    Related to T3945

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/455/ for more details.

swh/lister/launchpad/lister.py
98

Although I think that won't do anything just yet ^ (but maybe we can keep that for another diff?)
We cannot introspect the RestfulError (it's an exception), short of parsing its content...

ardumont edited the summary of this revision.
  • Drop retry policy as it's doing nothing yet (we'd need some more dev)
  • Add missing test case
ardumont added inline comments.
swh/lister/launchpad/lister.py
98

It was a @retry decorator, which is a no-op for now, so I dropped it.
Hence, my comment is no longer relevant.

Build is green

Patch application report for D7194 (id=26074)

Could not rebase; Attempt merge onto 31b4429ced...

Updating 31b4429..ac55637
Fast-forward
 swh/lister/launchpad/lister.py                     | 138 ++++++++++++++++-----
 .../tests/data/launchpad_bzr_response.json         | 126 +++++++++++++++++++
 swh/lister/launchpad/tests/test_lister.py          | 132 ++++++++++++++++----
 3 files changed, 336 insertions(+), 60 deletions(-)
 create mode 100644 swh/lister/launchpad/tests/data/launchpad_bzr_response.json
Changes applied before test
commit ac55637e8228424c98bc551ec70c24bea345b9ed
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Feb 17 09:52:19 2022 +0100

    lauchpad: Manage unhandled exception when listing
    
    Prior to this commit, the listing could fail when reading a page of data in lauchpad.
    This now traps the exception and let the listing continue. If the page is empty, it's
    now no longer accounted for.
    
    This actually allows the listing to finish in case of issues.
    
    Related to T3945

commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing
    
    Related to T3945

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/456/ for more details.

swh/lister/launchpad/lister.py
154

Apparently (from my runs in docker), there are also hidden connections happening at consumption time (the connection is lazy, afaict).
So this could raise as well...

Status quo: this is the same behavior as today, so let's keep it that way in this diff for now.

A later change can try to work around this.
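
To illustrate why consumption itself can raise, here is a small sketch with a plain Python generator standing in for the lazily fetched collection. The names `FlakyApiError`, `lazy_results` and `consume` are hypothetical, and the guard around the loop is only one way a later change could handle it, not what this diff does at this point.

```python
from typing import Iterator, List


class FlakyApiError(Exception):
    """Hypothetical stand-in for the error raised by the Launchpad client."""


def lazy_results() -> Iterator[str]:
    """Hypothetical lazy collection: the second 'network call' fails."""
    yield "lp:first"
    raise FlakyApiError("HTTP Error 503: Service Unavailable")


def consume(results: Iterator[str]) -> List[str]:
    """Collect what can be read; stop at the first error while iterating."""
    collected: List[str] = []
    try:
        for item in results:
            collected.append(item)
    except FlakyApiError:
        pass  # keep the items read so far
    return collected


# Creating the generator runs none of its body, so nothing can fail here;
# the error only surfaces once the results are iterated over.
results = lazy_results()
```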

anlambert added inline comments.
swh/lister/launchpad/lister.py
116

Same comment as in D7195, use the throttling_retry decorator: less code to insert and more readable.

This revision now requires changes to proceed. Feb 17 2022, 11:25 AM
ardumont added inline comments.
swh/lister/launchpad/lister.py
117

That can still happen when all retry attempts fail, so we return an empty page and move on.

Build is green

Patch application report for D7194 (id=26076)

Could not rebase; Attempt merge onto 31b4429ced...

Updating 31b4429..1ce3ad1
Fast-forward
 swh/lister/launchpad/lister.py                     | 144 ++++++++++++++++-----
 .../tests/data/launchpad_bzr_response.json         | 126 ++++++++++++++++++
 swh/lister/launchpad/tests/test_lister.py          | 135 +++++++++++++++----
 3 files changed, 345 insertions(+), 60 deletions(-)
 create mode 100644 swh/lister/launchpad/tests/data/launchpad_bzr_response.json
Changes applied before test
commit 1ce3ad1c6801b65ea55a9753fb4263cade349c32
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Feb 17 09:52:19 2022 +0100

    lauchpad: Manage unhandled exceptions when listing
    
    Prior to this commit, the listing could fail when either reading a page or the page of
    results (lauchpad api raises RestfulError). This now retries when those kind of
    exceptions happen. If the error persists (after multiple tryouts and exponential
    backoff), the listing continues nonetheless (with warning logs).
    
    Note that if the page ends up being empty, it's no longer accounted for.
    
    This actually allows the listing to finish in case of issues.
    
    Related to T3945

commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing
    
    Related to T3945

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/458/ for more details.

swh/lister/launchpad/lister.py
117

I think you should move the try/except block around the call to _page_request as tenacity will reraise the exception after the max number of attempts.
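
A rough sketch of that suggestion, using a hand-rolled decorator in place of tenacity or throttling_retry so it stays self-contained. `retry`, `page_request`, `get_page`, `FlakyApiError` and the `calls` counter are hypothetical names for illustration, and the backoff between attempts is left out for brevity.

```python
import functools
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


class FlakyApiError(Exception):
    """Hypothetical stand-in for the error raised by the Launchpad client."""


def retry(max_attempts: int) -> Callable:
    """Hand-rolled retry decorator: re-raises once attempts are exhausted."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except FlakyApiError:
                    if attempt == max_attempts:
                        raise  # give up: the caller sees the exception

        return wrapper

    return decorator


calls = {"count": 0}


@retry(max_attempts=3)
def page_request() -> List[str]:
    """Hypothetical page fetch that always fails, to show the final re-raise."""
    calls["count"] += 1
    raise FlakyApiError("HTTP Error 503: Service Unavailable")


def get_page() -> List[str]:
    """Guard the retried call: the except runs once, after all attempts."""
    try:
        return page_request()
    except FlakyApiError as exc:
        logger.warning("Giving up on this page: %r", exc)
        return []
```

With the try/except placed inside `page_request` instead, the decorator would never see an exception and so would never retry.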

Adapt according to suggestion

swh/lister/launchpad/lister.py
117

one round trip less, right!

Build is green

Patch application report for D7194 (id=26078)

Could not rebase; Attempt merge onto 31b4429ced...

Updating 31b4429..fc2edd2
Fast-forward
 swh/lister/launchpad/lister.py                     | 143 ++++++++++++++++-----
 .../tests/data/launchpad_bzr_response.json         | 126 ++++++++++++++++++
 swh/lister/launchpad/tests/test_lister.py          | 135 +++++++++++++++----
 3 files changed, 344 insertions(+), 60 deletions(-)
 create mode 100644 swh/lister/launchpad/tests/data/launchpad_bzr_response.json
Changes applied before test
commit fc2edd24aa4b71376e2bfba6dbcab1fa68af6f72
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Feb 17 09:52:19 2022 +0100

    launchpad: Manage unhandled exceptions when listing
    
    Prior to this commit, the listing could fail when either reading a page or the page of
    results (lauchpad api raises RestfulError). This now retries when those kind of
    exceptions happen. If the error persists (after multiple tryouts and exponential
    backoff), the listing continues nonetheless (with warning logs).
    
    Note that if the page ends up being empty, it's no longer accounted for.
    
    This actually allows the listing to finish in case of issues.
    
    Related to T3945

commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing
    
    Related to T3945

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/460/ for more details.

This revision is now accepted and ready to land. Feb 17 2022, 1:11 PM