
pypi.lister: Handle xml-rpc throttling properly

Authored by ardumont on Jul 9 2021, 10:50 AM.



An actual run in docker made the throttling apparent [1].

[1] The run failed almost immediately with:

xmlrpc.client.Fault: <Fault -32500: 'HTTPTooManyRequests: The action could not be performed because there were too many requests by the client. Limit may reset in 1 seconds.'>
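For illustration, here is a minimal sketch of one way to retry on this fault. It is not taken from the diff itself: the helper names (`is_throttled`, `retry_delay`, `call_with_retry`) are hypothetical, and it assumes that fault code -32500 together with the `HTTPTooManyRequests` prefix identifies throttling, and that the "Limit may reset in N seconds" hint can be used as the wait time.

```python
import re
import time
import xmlrpc.client
from typing import Any, Callable

# Assumption: this code/prefix pair, copied from the fault above, marks throttling.
THROTTLE_FAULT_CODE = -32500
RESET_PATTERN = re.compile(r"Limit may reset in (\d+) seconds")


def is_throttled(fault: xmlrpc.client.Fault) -> bool:
    """Return True when the fault looks like the throttling error above."""
    return (
        fault.faultCode == THROTTLE_FAULT_CODE
        and "HTTPTooManyRequests" in str(fault.faultString)
    )


def retry_delay(fault: xmlrpc.client.Fault, default: float = 1.0) -> float:
    """Read the server's reset hint from the fault message, if present."""
    match = RESET_PATTERN.search(str(fault.faultString))
    return float(match.group(1)) if match else default


def call_with_retry(
    func: Callable[..., Any],
    *args: Any,
    max_attempts: int = 5,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call func(*args), waiting and retrying while the server throttles us."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except xmlrpc.client.Fault as fault:
            # Re-raise unrelated faults, and give up after the last attempt.
            if not is_throttled(fault) or attempt == max_attempts:
                raise
            sleep(retry_delay(fault))
```

A caller would wrap each XML-RPC call, for example `call_with_retry(proxy.some_method, arg)` where `proxy` is an `xmlrpc.client.ServerProxy` and `some_method` is a placeholder. The `sleep` parameter is injectable so that tests do not actually wait.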

Depends on D5977
Related to T3399

Test Plan


Diff Detail

rDLS Listers
No Linters Available
No Unit Test Coverage
Build Status
Buildable 22543
Build 35132: Phabricator diff pipeline on jenkins
Build 35131: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5983 (id=21568)

Could not rebase; Attempt merge onto 698be475e9...

Updating 698be47..92e7029
 swh/lister/pypi/                        | 169 ++++++++++++---
 swh/lister/pypi/                         |   4 +-
 swh/lister/pypi/tests/data/ |  12 --
 swh/lister/pypi/tests/             | 259 +++++++++++++++++------
 4 files changed, 342 insertions(+), 102 deletions(-)
 delete mode 100644 swh/lister/pypi/tests/data/
Changes applied before test
commit 92e7029f8ed5ab658dac3f4e68d18f8c42075c3f
Author: Antoine R. Dumont (@ardumont) <>
Date:   Fri Jul 9 10:42:57 2021 +0200

    pypi.lister: Handle xml-rpc throttling properly
    Related to T3399

commit 77f7da32e06361e8a4c860ad8c884582e5804796
Author: Antoine R. Dumont (@ardumont) <>
Date:   Wed Jul 7 16:14:09 2021 +0200

    Make PyPI lister incremental and complete in regards to last_update
    This rewrites the current implementation to use PyPI's XML-RPC API, which makes
    incremental listing possible. It also allows fetching the last release date per package,
    which in turn makes it possible to update the "last_update" entry in the ListedOrigin
    Related to T3399

See for more details.