This is a straight port of the old lister.
The lister has only full listing capability.
It scrapes pypi.org list of packages.
Rate-limiting was not encountered but is handled generically.
Related to T2956
Differential D4867
Reimplement PyPI lister using new Lister API tenma on Jan 14 2021, 6:56 PM. Authored by
Details
This is a straight port of the old lister. Related to T2956 tox
Diff Detail
Event TimelineComment Actions Build has FAILED Patch application report for D4867 (id=17237)Could not rebase; Attempt merge onto c782275296... Updating c782275..a51be23 Fast-forward requirements-test.txt | 1 + swh/lister/bitbucket/__init__.py | 9 +- swh/lister/bitbucket/lister.py | 269 +++++-- swh/lister/bitbucket/models.py | 16 - swh/lister/bitbucket/tasks.py | 68 +- swh/lister/bitbucket/tests/conftest.py | 2 +- .../tests/data/bb_api_repositories_page1.json | 124 ++++ .../tests/data/bb_api_repositories_page2.json | 123 ++++ ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 --------------------- .../https_api.bitbucket.org/empty_response.json | 4 - .../data/https_api.bitbucket.org/response.json | 1 - swh/lister/bitbucket/tests/test_lister.py | 290 +++++--- swh/lister/bitbucket/tests/test_tasks.py | 81 +-- swh/lister/pypi/__init__.py | 3 +- swh/lister/pypi/lister.py | 106 ++- swh/lister/pypi/tasks.py | 10 +- swh/lister/pypi/tests/test_lister.py | 53 ++ swh/lister/pypi/tests/test_tasks.py | 5 +- 18 files changed, 845 insertions(+), 1126 deletions(-) delete mode 100644 swh/lister/bitbucket/models.py create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json Changes applied before testcommit a51be23a3617d95a20c937e4ebd8d18bf3716861 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. commit 4dd90ca2f489f406ef924daad33832a38fef96b1 Author: tenma <tenma+swh@mailbox.org> Date: Wed Jan 13 15:44:07 2021 +0100 [WIP] Reimplement Bitbucket lister using new Lister API The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation. Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/96/ Comment Actions Build has FAILED Patch application report for D4867 (id=17239)Could not rebase; Attempt merge onto c782275296... Updating c782275..480caad Fast-forward requirements-test.txt | 1 + swh/lister/bitbucket/__init__.py | 9 +- swh/lister/bitbucket/lister.py | 269 +++++-- swh/lister/bitbucket/models.py | 16 - swh/lister/bitbucket/tasks.py | 68 +- swh/lister/bitbucket/tests/conftest.py | 2 +- .../tests/data/bb_api_repositories_page1.json | 124 ++++ .../tests/data/bb_api_repositories_page2.json | 123 ++++ ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 --------------------- .../https_api.bitbucket.org/empty_response.json | 4 - .../data/https_api.bitbucket.org/response.json | 1 - swh/lister/bitbucket/tests/test_lister.py | 290 +++++--- swh/lister/bitbucket/tests/test_tasks.py | 81 +-- swh/lister/pypi/__init__.py | 3 +- swh/lister/pypi/lister.py | 131 ++-- swh/lister/pypi/tasks.py | 10 +- swh/lister/pypi/tests/test_lister.py | 53 ++ swh/lister/pypi/tests/test_tasks.py | 5 +- 18 files changed, 827 insertions(+), 1169 deletions(-) delete mode 100644 swh/lister/bitbucket/models.py create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json Changes applied before testcommit 480caadc455e282441ff56b81bb931e96fb35149 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. commit 4dd90ca2f489f406ef924daad33832a38fef96b1 Author: tenma <tenma+swh@mailbox.org> Date: Wed Jan 13 15:44:07 2021 +0100 [WIP] Reimplement Bitbucket lister using new Lister API The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation. Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/97/ Comment Actions Build is green Patch application report for D4867 (id=17240)Could not rebase; Attempt merge onto c782275296... Updating c782275..083e858 Fast-forward requirements-test.txt | 1 + swh/lister/bitbucket/__init__.py | 9 +- swh/lister/bitbucket/lister.py | 269 +++++-- swh/lister/bitbucket/models.py | 16 - swh/lister/bitbucket/tasks.py | 68 +- swh/lister/bitbucket/tests/conftest.py | 2 +- .../tests/data/bb_api_repositories_page1.json | 124 ++++ .../tests/data/bb_api_repositories_page2.json | 123 ++++ ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 --------------------- .../https_api.bitbucket.org/empty_response.json | 4 - .../data/https_api.bitbucket.org/response.json | 1 - swh/lister/bitbucket/tests/test_lister.py | 290 +++++--- swh/lister/bitbucket/tests/test_tasks.py | 81 +-- swh/lister/pypi/__init__.py | 3 +- swh/lister/pypi/lister.py | 131 ++-- swh/lister/pypi/tasks.py | 10 +- swh/lister/pypi/tests/test_lister.py | 53 ++ swh/lister/pypi/tests/test_tasks.py | 9 +- 18 files changed, 829 insertions(+), 1171 deletions(-) delete mode 100644 swh/lister/bitbucket/models.py create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json Changes applied before testcommit 083e8585a50e3d2ece89e6f001befa0561d6312f Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. commit 4dd90ca2f489f406ef924daad33832a38fef96b1 Author: tenma <tenma+swh@mailbox.org> Date: Wed Jan 13 15:44:07 2021 +0100 [WIP] Reimplement Bitbucket lister using new Lister API The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/98/ for more details. Comment Actions Build is green Patch application report for D4867 (id=17241)Rebasing onto c782275296... Current branch diff-target is up to date. Changes applied before testcommit 9de12fb5d2fef03177be38180a8cdd851a79a6a5 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/99/ for more details. Comment Actions Build is green Patch application report for D4867 (id=17245)Rebasing onto c782275296... Current branch diff-target is up to date. Changes applied before testcommit 46a216ee820f87e88c7104b8c8e991f02813e6f8 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/100/ for more details. Comment Actions Thanks! I have made a few comments inline. All in all, as this lister makes a single request, I think we can drop the whole ratelimit handling logic which will make it much, much simpler. You'll also need to remove the models.py file which should not be used any longer, as well as clean up the commented / stringified old code.
Comment Actions In short, let's KISS!
Comment Actions Build is green Patch application report for D4867 (id=17290)Rebasing onto c782275296... Current branch diff-target is up to date. Changes applied before testcommit c087c93e8e2a81d93b846c50dff3d293e54ac713 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/108/ for more details. Comment Actions Looks fine to me, thanks. Please apply @anlambert's remaining comments before landing! :)
Comment Actions Build is green Patch application report for D4867 (id=17331)Rebasing onto 9fd91f007d... Current branch diff-target is up to date. Changes applied before testcommit a0ef8dfd795e96958a07847a6bd101f8dac1d9f8 Author: tenma <tenma+swh@mailbox.org> Date: Thu Jan 14 18:47:26 2021 +0100 [WIP] Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/122/ for more details. Comment Actions Build is green Patch application report for D4867 (id=17400)Rebasing onto 565e7423e3... Current branch diff-target is up to date. Changes applied before testcommit 62c825b8cb80aa66b4a04af88635a662c11b6fb2 Author: tenma <tenma+swh@mailbox.org> Date: Wed Jan 20 15:43:07 2021 +0100 Reimplement PyPI lister using new Lister API The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically. See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/123/ for more details. |