Page MenuHomeSoftware Heritage

Reimplement Gitea lister using new Lister API
ClosedPublic

Authored by tenma on Jan 21 2021, 4:26 PM.

Details

Reviewers
anlambert
Group Reviewers
Reviewers
Maniphest Tasks
T2971: Port Gitea lister to the new Lister API
Summary

The lister is stateless and has only full listing capability.
It can request the Gitea API using HTTP token authentication.
Rate-limiting was not encountered but is handled generically.
Added support for getting repo last update date through API.

Also add Gitea lister mandatory params to lister/tests/test_cli.

Depends on D4906 for build
Closes T2971

Test Plan

Tests have good coverage but must be splitted.

Example test run:

pytest --log-cli-level=WARNING swh/lister/gitea/tests/
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.7.3, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
plugins: hypothesis-6.0.1, flask-1.1.0, requests-mock-1.8.0, django-4.1.0, postgresql-2.5.2, forked-1.3.0, mock-3.5.1, asyncio-0.14.0, xdist-2.2.0, dash-1.18.1, cov-2.11.0, swh.journal-0.6.1, swh.core-0.11.1.dev4+g40018f2
collected 8 items                                                                                                                                                                            

swh/lister/gitea/tests/test_lister.py::test_gitea_full_listing 
--------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------
WARNING  swh.lister.gitea.lister:lister.py:79 No authentication token set in configuration, using anonymous mode
WARNING  swh.lister.gitea.lister:lister.py:97 Unexpected HTTP status code 429 on https://try.gitea.io/api/v1/repos/search?sort=id&order=asc&limit=3&page=2: b''
WARNING  swh.lister.gitea.lister:before_sleep.py:45 Retrying swh.lister.gitea.lister.GiteaLister.page_request in 1.0 seconds as it raised HTTPError: 429 Client Error: None for url: https://try.gitea.io/api/v1/repos/search?sort=id&order=asc&limit=3&page=2.
PASSED                                                                                                                                                                                 [ 12%]
swh/lister/gitea/tests/test_lister.py::test_gitea_auth_instance 
--------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------
WARNING  swh.lister.gitea.lister:lister.py:75 Using authentication token from user u
PASSED                                                                                                                                                                                 [ 25%]
swh/lister/gitea/tests/test_lister.py::test_gitea_list_http_error[400] 
--------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------
WARNING  swh.lister.gitea.lister:lister.py:79 No authentication token set in configuration, using anonymous mode
WARNING  swh.lister.gitea.lister:lister.py:97 Unexpected HTTP status code 400 on https://try.gitea.io/api/v1/repos/search?sort=id&order=asc&limit=3&page=1: b''
PASSED                                                                                                                                                                                 [ 37%]
swh/lister/gitea/tests/test_lister.py::test_gitea_list_http_error[500] 
--------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------
WARNING  swh.lister.gitea.lister:lister.py:79 No authentication token set in configuration, using anonymous mode
WARNING  swh.lister.gitea.lister:lister.py:97 Unexpected HTTP status code 500 on https://try.gitea.io/api/v1/repos/search?sort=id&order=asc&limit=3&page=1: b''
PASSED                                                                                                                                                                                 [ 50%]
swh/lister/gitea/tests/test_lister.py::test_gitea_list_http_error[502] 
--------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------
WARNING  swh.lister.gitea.lister:lister.py:79 No authentication token set in configuration, using anonymous mode
WARNING  swh.lister.gitea.lister:lister.py:97 Unexpected HTTP status code 502 on https://try.gitea.io/api/v1/repos/search?sort=id&order=asc&limit=3&page=1: b''
PASSED                                                                                                                                                                                 [ 62%]
swh/lister/gitea/tests/test_tasks.py::test_ping PASSED                                                                                                                                 [ 75%]
swh/lister/gitea/tests/test_tasks.py::test_full_listing PASSED                                                                                                                         [ 87%]
swh/lister/gitea/tests/test_tasks.py::test_full_listing_params PASSED                                                                                                                  [100%]

===================================================================================== 8 passed in 10.73s =====================================================================================

Diff Detail

Repository
rDLS Listers
Branch
gitea
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18616
Build 28794: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 28793: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D4907 (id=17482)

Rebasing onto 62c825b8cb...

Current branch diff-target is up to date.
Changes applied before test
commit 363ff12ee07836f8ffbb25caa49794935f9c7356
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 16:09:42 2021 +0100

    gitea: cleanup (SQUASH IT)

commit f034d3e501396d50749bc1d0c0669bdf938aaf8e
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 2959d0b9a68c05d979c6910dee500b2ec5e61318
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/127/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/127/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 21 2021, 4:26 PM
Harbormaster failed remote builds in B18616: Diff 17482!
tenma retitled this revision from Reimplement Gitea lister using new Lister API to WIP Reimplement Gitea lister using new Lister API.Jan 21 2021, 4:32 PM
tenma edited the test plan for this revision. (Show Details)
tenma added projects: Lister, Sprint 2021 01.

Fixed tests, removed models, added comments

Build has FAILED

Patch application report for D4907 (id=17483)

Rebasing onto 62c825b8cb...

Current branch diff-target is up to date.
Changes applied before test
commit ccc50e90b47506c9b0c71d3459b190ba058f415d
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 16:54:38 2021 +0100

    Cleanup Gitea lister (SQUASH IT)

commit 2153548d024fa2e4aa358dd9f50b45b7aa87fd8b
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 2959d0b9a68c05d979c6910dee500b2ec5e61318
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/128/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/128/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 21 2021, 4:57 PM
Harbormaster failed remote builds in B18617: Diff 17483!
swh/lister/gitea/lister.py
30

A link to the Gitea API documentation would be more useful here https://try.gitea.io/api/swagger.

40

I would rather make the url and instance parameters mandatory here and set default value of "https://try.gitea.io/api/v1/" and "gitea".

42

maybe add a comment saying that it is actually the maximum number of pages Gitea API can return ?

I only tested with try.gitea and codeberg so this could not be true to other instances but I truly doubt it.

65–70

That warning is not really useful. I guess you could pick a random token in the list instead.

71–76

You should remove that method and merge its content in the constructor.

swh/lister/gitea/tests/test_lister.py
17–48

There is a lot of duplicated hardcoded URLs in those fixture.

You should use a function based on string templating or formating to improve readability.

Also you can simplify the name of JSON data files, something like data/gitea_repos_pageX.json.
Previous naming was only required when using the requests_mock_datadir fixture.

126

you can remove the trygitea_p1 fixture use and use baseurl + lister.REPO_LIST_PATH as requests_mock.get URL.

tenma added inline comments.
swh/lister/gitea/lister.py
40

What do you mean by mandatory and have a default value?

The idea was that for such services that have many instances, we must pass the URL, but the instance name can be inferred if not passed.

Build has FAILED

Patch application report for D4907 (id=17503)

Rebasing onto 5411141e3a...

Current branch diff-target is up to date.
Changes applied before test
commit 1f4aee63fb55b21244b6fa68458dabf3c487d04e
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 19:28:04 2021 +0100

    Address remarks from @anlambert (SQUASH IT)

commit 30b8e5330634d3ec52ebd91311689e459daacf77
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 16:54:38 2021 +0100

    Cleanup Gitea lister (SQUASH IT)

commit ae6ae630d2af1207a54f58eeeb4d066712c9d84b
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 40d82690a81ef906b56f9674f0b2fcc1d1b03325
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/129/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/129/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 21 2021, 7:33 PM
Harbormaster failed remote builds in B18635: Diff 17503!

Build has FAILED

Patch application report for D4907 (id=17503)

Rebasing onto 5411141e3a...

Current branch diff-target is up to date.
Changes applied before test
commit 1f4aee63fb55b21244b6fa68458dabf3c487d04e
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 19:28:04 2021 +0100

    Address remarks from @anlambert (SQUASH IT)

commit 30b8e5330634d3ec52ebd91311689e459daacf77
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 16:54:38 2021 +0100

    Cleanup Gitea lister (SQUASH IT)

commit ae6ae630d2af1207a54f58eeeb4d066712c9d84b
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 40d82690a81ef906b56f9674f0b2fcc1d1b03325
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/130/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/130/console

swh/lister/gitea/lister.py
65–70

Updated. The warning is more about only supporting tokens than the fact that only one is supported. (Not the same case as the other lister.)
I could use a random one but it is simpler and more explicit like this.
If I chose randomly, it would be useful to display which one, but it is a bad idea to display the creds.

Build is green

Patch application report for D4907 (id=17549)

Rebasing onto ff232f0d91...

First, rewinding head to replay your work on top of it...
Applying: tests.cli: add Gitea lister mandatory params
Applying: Reimplement Gitea lister using new Lister API
Applying: gitea.tests: split and make them more thorough
Changes applied before test
commit 8ba837839507380e4cb53db1c3636050971926f5
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 21:08:31 2021 +0100

    gitea.tests: split and make them more thorough

commit c88c28494d84e9c159bfd5868c12eb4bf2669874
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 35c1a85335ed64e8a861c8b78a8875e00ee56483
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/145/ for more details.

anlambert added inline comments.
swh/lister/gitea/lister.py
40

Right, I did not see the instance = parse_url(url).host instruction below, forget that comment

65–70

Updated. The warning is more about only supporting tokens than the fact that only one is supported. (Not the same case as the other lister.)

That's implementation details and not really relevant to log, please remove it.
I would rather log the fact that the lister works in anonymous mode when no credentials is set
(as in the github lister)

I could use a random one but it is simpler and more explicit like this.

I do not agree, we could have multiple credentials stored in configuration so a random
picking seems a better choice here and random.choice(self.credentials)["password"] is
also explicit in what it is doing.

If I chose randomly, it would be useful to display which one, but it is a bad idea to display the creds.

Yes do not do that. You could log the username associated to the token if available in the conf instead (as in the github lister).

So the code could be rewritten in:

if len(self.credentials) > 0:
    cred = random.choice(self.credentials)
    username = cred.get("username")
    logger.warning("Using authentication token from user %s", username or "???")
    self.set_credentials(username, cred["password"])
else:
    logger.warning("No authentication tokens set in configuration, using anonymous mode")

I will make the same changes to the Bitbucket lister.

This revision now requires changes to proceed.Jan 25 2021, 2:00 PM
swh/lister/gitea/lister.py
65–70

I will make the same changes to the Bitbucket lister.

D4938

tenma retitled this revision from WIP Reimplement Gitea lister using new Lister API to Reimplement Gitea lister using new Lister API.Jan 25 2021, 3:03 PM
tenma added inline comments.
swh/lister/gitea/lister.py
65–70

OK it is good like this.

improve handling of credentials

Build is green

Patch application report for D4907 (id=17581)

Rebasing onto e4a590fc7f...

First, rewinding head to replay your work on top of it...
Applying: tests.cli: add Gitea lister mandatory params
Applying: Reimplement Gitea lister using new Lister API
Applying: gitea.tests: split and make them more thorough
Applying: gitea.lister: improve handling of credentials
Changes applied before test
commit 3c03fb9f5385a9bb39badc05d842fda19a7f22be
Author: tenma <tenma+swh@mailbox.org>
Date:   Mon Jan 25 15:14:56 2021 +0100

    gitea.lister: improve handling of credentials

commit 3608a226de490e2aef647b22ff56e2071ccee28f
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 21:08:31 2021 +0100

    gitea.tests: split and make them more thorough

commit 6f75c162c5f048c3b0f3a1ca907f6835828a518f
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:27:42 2021 +0100

    Reimplement Gitea lister using new Lister API
    
    The lister is stateless and has full listing capability.
    It can request the Gitea API using HTTP token authentication.
    Rate-limiting was not encountered but is handled generically.
    Added support for getting repo last update date through API.

commit 299a23e1998764755232d4f22a1dba3ae84fc041
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Jan 21 15:17:07 2021 +0100

    tests.cli: add Gitea lister mandatory params

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/157/ for more details.

This revision is now accepted and ready to land.Jan 25 2021, 3:41 PM