Page MenuHomeSoftware Heritage

Reimplement Bitbucket lister using new Lister API
ClosedPublic

Authored by tenma on Jan 12 2021, 2:27 PM.

Details

Summary

The new lister has incremental and full listing capability.
It can request the Bitbucket API in anonymous and HTTP basic authentication modes.
Rate-limiting is not aggressive and is handled.

Related to T2955

Test Plan

Covered: incremental and full modes, tasks, authentication
Not covered: some exceptional cases (see below)

tox

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18433
Build 28497: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 28496: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build has FAILED

Patch application report for D4843 (id=17172)

Rebasing onto c782275296...

First, rewinding head to replay your work on top of it...
Applying: WIP new Bitbucket lister
Changes applied before test
commit 4afe9f7f6661fffec6e10fb7515dcf0405fc49f5
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Jan 12 18:08:53 2021 +0100

    WIP new Bitbucket lister

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/81/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/81/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 13 2021, 11:08 AM
Harbormaster failed remote builds in B18316: Diff 17172!

Fix failing test after changes in query parameters to Bitbucket API requests.

Build is green

Patch application report for D4843 (id=17175)

Rebasing onto c782275296...

First, rewinding head to replay your work on top of it...
Applying: WIP new Bitbucket lister
Changes applied before test
commit 5748bf904b35e799d2328725d1b4ef8b37ba0969
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Jan 12 18:08:53 2021 +0100

    WIP new Bitbucket lister

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/82/ for more details.

refactor auth and backoff

Build is green

Patch application report for D4843 (id=17177)

Rebasing onto c782275296...

First, rewinding head to replay your work on top of it...
Applying: WIP new Bitbucket lister
Changes applied before test
commit 5dfbe59051996bfe4338ec991fdc4a5ae59098ee
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 11:54:18 2021 +0100

    WIP new Bitbucket lister
    
    Reviewers: #reviewers
    
    Subscribers: anlambert
    
    Tags: #bitbucket_lister, #lister
    
    Differential Revision: https://forge.softwareheritage.org/D4843

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/83/ for more details.

Update:

  • add and improve tests
  • rename some variables and methods of the lister
  • allow to override configuration through scheduler tasks parameters
  • simplify JSON files for tests

Build has FAILED

Patch application report for D4843 (id=17181)

Rebasing onto c782275296...

First, rewinding head to replay your work on top of it...
Applying: WIP new Bitbucket lister
Changes applied before test
commit 8a17cea06a1eac10e3720dc3995bcf130222d7b5
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 12:20:17 2021 +0100

    WIP new Bitbucket lister

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/84/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/84/console

lgtm (minus the build failure that is ;)

(i did not read the test yet though)

one question about some more modules to remove inline.

swh/lister/bitbucket/lister.py
14

Do we entirely drop the old bitbucket lister code?

If yes, you can also remove that module.

Build is green

Patch application report for D4843 (id=17193)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit 489c9e25ab40b9ecc6b92e4cf9346ab7eef14978
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    WIP new Bitbucket lister
    
    Reviewers: #reviewers
    
    Subscribers: anlambert
    
    Tags: #bitbucket_lister, #lister
    
    Differential Revision: https://forge.softwareheritage.org/D4843

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/85/ for more details.

Update: Remove old range lister task and update tasks tests.

anlambert retitled this revision from WIP new Bitbucket lister to [WIP] Reimplement Bitbucket lister using new Lister API.Jan 13 2021, 7:06 PM

Build is green

Patch application report for D4843 (id=17197)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit dffe53c2260188fb3ab8995113ac120359c7b9c3
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 18:44:08 2021 +0100

    bitbucket: Reimplement lister using new API (WIP)

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/86/ for more details.

Fix copyrights and rebase

Build is green

Patch application report for D4843 (id=17198)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit 4dd90ca2f489f406ef924daad33832a38fef96b1
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.
    Listing mode, credentials and pagination parameters can be updated after
    creation.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/87/ for more details.

tenma retitled this revision from [WIP] Reimplement Bitbucket lister using new Lister API to Reimplement Bitbucket lister using new Lister API.Jan 14 2021, 10:26 AM
tenma edited the test plan for this revision. (Show Details)
tenma edited the summary of this revision. (Show Details)
tenma edited the test plan for this revision. (Show Details)

Not covered: authentication, max retries, small corner cases (see below)

What do you mean?


What is self.backoff? You're using it both as a time (time.sleep(self.backoff)) and as a dimensionless unit (self.backoff = self.BACKOFF_FACTOR)


And self.request_count should be a local variable of get_pages named retries_count, because it's actually used for. Plus, the way it's currently written, it looks like that whenever it reaches MAX_RETRIES, it will never be reset without creating a new BitbucketLister object.

swh/lister/bitbucket/lister.py
89

maybe a warning if there is more than one set of credentials?

92–103

Shouldn't these be (class)methods of BitbucketListerState?

105
110
115

And why does it need to be Optional[bool] instead of just bool?

176

IMO this yield should be moved before if next_page_url is not None for two reasons:

  1. we probably want to yield body["values"] even if we fail to parse next_page_url
  2. this means we can make last_repo_cdate non-optional and replace last_repo_cdate = None with a break (and make the loop a while True)
203
swh/lister/bitbucket/tests/test_lister.py
164

isn't this a method?

This revision now requires changes to proceed.Jan 14 2021, 12:34 PM

Sorry, I clicked "submit" too early.

I meant to say I don't have a good idea of how well it fits architecturaly, but I have some small comments nonetheless

What is self.backoff? You're using it both as a time (time.sleep(self.backoff)) and as a dimensionless unit (self.backoff = self.BACKOFF_FACTOR)

Nevermind that, I guess self.backoff = self.BACKOFF_FACTOR is implicitly meant as multiplying the backoff by 1 second. The initial backoff could be configurable though, but that's not very important.

Thanks for the remarks, will go through them.

swh/lister/bitbucket/lister.py
92–103

the new lister API has them as instance methods of the lister, but indeed the API could be amended to have it as ListerState class methods.
I would say, let discuss it and do that later.

swh/lister/bitbucket/tests/test_lister.py
164

oops, will fix this.

swh/lister/bitbucket/lister.py
92–103

yeah that makes sense, let's leave it like this then

  • fix assert
  • fix typing
  • add a warning on credentials

Build is green

Patch application report for D4843 (id=17294)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit 609ed8c893542a0f3206822bcf1fd8185cabf2a2
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.
    Listing mode, credentials and pagination parameters can be updated after
    creation.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/109/ for more details.

  • apply @vlorentz suggestions
  • use tenacity retrying decorator through lister.utils thanks to D4869
  • remove setters no more useful after D4872
  • adapt tests accordingly

Build is green

Patch application report for D4843 (id=17317)

Rebasing onto 9fd91f007d...

Current branch diff-target is up to date.
Changes applied before test
commit 2cfc017435e8dee06f24717a947e8fe1e6f2d940
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/117/ for more details.

Have test coverage for credentials setting

Build is green

Patch application report for D4843 (id=17318)

Rebasing onto 9fd91f007d...

Current branch diff-target is up to date.
Changes applied before test
commit b0b247e7a25e761c88ec0419cecc95a4198ffb7c
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/118/ for more details.

tenma edited the summary of this revision. (Show Details)

Remove the persistent isort conftest modification from diff

swh/lister/bitbucket/lister.py
107

Can you rename the function to set_credentials, shorter is better. Also parameters should be typed to Optional[str].

113

you can remove that parameter as it is already the default value of the decorator

Build is green

Patch application report for D4843 (id=17325)

Rebasing onto 9fd91f007d...

Current branch diff-target is up to date.
Changes applied before test
commit dec3559f0ae277ad9271468444b6cad494bff6b9
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/119/ for more details.

Remove tenacity wait parameter

Build is green

Patch application report for D4843 (id=17326)

Rebasing onto 9fd91f007d...

Current branch diff-target is up to date.
Changes applied before test
commit c8d7259ecbf9590e67160345d3cbf561f9206195
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/120/ for more details.

Revert changes to the credentials setting method, and improve corresponding test

Build is green

Patch application report for D4843 (id=17329)

Rebasing onto 9fd91f007d...

Current branch diff-target is up to date.
Changes applied before test
commit 1b23378781dbfdccd4243556ade9f8d306a0d29c
Author: tenma <tenma+swh@mailbox.org>
Date:   Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API
    
    The new lister has incremental and full listing capability.
    It can request the Bitbucket API in anonymous and HTTP basic authentication
    modes. Rate-limiting is not aggressive and is handled.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/121/ for more details.

Looks good to me, time to land this !

This revision is now accepted and ready to land.Jan 20 2021, 3:26 PM
This revision was automatically updated to reflect the committed changes.
tenma marked 2 inline comments as done.