Page MenuHomeSoftware Heritage

github: Refactor rate-limiting out of the GitHubLister class
ClosedPublic

Authored by vlorentz on Apr 21 2022, 8:36 PM.

Details

Summary

This will allow the GitHub Metadata Fetcher to reuse the logic
by importing the GitHubSession class.

Depends on D7629.

Diff Detail

Repository
rDLS Listers
Branch
refacto
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28684
Build 44821: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 44820: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7630 (id=27633)

Could not rebase; Attempt merge onto 2fa9f0abd2...

Updating 2fa9f0a..f91caed
Fast-forward
 swh/lister/github/lister.py            | 244 +++++++++++++++++----------------
 swh/lister/github/tests/test_lister.py |   6 +-
 2 files changed, 128 insertions(+), 122 deletions(-)
Changes applied before test
commit f91caedbb2ed5eca2572b7b5ebe1413692496f60
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 21 20:34:50 2022 +0200

    github: Refactor rate-limiting out of the GitHubLister class
    
    This will allow the GitHub Metadata Fetcher to reuse the logic
    by importing the GitHubSession class.

commit d0924f39d04e7b7128c55a116280e2c6b6ec5a00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 21 20:32:45 2022 +0200

    github: Remove dead code
    
    Authentication is handled directly in the session

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/494/ for more details.

I did not check the code. Just a heads up given your goal in the description, you'll
probably need to move that down to swh.core. Currently swh.loader.core does not depend
on swh.lister I think we want to keep it that way (given the current runtime
dependencies of lister).

The metadata fetchers won't be in swh.loader.core, but in a new package called swh.loader.metadata.

Given that they will use the same APIs, I think it is fine to make swh.loader.metadata depend on swh.lister.

The metadata fetchers won't be in swh.loader.core, but in a new package called swh.loader.metadata.

Given that they will use the same APIs, I think it is fine to make swh.loader.metadata depend on swh.lister.

Ah, i understood the metadata loading would happen within the loader's.
I guess i misunderstood and instead that new loader.metadata would pull deps
from both loader.core (BaseLoader or something) and lister (for the metadata utils here).
or something ^ ;)

In practice, swh.loader.metadata will run in the same processes as loaders; but swh.loader.core doesn't depend on it.

In terms of architecture, I think it would make sense to move this to a new swh.lister.github.utils (or swh.lister.utils.github) module, so that third-party users don't import the whole listing machinery but just the relevant bits for platform access.

Isolating this code more explicitly would make it easier to move to a third-party module if it becomes big enough for that to be relevant.

To be able to reuse this code somewhere else, you will want to make it accept a user-agent value explicitly too. That way the loader.metadata can just set its own value.

Build is green

Patch application report for D7630 (id=27698)

Rebasing onto 334c54091e...

Current branch diff-target is up to date.
Changes applied before test
commit 9ee4a99f152d9d6cbe5a6a09895ce9384660be46
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 21 20:34:50 2022 +0200

    github: Refactor rate-limiting out of the GitHubLister class
    
    This will allow the GitHub Metadata Fetcher to reuse the logic
    by importing the GitHubSession class.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/502/ for more details.

This revision is now accepted and ready to land.Apr 26 2022, 11:19 AM