Page MenuHomeSoftware Heritage

github: Improve credential expiration policy
AbandonedPublic

Authored by ardumont on Nov 25 2019, 11:45 AM.

Details

Reviewers
olasd
douardda
Group Reviewers
Reviewers
Summary

When a credential is reported as "bad", consider it expired. Remove it from the
current worker credentials config. Then, try immediately with another
credential.

Related T2099

Test Plan

tox

Diff Detail

Repository
rDLS Listers
Branch
improve-credential-expiration-policy
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9468
Build 13910: tox-on-jenkinsJenkins
Build 13909: arc lint + arc unit

Event Timeline

ardumont created this revision.Nov 25 2019, 11:45 AM
ardumont updated this revision to Diff 8080.Nov 25 2019, 11:50 AM

Fix docstring

ardumont added inline comments.Nov 25 2019, 11:51 AM
swh/lister/core/lister_transports.py
97

type: ignore because otherwise it complains about the config not being declared for that class...
which is also true...
but...
let's refactor this another day.

(same goes for the other type: ignore below)

swh/lister/github/lister.py
56

Another implem' could be to override the default request_params method and set the current auth tuple (self.auth = auth or something) used...

I'm not sure what's cleaner, clearer, etc...

ardumont updated this revision to Diff 8081.Nov 25 2019, 12:00 PM

Fix authorization decoding computation function

ardumont updated this revision to Diff 8082.Nov 25 2019, 12:16 PM

Add test around credential_used function

ardumont updated this revision to Diff 8390.Dec 2 2019, 2:26 PM

Update according to discussion

There exists other authentication schemes.

So now the implementation relies on knowing the authentication auth tuple. The
github lister now removes it when it's deemed expired. And then immediately
queries again the listing with another credential.

ardumont updated this revision to Diff 8394.Dec 2 2019, 3:54 PM

Plug to another branch

douardda requested changes to this revision.Jan 9 2020, 10:54 AM
douardda added a subscriber: douardda.

Am I reading this wrong, or is the new behavior implemented in this diff not really tested?

swh/lister/core/lister_transports.py
145

I don't really understand the semantic of 'reset' in the method (at first glance).

Why this 'reset' does remove a token? Remove from what? Why/when is this useful or needed?

I mean credentials are loaded by the 'request_instance_credentials' method. Now this method (I guess called when a credential usage generated errors ; not yet read the remaining of the code), will gradually remove entries from the list of known credentials. So each time an error occurs this list will be pop'd.

Not sure I like this behavior very much. It might be better to be able to temporarily disable a credential. Then check on a regular base if the credential can be enabled again.

Also, the credential 'status' is kept local to the class. Whereas we have many workers. It might be useful to think about a solution for this state to be shared among workers (using memcache, etcd or similar). Not to be done right now in this diff, obviously :D

swh/lister/github/lister.py
45

No need for this function to be inlined here (does not capture the closure or similar). So no need to instantiate a new function object on each call of this method...

This revision now requires changes to proceed.Jan 9 2020, 10:54 AM

Am I reading this wrong, or is the new behavior implemented in this diff not really tested?

Indeed.

I wanted to have some kind of discussion about how to implement somewhat properly what's described in T2099 first.
(prior to fight the current tests to test this i mean)


context:

Currently, the github listers are hitting rate limit.
Other listers are waiting on the lister github tasks to finish (which kinda never do because it's stuck in a "infinite" waiting time loop...).
By adding this behavior, that could make them finally fail (when no more credentials).

This implicitely lowered in priority as we unstuck the other listers.
First, by stopping one worker (changing its setup so it stops listing github).
Then by completely removing the github listing tasks (as the consensus was, we need more accounts, which afaiu @olasd is doing)

swh/lister/core/lister_transports.py
145

Why this 'reset' does remove a token?

yes, remove_authentication_token sounds better now.

Remove from what?

From the current setup initially loaded (from the configuration file) at the task startup.

Why/when is this useful or needed?

That'd be the task linked in the description T2099.

swh/lister/github/lister.py
45

do you mean this can be set as class method directly?

(my main concern was to avoid repetition of this code in the conditional below).

ardumont abandoned this revision.Jan 15 2020, 3:01 PM