- create method in lister core
- add different providers by listers
- add method into each lister
Note: this task is for the abstract method in lister-core; subtasks should be created for each lister.
|Open||None||T833 When listing an origin, add origin level metadata to storage|
|Open||vlorentz||T1344 Write specs about metadata workflow|
|Resolved||vlorentz||T1738 Define and specify extrinsic origin metadata|
|Open||vlorentz||T1739 Define an architecture to fetch extrinsic metadata outside listers|
|Resolved||vlorentz||T1737 Define and specify metadata providers|
|Open||vlorentz||T1747 Review APIs to get metadata from supported origins|
|Resolved||vlorentz||T1748 Review which extrinsic metadata we want to fetch and archive|
- the API endpoints used by the github and bitbucket listers do not show extrinsic metadata, so that option is out
- sending a request for each repository would need ~2 to 3 years for a full pass over github. That's with our current infrastructure, so it's not a hard limit.
Where is the bottleneck for this? API rate limit or what? We already use multiple tokens for listing github, can't we just do the same here and speed up (almost) arbitrarily a complete pass?
@zack Yes, rate limit. And I determined this based on our current listing rate (@olasd said 2 to 3 days to fully list GitHub with 500 repos per request; so an API call for each repo would take a total of 500 times 2 or 3 days).
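For reference, the arithmetic behind that estimate can be sketched as below. It uses only the figures quoted in this thread (500 repos per listing request, 2 to 3 days per full listing) and assumes per-repo calls would run at the same request rate as the current listing, which is an assumption rather than a measured number. The function name is made up for illustration. The result lands at roughly 3 to 4 years, the same order of magnitude as the ~2 to 3 years mentioned above.

```python
# Back-of-the-envelope estimate using the figures quoted in the thread.
REPOS_PER_LISTING_REQUEST = 500   # repos returned by one listing request
FULL_LISTING_DAYS = (2, 3)        # days for one full listing pass (low, high)


def per_repo_pass_years(listing_days, repos_per_request=REPOS_PER_LISTING_REQUEST):
    """Years for a full pass if every repository needs its own API call.

    Assumes the same request rate as the listing: one call per repo means
    `repos_per_request` times more calls, hence that many times more days.
    """
    return listing_days * repos_per_request / 365


low, high = (per_repo_pass_years(days) for days in FULL_LISTING_DAYS)
print(f"{low:.1f} to {high:.1f} years")
```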
But reading etalab's code gave me an idea: they send one API call per github organization/user (or several, if the owner has a lot of repos), and get metadata for multiple repos at once; we could do that too.
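A rough sketch of how much the per-owner approach could save, counting requests only. The page size of 100, the `owner/repo` naming, and the sample data are assumptions for illustration; nothing here reflects etalab's actual code or a specific API endpoint.

```python
import math
from collections import Counter

PAGE_SIZE = 100  # assumed number of repos returned per request


def requests_per_owner(repo_names, page_size=PAGE_SIZE):
    """Map each owner to the number of paginated requests needed
    to fetch metadata for all of that owner's repositories."""
    owners = Counter(name.split("/", 1)[0] for name in repo_names)
    return {owner: math.ceil(count / page_size) for owner, count in owners.items()}


# Hypothetical sample: two small users and one organization with 250 repos.
repos = ["alice/a", "alice/b", "bob/x"] + [f"bigorg/r{i}" for i in range(250)]
per_owner = requests_per_owner(repos)
total_requests = sum(per_owner.values())
print(f"{total_requests} requests instead of {len(repos)}")
```

The gain depends entirely on how repositories are distributed across owners: owners with a single repository still cost one request each.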