This will allow updating the metadata of repositories that are known to be inactive in terms of source code but may still have changes in metadata (e.g. watchers, stargazers, ...).
|Migrated||gitlab-migration||T2201 Indexing / mining|
|Migrated||gitlab-migration||T2202 Collect extrinsic metadata|
|Migrated||gitlab-migration||T4252 Schedule recurring fetches of origin metadata|
|Migrated||gitlab-migration||T4394 Add support for running metadata fetchers without a VCS/package loaders|
We decided to add recurring fetches, which takes care of both backfilling now and revisiting origins from time to time in the future. We're going to assume an interval of 3 months between fetches for now, as that seems a reasonable interval for not exhausting rate limits.
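To make the rate-limit reasoning concrete, here is a rough back-of-the-envelope sketch. The function names, the number of origins, the per-token hourly limit, and the one-request-per-origin default are all hypothetical placeholders, not measurements; the 3-month interval is approximated as 90 days.

```python
def required_requests_per_hour(n_origins, interval_days, requests_per_origin=1):
    """Average request rate needed to visit every origin once per interval."""
    hours = interval_days * 24
    return n_origins * requests_per_origin / hours


def fits_budget(n_origins, interval_days, hourly_limit, n_tokens=1,
                requests_per_origin=1):
    """True if the average rate stays within the combined token budget."""
    needed = required_requests_per_hour(n_origins, interval_days,
                                        requests_per_origin)
    return needed <= hourly_limit * n_tokens


# Placeholder figures only: 2,160,000 origins refreshed every 90 days.
rate = required_requests_per_hour(2_160_000, 90)
print(rate)  # 1000.0 requests per hour on average
print(fits_budget(2_160_000, 90, hourly_limit=5000))
```

Plugging in real figures from the token-usage metrics would show whether the interval can be shortened or needs to be longer.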
- add metrics for token usage, as we need that to make decisions
- add a swh-scheduler component (CLI called from a cron?) that reads from the DB and fills a Kafka topic
- a journal client that reads from the topic and fetches the metadata
For now, we are only going to use a single topic, which means the whole pipeline will block when a token's rate limit is exhausted; the aim, however, is to not hit rate limits at all. We can revisit the question later anyway.
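The plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the scheduler DB, a `deque` stands in for the single Kafka topic, and the table name, column names, and function names are all hypothetical, not the actual swh-scheduler schema or API.

```python
import sqlite3
from collections import deque
from datetime import datetime, timedelta, timezone

# "3 months" approximated as 90 days (assumption).
REFRESH_INTERVAL = timedelta(days=90)


def make_db():
    """Create a stand-in for the scheduler DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE origin_metadata_fetch "
        "(url TEXT PRIMARY KEY, last_fetch REAL)"
    )
    return conn


def schedule(conn, topic, now, interval=REFRESH_INTERVAL):
    """Scheduler side: push every origin that is due onto the topic."""
    cutoff = (now - interval).timestamp()
    rows = conn.execute(
        "SELECT url FROM origin_metadata_fetch "
        "WHERE last_fetch IS NULL OR last_fetch < ? ORDER BY url",
        (cutoff,),
    ).fetchall()
    topic.extend(url for (url,) in rows)
    return len(rows)


def consume(conn, topic, fetch, now):
    """Client side: drain the topic, fetch metadata, record the visit."""
    processed = 0
    while topic:
        url = topic.popleft()
        fetch(url)  # caller-supplied metadata fetcher
        conn.execute(
            "UPDATE origin_metadata_fetch SET last_fetch = ? WHERE url = ?",
            (now.timestamp(), url),
        )
        processed += 1
    return processed


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    conn = make_db()
    conn.execute("INSERT INTO origin_metadata_fetch VALUES (?, NULL)",
                 ("https://example.org/never-fetched",))
    topic = deque()
    print(schedule(conn, topic, now), "origin(s) scheduled")
    print(consume(conn, topic, print, now), "origin(s) fetched")
```

Because there is a single queue, a fetcher that blocks on an exhausted token stalls everything behind it, which is the limitation noted above.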