Add support for running metadata fetchers without a VCS/package loader
Closed, Migrated

Description

This will allow updating the metadata of repositories that are known to be inactive in terms of source code, but that may still have changes in metadata (e.g. watchers, stargazers, ...).

Event Timeline

vlorentz triaged this task as Normal priority. Jul 18 2022, 10:33 AM
vlorentz created this task.

We decided to add recurring fetches, which will take care of both backfilling now and revisiting from time to time in the future. We're going to assume an interval of 3 months between fetches for now, as that seems infrequent enough to avoid exhausting rate limits.

The plan:

  1. add metrics for token usage, as we need that to make decisions
  2. add a swh-scheduler component (CLI called from a cron?) that reads from the DB and fills a Kafka topic
  3. a journal client that reads from the topic and fetches the metadata
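
To make steps 2 and 3 concrete, here is a minimal sketch, not the actual swh-scheduler code. Everything in it is an assumption made for illustration: a plain dict stands in for the DB, a `deque` stands in for the Kafka topic, "3 months" is approximated as 90 days, and the names `select_due_origins`, `fill_topic` and `run_client` are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

# Assumption: "3 months" is approximated as 90 days.
REFETCH_INTERVAL = timedelta(days=90)


def select_due_origins(last_fetched, now, interval=REFETCH_INTERVAL):
    """Return URLs that were never fetched, or last fetched at least `interval` ago.

    `last_fetched` maps a URL to the datetime of its last metadata fetch
    (or None); it stands in for the scheduler DB.
    """
    due = [
        url
        for url, fetched_at in last_fetched.items()
        if fetched_at is None or now - fetched_at >= interval
    ]
    return sorted(due)


def fill_topic(topic, urls):
    """Append one message per URL to `topic` (a deque standing in for Kafka)."""
    for url in urls:
        topic.append({"url": url})


def run_client(topic, fetch_metadata, last_fetched, now):
    """Drain the topic in order, fetch metadata, and record the fetch date."""
    results = {}
    while topic:
        message = topic.popleft()
        url = message["url"]
        results[url] = fetch_metadata(url)
        last_fetched[url] = now
    return results


now = datetime(2022, 7, 18, tzinfo=timezone.utc)
last_fetched = {
    "https://example.org/never-fetched": None,
    "https://example.org/stale": now - timedelta(days=120),
    "https://example.org/fresh": now - timedelta(days=10),
}
topic = deque()
fill_topic(topic, select_due_origins(last_fetched, now))
# Hypothetical fetcher: a real one would call the forge's API.
results = run_client(topic, lambda url: {"stargazers": 0}, last_fetched, now)
```

Because the same selection handles both never-fetched and stale entries, a single recurring run covers the initial backfill as well as later revisits.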

For now, we are only going to use a single topic, which means the whole thing will block when we exhaust a token's rate-limit; but the aim is to not hit rate-limits at all. We can revisit the question later, anyway.
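
The blocking behaviour can be illustrated with a toy simulation. This is only a sketch under stated assumptions: time is a plain integer rather than a real clock, each request takes one time unit, every token gets `quota[token]` requests (at least 1) per `window`, and each message is already tagged with the token it will use. None of these names or parameters come from the actual implementation.

```python
def consume_single_topic(messages, quota, window):
    """Process `messages` strictly in order and return a timeline.

    `messages` is a list of (origin, token) pairs. The returned timeline is a
    list of (time, origin, token) tuples showing when each request was made.
    """
    remaining = dict(quota)
    current_window = 0
    clock = 0
    timeline = []
    for origin, token in messages:
        # Quotas are restored whenever the clock enters a new window.
        if clock // window != current_window:
            current_window = clock // window
            remaining = dict(quota)
        if remaining[token] == 0:
            # The token is exhausted: wait for the next window. Every message
            # queued behind this one waits too, whatever token it uses.
            current_window += 1
            clock = current_window * window
            remaining = dict(quota)
        remaining[token] -= 1
        timeline.append((clock, origin, token))
        clock += 1
    return timeline


timeline = consume_single_topic(
    [("a", "t1"), ("b", "t1"), ("c", "t1"), ("d", "t2")],
    quota={"t1": 2, "t2": 2},
    window=10,
)
# timeline == [(0, "a", "t1"), (1, "b", "t1"), (10, "c", "t1"), (11, "d", "t2")]
```

In this run, message "d" is delayed until time 11 even though token "t2" still has quota, because it sits behind "c" in the single queue. Staying under the rate limits, as intended, avoids the wait entirely.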