The GitHub API lets us check when a repo was last modified; see the `updated_at`/`pushed_at` fields in this [[ https://api.github.com/repos/mojombo/grit | example ]].
Given how significant GitHub is in our archive coverage, it makes sense to add a forge-specific optimization that skips loading repos for which those timestamps are older than our last visit of the corresponding origins.
(Note: I'm not exactly sure what the difference between the two fields is; I'm assuming `pushed_at` is for `git push` and `updated_at` for metadata changes. But I think even the most conservative approach, skipping only if //both// fields are older than our last visit, would be a good start.)
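To make the conservative variant concrete, here is a minimal sketch of the check in Python. It is only an illustration, not existing loader code: the names `parse_github_timestamp` and `can_skip_visit` are made up, and it assumes the repo metadata has already been fetched from the API as a dict and that our last visit date is available as a timezone-aware `datetime`. It also assumes the timestamps come in the `YYYY-MM-DDTHH:MM:SSZ` form seen in the example above.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional

# Assumption: GitHub timestamps look like "2011-01-26T19:06:43Z" (UTC).
GITHUB_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_github_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a GitHub timestamp into an aware UTC datetime (None if absent)."""
    if not value:
        return None
    return datetime.strptime(value, GITHUB_TS_FORMAT).replace(tzinfo=timezone.utc)


def can_skip_visit(repo: Mapping[str, Any], last_visit: datetime) -> bool:
    """Hypothetical helper: True only if BOTH fields predate our last visit.

    `repo` is the decoded JSON of the repository API response and
    `last_visit` must be timezone-aware. If either field is missing or
    null we stay on the safe side and do not skip.
    """
    timestamps = [
        parse_github_timestamp(repo.get(field))
        for field in ("updated_at", "pushed_at")
    ]
    if any(ts is None for ts in timestamps):
        return False
    return all(ts < last_visit for ts in timestamps)
```

With this, a loader would call something like `can_skip_visit(metadata, last_visit)` before cloning, and fall back to a normal load whenever it returns `False`.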
Assuming that doing an API call at the loader level is faster than actually trying to load the repo (which seems obvious to me, but it's not like I have actually benchmarked it *g*), this optimization should help a lot in clearing our backlog of repos to re-visit, for all GitHub repos that haven't changed.
I'm not sure where this forge-specific optimization belongs, but it's something we're probably going to extend in the future too, e.g., for GitLab.