What we can collect is a subset of what we can see here: https://api.github.com/repos/SoftwareHeritage/swh-core
How we collect it depends on which information we want, so let's try to list it exhaustively:
* owner avatar/URL?
* description
* whether it's a fork, and of what
* whether it's a mirror, and of what
* created_at / updated_at
* pushed_at
* homepage
* "topics" (these don't seem to be officially in the REST API yet, only as a preview: https://docs.github.com/en/rest/reference/repos#get-all-repository-topics-preview-notices and https://docs.github.com/en/rest/overview/api-previews#repository-topics)
* stargazers_count / watchers_count
* list of stargazers / watchers?
* forks count
* list of forks?
* license/language? (GH extracts them from intrinsic metadata that we also collect, so probably not very useful)
* assets? These should probably be handled by a specific loader though; that's closer to a package manager than to metadata
* release notes that aren't on git tags?
Anything else?
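
To make the list above more concrete, here is a rough sketch of how the single-repository fields could be picked out of the decoded JSON from the URL at the top. It is only an illustration, not a proposed design: the function name and the output keys are made up, and the input field names are the ones I believe the repository endpoint returns. The lists of stargazers, watchers and forks are not covered, since they live behind separate endpoints.

```python
from typing import Any, Dict


def extract_repo_metadata(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Pick candidate extrinsic metadata out of a decoded repository response.

    `repo` is the JSON object returned for a single repository, already
    parsed into a dict. The output key names are hypothetical.
    """
    # These sub-objects may be missing or null, so fall back to empty dicts.
    owner = repo.get("owner") or {}
    parent = repo.get("parent") or {}  # assumed to be present only for forks
    license_info = repo.get("license") or {}

    return {
        "owner_avatar_url": owner.get("avatar_url"),
        "owner_url": owner.get("html_url"),
        "description": repo.get("description"),
        "is_fork": bool(repo.get("fork", False)),
        "forked_from": parent.get("full_name"),
        "mirror_of": repo.get("mirror_url"),
        "created_at": repo.get("created_at"),
        "updated_at": repo.get("updated_at"),
        "pushed_at": repo.get("pushed_at"),
        "homepage": repo.get("homepage"),
        # Topics may be absent depending on the API version or preview header.
        "topics": list(repo.get("topics") or []),
        "stargazers_count": repo.get("stargazers_count"),
        "watchers_count": repo.get("watchers_count"),
        "forks_count": repo.get("forks_count"),
        "license_spdx_id": license_info.get("spdx_id"),
        "language": repo.get("language"),
    }
```

Missing fields simply come out as `None` (or an empty list for topics), so the same function works whether or not a given field ends up being served for a repository.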