What we can collect is a subset of what we can see here: https://api.github.com/repos/SoftwareHeritage/swh-core
The way we'll collect it depends on what information we want, so let's try to list it exhaustively:
* owner avatar/URL?
* description
* whether it's a fork, and of what
* whether it's a mirror, and of what
* created_at / updated_at
* pushed_at
* homepage
* stargazers_count / watchers_count
* list of stargazers / watchers?
* language?
* forks count
* list of forks?
* license? (GH extracts it from the intrinsic metadata we collect too, so probably not very useful)
* assets? This should probably be done by a specific loader, though; it's closer to a package manager than to metadata
* release notes that aren't on git tags?
Anything else?
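For reference, here is a minimal sketch of how the scalar fields above map onto the GitHub REST API repository document. It assumes the public, unauthenticated `GET /repos/{owner}/{repo}` endpoint (which is rate-limited), and the names `fetch_repo` and `extract_metadata`, as well as the output keys, are hypothetical choices for illustration, not an existing swh API. Note that `parent` is only present when `fork` is true.

```python
import json
import urllib.request
from typing import Any, Dict

GITHUB_API = "https://api.github.com"


def fetch_repo(owner: str, name: str) -> Dict[str, Any]:
    """Fetch the raw repository document (hypothetical helper, no auth, no retries)."""
    url = f"{GITHUB_API}/repos/{owner}/{name}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_metadata(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the candidate fields out of a repository document (hypothetical output keys)."""
    owner = repo.get("owner") or {}
    parent = repo.get("parent") or {}    # only set for forks
    license_ = repo.get("license") or {}  # null when GitHub detects no license
    return {
        "owner_avatar_url": owner.get("avatar_url"),
        "owner_url": owner.get("html_url"),
        "description": repo.get("description"),
        "fork_of": parent.get("html_url") if repo.get("fork") else None,
        "mirror_of": repo.get("mirror_url"),
        "created_at": repo.get("created_at"),
        "updated_at": repo.get("updated_at"),
        "pushed_at": repo.get("pushed_at"),
        "homepage": repo.get("homepage") or None,  # normalize "" to None
        "stargazers_count": repo.get("stargazers_count"),
        "watchers_count": repo.get("watchers_count"),
        "language": repo.get("language"),
        "forks_count": repo.get("forks_count"),
        "license": license_.get("spdx_id"),
    }
```

Usage (requires network access): `extract_metadata(fetch_repo("SoftwareHeritage", "swh-core"))`. The lists of stargazers and forks are not part of this document; they would come from the separate, paginated `/repos/{owner}/{repo}/stargazers` and `/repos/{owner}/{repo}/forks` endpoints.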