While working on the maven lister, we noticed that some of the URLs it lists are not
the canonical URLs of their repositories. This needlessly creates duplicate origins.
So let's reuse the existing URL canonicalization code (for gh origins) in listers
wherever possible. That code should already exist in swh-web; it should be refactored
out into swh.core, then reused both in swh-web and in the listers (starting with the
maven one; possibly the nixguix and packagist listers later as well).
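To illustrate the duplication problem, here is a minimal sketch of how several spellings of one gh URL can collapse to a single origin. It is not the swh.core code: `normalize_github_url` and `dedupe_origins` are hypothetical names, the normalization is purely syntactic, and lowercasing owner/repo is an assumption made only to build a comparison key. A real canonical computation would presumably also need to query GitHub (for renamed or moved repositories), which is where rate limiting comes in.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not the swh.core API.
_GITHUB_HOSTS = {"github.com", "www.github.com"}


def normalize_github_url(url: str) -> Optional[str]:
    """Return https://github.com/<owner>/<repo> for a GitHub URL, else None.

    Purely syntactic: no network access, so renames and redirects are not
    resolved.
    """
    url = url.strip()
    # Rewrite scp-like SSH syntax (git@github.com:owner/repo.git) into a
    # regular URL so urlparse can handle it.
    match = re.match(r"^git@([^:/]+):(.+)$", url)
    if match:
        url = f"ssh://git@{match.group(1)}/{match.group(2)}"
    parsed = urlparse(url)
    if (parsed.hostname or "").lower() not in _GITHUB_HOSTS:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        return None
    # Assumption: lowercase both parts so spellings compare equal.
    return f"https://github.com/{owner.lower()}/{repo.lower()}"


def dedupe_origins(urls: Iterable[str]) -> List[str]:
    """Collapse URLs sharing a normalized form; keep non-GitHub URLs as-is."""
    keys = (normalize_github_url(u) or u for u in urls)
    return list(dict.fromkeys(keys))  # dict preserves first-seen order


listed = [
    "https://github.com/Apache/Maven.git",
    "git@github.com:apache/maven.git",
    "git://github.com/apache/maven/",
    "https://gitlab.com/some/project",
]
origins = dedupe_origins(listed)
```

Here the three gh spellings yield one origin, while the non-GitHub URL passes through unchanged.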
Plan:
- D7836: Compute canonical gh urls in an exposed library function in swh.core
- D7840: Refactor GitHubSession request management out of swh.lister in swh.core
- Release (2.6.0)
- Unstick the debian build if it breaks (new deps)
- D7870: Use GitHubSession to make the canonical computation deal with rate limit
- Release (2.7.0)
- D7877: Refactor swh.lister to reuse the code moved in swh.core
- D7880: Add missing canonical case in swh.core
- Release (2.8.0)
- D7879: (Goal) Adapt maven lister to list canonical gh urls if any
- D7946: Extra work for exotic github urls (deployed on staging)
The remaining extra work was extracted out of this task into a separate one [1]
[1] T4279
Note: gh refers to GitHub