
Deal with GitHub removing support for git:// URLs
Open, High, Public

Description

GitHub just announced that they will remove support for the git:// protocol.

Unfortunately, we depend on it to clone efficiently.

So we need to either:

  • Ask GitHub to keep supporting it (at least for us)
  • Figure out an alternative

Event Timeline

vlorentz triaged this task as High priority. Wed, Sep 1, 9:11 PM
vlorentz created this task.

The dulwich HTTP(s) support is implemented on top of urllib(3?).

As such, besides being unable to enforce a limit on pack files, it cannot hold the bidirectional conversation with the git server that is needed to negotiate which objects have to be fetched.
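For readers unfamiliar with that negotiation, here is a rough sketch of the client side of it in the git pack protocol, using only the Python standard library. The helper names (`pkt_line`, `negotiation_request`) are made up for illustration and are not dulwich APIs. The exchange is deliberately simplified: a real client also sends capabilities on the first "want" line, and sends "have" lines in batches while reading the server's ACK/NAK replies in between, which is the bidirectional part.

```python
FLUSH_PKT = b"0000"  # special pkt-line that ends a section of the request


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then the payload."""
    length = len(payload) + 4  # the length prefix counts its own 4 bytes
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def negotiation_request(wants, haves):
    """Build a simplified, one-shot want/have request (hypothetical helper).

    wants: hex object ids the client asks for.
    haves: hex object ids the client already has.
    """
    lines = [pkt_line(b"want " + sha.encode("ascii") + b"\n") for sha in wants]
    lines.append(FLUSH_PKT)
    lines += [pkt_line(b"have " + sha.encode("ascii") + b"\n") for sha in haves]
    lines.append(pkt_line(b"done\n"))
    return b"".join(lines)
```

The server uses the "have" lines to work out which objects it can leave out of the pack it sends back.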

I can see a few alternatives to using git:// over tcp:

  • Give our swh bot accounts SSH keys, and use that to clone from GitHub over ssh.
  • Improve the Dulwich git-over-http(s) implementation.
  • Switch over to another implementation of the git remote protocol (which one?) to retrieve packfiles.

Obviously, the last two options would benefit all origins, not just GitHub.
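Whichever transport we pick, origins recorded with git:// URLs would have to be mapped to the new scheme at load time. A minimal sketch of that mapping, assuming it is limited to github.com and that the SSH user is `git`; the function name `rewrite_github_git_url` is hypothetical and not part of any existing swh module:

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_github_git_url(url: str, scheme: str = "https") -> str:
    """Map a git://github.com/... URL to https or ssh; return other URLs unchanged."""
    if scheme not in ("https", "ssh"):
        raise ValueError(f"unsupported target scheme: {scheme}")
    parts = urlsplit(url)
    if parts.scheme != "git" or parts.hostname != "github.com":
        return url  # not a GitHub git:// URL, nothing to rewrite
    # Assumption: SSH access goes through the conventional "git" user.
    netloc = "git@github.com" if scheme == "ssh" else "github.com"
    return urlunsplit((scheme, netloc, parts.path, "", ""))
```

For instance, `rewrite_github_git_url("git://github.com/example/repo.git", "ssh")` gives `ssh://git@github.com/example/repo.git` (the repository path is a placeholder).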

In T3544#69746, @olasd wrote:

I can see a few alternatives to using git:// over tcp:

  • Give our swh bot accounts SSH keys, and use that to clone from GitHub over ssh.

Unless I'm missing something (complicated key management from an operations perspective, perhaps?), this one seems to be both easier to implement and less risky (e.g., in terms of unforeseen impact on the implementation side), and hence the way to go.

  • Improve the Dulwich git-over-http(s) implementation.

If this is what we want, we should probably consider contracting the dulwich maintainer(s) directly, rather than doing it ourselves.