
(Incremental) loading of large git repositories fails with an HTTP timeout
Closed, Migrated

Description

In production, incremental loads of some large git repositories fail with an HTTP timeout in git-upload-pack.

This problem is two-fold:

  • the git protocol expects an interactive walk of the history to find common ancestors between client and server
  • dulwich's implementation of this interactive walk is not bidirectional, and I think we end up walking the full history of all branches *while* we're issuing the GET request.

This is especially noticeable on repositories with lots of branches, such as https://github.com/mozilla/gecko-dev/.

We can fix this by truncating the depth to which we walk the repository history before fetching the repository.
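
To illustrate what truncating the walk depth means, here is a minimal sketch in plain Python. It is not the loader's or dulwich's actual code: `walk_history`, `max_depth`, `PARENTS` and `HEADS` are hypothetical names, and the commit graph is a toy dict mapping each commit to its parents. The point is only that a depth limit bounds how many commits the client visits (and would offer as common-ancestor candidates) no matter how long the history is.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Optional


def walk_history(
    heads: Iterable[str],
    parents: Dict[str, List[str]],
    max_depth: Optional[int] = None,
) -> Iterator[str]:
    """Yield commit ids reachable from `heads`, breadth-first.

    max_depth=None walks the full history; max_depth=0 yields only the
    heads themselves. (Hypothetical helper, for illustration only.)
    """
    seen = set()
    queue = deque((head, 0) for head in heads)
    while queue:
        commit, depth = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        yield commit
        # Stop descending once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for parent in parents.get(commit, []):
            queue.append((parent, depth + 1))


# Toy history (assumption): two branches sharing the ancestor "c1".
PARENTS = {
    "main3": ["main2"],
    "main2": ["c1"],
    "feat2": ["c1"],
    "c1": ["c0"],
    "c0": [],
}
HEADS = ["main3", "feat2"]

full = list(walk_history(HEADS, PARENTS))
truncated = list(walk_history(HEADS, PARENTS, max_depth=0))
print(len(full), len(truncated))
```

With a depth of 0 only the branch heads are visited, so the cost of the walk scales with the number of branches rather than with the total number of commits. The trade-off is that the server may send more objects than strictly needed when the real common ancestor lies deeper than the limit.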

Event Timeline

olasd triaged this task as High priority.
olasd created this task.

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

That's deployed now.