(Incremental) loading of large git repositories fails with an HTTP timeout
Closed, Resolved (Public)

Description

In production, doing an incremental load of some large git repositories fails with an HTTP timeout in git-upload-pack.

This problem is two-fold:

  • the git protocol expects an interactive walk of the history to find common ancestors between client and server
  • dulwich's implementation of this interactive walk is not bidirectional, and I think we end up walking the full history of all branches *while* we're issuing the GET request.

This is especially noticeable on repositories with lots of branches, such as https://github.com/mozilla/gecko-dev/.

We can fix this by limiting how deep we walk the local repository history before fetching the repo.
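
To make the idea of a truncated walk concrete, here is a minimal sketch in plain Python. It is not the loader's actual code and does not use dulwich; the names `DepthLimitedWalker`, `announced`, `get_parents` and `max_depth` are hypothetical, and the `next()`-returns-a-commit-id-or-`None` shape is only an assumption about what a graph walker looks like. It shows the effect of the limit: only commits within `max_depth` parent links of a local head get announced to the server, instead of the full history of every branch.

```python
from collections import deque


class DepthLimitedWalker:
    """Walk local history breadth-first, at most `max_depth` parent links
    below each head. Hypothetical sketch, not the loader's implementation."""

    def __init__(self, heads, get_parents, max_depth):
        self.get_parents = get_parents  # callable: commit id -> list of parent ids
        self.max_depth = max_depth
        self.queue = deque((sha, 0) for sha in heads)
        self.seen = set()

    def next(self):
        """Return the next commit id to announce, or None when exhausted."""
        while self.queue:
            sha, depth = self.queue.popleft()
            if sha in self.seen:
                continue
            self.seen.add(sha)
            # Only descend further while we are above the depth limit.
            if depth < self.max_depth:
                for parent in self.get_parents(sha):
                    self.queue.append((parent, depth + 1))
            return sha
        return None


def announced(heads, get_parents, max_depth):
    """Collect every commit id the walker would announce."""
    walker = DepthLimitedWalker(heads, get_parents, max_depth)
    result = []
    sha = walker.next()
    while sha is not None:
        result.append(sha)
        sha = walker.next()
    return result
```

With a depth of 0, only the branch heads themselves are announced. The trade-off is that the server may find fewer common ancestors and send a larger pack than strictly needed, in exchange for a walk whose cost no longer grows with the length of the history.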

Event Timeline

olasd created this task. Sep 17 2018, 1:58 PM
olasd triaged this task as High priority.
olasd claimed this task.
olasd added a comment. Sep 20 2018, 5:21 PM

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

olasd closed this task as Resolved. Sep 21 2018, 2:37 PM

That's deployed now.