
Git origin without smart transfer protocol support cannot be loaded
Closed, Migrated

Description

Git supports two transfer protocols to exchange data between two repositories: the dumb protocol and the smart protocol.

Nowadays, the smart protocol is the more common of the two because it is more efficient.
Smart protocol support can be checked by sending a GET request to the
/info/refs?service=git-upload-pack endpoint of the git server and inspecting the Content-Type
HTTP header of the response. When the smart protocol is supported, the content type
starts with application/x-git-.

antoine@guggenheim:/tmp$ curl -i https://forge.softwareheritage.org/source/swh-loader-git/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Date: Thu, 09 Jul 2020 17:54:57 GMT
Server: Apache
X-Frame-Options: Deny
Strict-Transport-Security: max-age=0; includeSubdomains; preload
Content-Security-Policy: default-src 'self' https://forge.softwareheritage.org; img-src 'self' https://forge.softwareheritage.org data:; style-src 'self' https://forge.softwareheritage.org 'unsafe-inline'; script-src 'self' https://forge.softwareheritage.org; connect-src 'self'; frame-src 'self'; frame-ancestors 'none'; object-src 'none'; form-action 'self'; base-uri 'none'
Referrer-Policy: no-referrer
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Content-Type: application/x-git-upload-pack-advertisement

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
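
The check above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and are not part of swh-loader-git or dulwich. It assumes the server answers the GET request without authentication and treats any content type other than the expected advertisement type as "no smart protocol support".

```python
from urllib.request import urlopen


def is_smart_content_type(content_type: str, service: str = "git-upload-pack") -> bool:
    """Return True if the header value is a smart protocol advertisement.

    Hypothetical helper: compares the media type against
    application/x-<service>-advertisement.
    """
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == f"application/x-{service}-advertisement"


def supports_smart_protocol(repo_url: str, service: str = "git-upload-pack") -> bool:
    """Send the GET request described above and inspect the Content-Type header."""
    url = f"{repo_url.rstrip('/')}/info/refs?service={service}"
    with urlopen(url) as response:
        return is_smart_content_type(response.headers.get("Content-Type", ""), service)
```

With the two responses shown in this task, is_smart_content_type returns True for application/x-git-upload-pack-advertisement and False for text/plain; charset=UTF-8.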

Nevertheless, some git servers do not seem to support the smart protocol, in which case the
official git client falls back to the dumb protocol. This is notably the case on numerous
cgit instances in the wild (for instance here or here).

antoine@guggenheim:/tmp$ curl -i https://git.systemreboot.net/guile-xapian/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Thu, 09 Jul 2020 18:04:56 GMT
Content-Type: text/plain; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Content-Disposition: inline; filename="info/refs"
Last-Modified: Thu, 09 Jul 2020 18:04:56 GMT
Expires: Thu, 09 Jul 2020 18:09:56 GMT
Strict-Transport-Security: max-age=63072000; includeSubdomains; preload

612317c1335fc5b9fe64eb09e3dc2bb508d100c6        refs/heads/master
59b340eb869c4966c5eb92309df6293ceb4db6ea        refs/tags/v0.1.0
612317c1335fc5b9fe64eb09e3dc2bb508d100c6        refs/tags/v0.1.0^{}
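
A dumb server returns the references as plain text, one "object id, whitespace, reference name" pair per line, as in the response above. As a rough sketch (the function name is hypothetical, and the exact separator is assumed to be any run of whitespace), such a response body could be parsed like this:

```python
from typing import Dict


def parse_dumb_info_refs(body: str) -> Dict[str, str]:
    """Map each reference name to its object id from a dumb info/refs body."""
    refs: Dict[str, str] = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Split on the first run of whitespace: "<sha> <refname>".
        sha, name = line.split(None, 1)
        refs[name.strip()] = sha
    return refs
```

Entries whose name ends with ^{} are peeled tags: they give the id of the object an annotated tag points to.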

The git loader relies on the dulwich package to fetch git pack data, but unfortunately the current
dulwich implementation only supports the smart protocol for that operation.

Related sentry bug report: https://sentry.softwareheritage.org/organizations/swh/issues/2343/events/latest/?project=8

As a workaround to load git origins without smart transfer protocol support, we could fetch the git
objects through the dumb transfer protocol and build the pack file client side using the dulwich pack API.
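
To illustrate the first half of that workaround, here is a minimal sketch of retrieving a single loose object through the dumb protocol, using only the Python standard library. The function names are hypothetical. It assumes the object is stored loose under objects/<first two hex digits>/<remaining 38>; objects that only exist in pack files advertised in objects/info/packs are not handled, and the step that assembles the fetched objects into a pack with dulwich is not shown.

```python
import hashlib
import zlib
from typing import Tuple
from urllib.request import urlopen


def loose_object_url(repo_url: str, sha: str) -> str:
    """Build the URL of a loose object on a dumb HTTP server."""
    return f"{repo_url.rstrip('/')}/objects/{sha[:2]}/{sha[2:]}"


def parse_loose_object(compressed: bytes, expected_sha: str) -> Tuple[str, bytes]:
    """Inflate a loose object, verify its id and return (type, content)."""
    raw = zlib.decompress(compressed)
    # The object id is the SHA-1 of the inflated data, header included.
    if hashlib.sha1(raw).hexdigest() != expected_sha:
        raise ValueError(f"checksum mismatch for object {expected_sha}")
    # Loose object layout: b"<type> <size>\x00<content>".
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError(f"size mismatch for object {expected_sha}")
    return obj_type.decode("ascii"), content


def fetch_loose_object(repo_url: str, sha: str) -> Tuple[str, bytes]:
    """Download and parse one loose object over the dumb protocol."""
    with urlopen(loose_object_url(repo_url, sha)) as response:
        return parse_loose_object(response.read(), sha)
```

Starting from the object ids listed in info/refs, a loader could walk commits, trees and blobs this way and collect the objects to write into a pack.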