Git supports two transfer protocols to exchange data between two repositories: the dumb protocol and the smart protocol.
Nowadays, the smart protocol is the more common of the two because it transfers data more efficiently.
Whether a git server supports the smart protocol can be checked by sending a GET request to its
/info/refs endpoint (with the service=git-upload-pack query parameter) and inspecting the Content-Type
HTTP header of the response. When the smart protocol is supported, the content type
starts with application/x-git-.
    antoine@guggenheim:/tmp$ curl -i https://forge.softwareheritage.org/source/swh-loader-git/info/refs?service=git-upload-pack
    HTTP/1.1 200 OK
    Date: Thu, 09 Jul 2020 17:54:57 GMT
    Server: Apache
    X-Frame-Options: Deny
    Strict-Transport-Security: max-age=0; includeSubdomains; preload
    Content-Security-Policy: default-src 'self' https://forge.softwareheritage.org; img-src 'self' https://forge.softwareheritage.org data:; style-src 'self' https://forge.softwareheritage.org 'unsafe-inline'; script-src 'self' https://forge.softwareheritage.org; connect-src 'self'; frame-src 'self'; frame-ancestors 'none'; object-src 'none'; form-action 'self'; base-uri 'none'
    Referrer-Policy: no-referrer
    Expires: Fri, 01 Jan 1980 00:00:00 GMT
    Pragma: no-cache
    Cache-Control: no-cache, max-age=0, must-revalidate
    Strict-Transport-Security: max-age=15768000
    Transfer-Encoding: chunked
    Content-Type: application/x-git-upload-pack-advertisement

    Warning: Binary output can mess up your terminal. Use "--output -" to tell
    Warning: curl to output it to your terminal anyway, or consider "--output
    Warning: <FILE>" to save to a file.
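The check described above could be automated along the following lines. This is only a minimal sketch using the Python standard library; the function names `is_smart_content_type` and `supports_smart_protocol` are hypothetical and not part of the git loader or dulwich, and the comparison assumes the server advertises `application/x-<service>-advertisement` as in the response shown above.

```python
from urllib.request import urlopen


def is_smart_content_type(content_type, service="git-upload-pack"):
    """Return True if a Content-Type header value denotes a smart protocol reply.

    Parameters such as "; charset=UTF-8" are ignored.
    """
    expected = f"application/x-{service}-advertisement"
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == expected


def supports_smart_protocol(repo_url, service="git-upload-pack", timeout=10):
    """Query <repo_url>/info/refs?service=<service> and inspect Content-Type.

    Hypothetical helper: performs a real HTTP request, so it needs network access.
    """
    url = f"{repo_url.rstrip('/')}/info/refs?service={service}"
    with urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
    return is_smart_content_type(content_type, service)
```

Keeping the header comparison in its own function makes it possible to exercise the logic without contacting a server.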
Nevertheless, some git servers do not seem to support the smart protocol, in which case
the official git client falls back to the dumb protocol. This is notably
the case on numerous cgit instances in the wild.
    antoine@guggenheim:/tmp$ curl -i https://git.systemreboot.net/guile-xapian/info/refs?service=git-upload-pack
    HTTP/1.1 200 OK
    Server: nginx/1.18.0
    Date: Thu, 09 Jul 2020 18:04:56 GMT
    Content-Type: text/plain; charset=UTF-8
    Transfer-Encoding: chunked
    Connection: keep-alive
    Content-Disposition: inline; filename="info/refs"
    Last-Modified: Thu, 09 Jul 2020 18:04:56 GMT
    Expires: Thu, 09 Jul 2020 18:09:56 GMT
    Strict-Transport-Security: max-age=63072000; includeSubdomains; preload

    612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/heads/master
    59b340eb869c4966c5eb92309df6293ceb4db6ea refs/tags/v0.1.0
    612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/tags/v0.1.0^{}
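A dumb protocol response body like the one above is a plain list of "object id, ref name" pairs, one per line, which is straightforward to parse. The sketch below is illustrative only: `parse_dumb_info_refs` is a hypothetical name, and splitting on any whitespace is an assumption made so that both tab-separated and space-separated lines are accepted.

```python
def parse_dumb_info_refs(body):
    """Parse the text body of a dumb protocol info/refs response.

    Returns a dict mapping ref names to hexadecimal object ids.
    Entries ending in "^{}" are peeled tags (the object a tag points to).
    """
    refs = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<object id><whitespace><ref name>".
        sha, name = line.split(None, 1)
        refs[name] = sha
    return refs
```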
The git loader relies on the dulwich package to fetch git pack data, but unfortunately the current
dulwich implementation only supports the smart protocol for that operation.
Related sentry bug report: https://sentry.softwareheritage.org/organizations/swh/issues/2343/events/latest/?project=8
As a workaround, to be able to load git origins that lack smart transfer protocol support, we could fetch
the git objects through the dumb transfer protocol and build the pack file client side using the dulwich pack API.
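To illustrate the object-fetching half of that workaround, here is a minimal sketch of how a single loose object is located and decoded under the dumb protocol, using only the Python standard library. The helper names `loose_object_url` and `parse_loose_object` are hypothetical. The sketch assumes a SHA-1 repository and covers loose objects only: objects stored in pack files on the server (listed in objects/info/packs) would need separate handling, and building the resulting pack with dulwich is not shown.

```python
import hashlib
import zlib


def loose_object_url(repo_url, sha):
    """Return the dumb protocol URL of a loose object: objects/<2 hex>/<38 hex>."""
    return f"{repo_url.rstrip('/')}/objects/{sha[:2]}/{sha[2:]}"


def parse_loose_object(raw, expected_sha):
    """Decode a zlib-compressed loose object into (type, content).

    The decompressed data is "<type> <size>\\0<content>", and its SHA-1
    must match the object id that was requested.
    """
    data = zlib.decompress(raw)
    if hashlib.sha1(data).hexdigest() != expected_sha:
        raise ValueError(f"object id mismatch for {expected_sha}")
    header, _, content = data.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError(f"size mismatch for {expected_sha}")
    return obj_type.decode("ascii"), content
```

Starting from the object ids listed in info/refs, a loader could walk commits and trees this way to collect every reachable object before writing them into a pack.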