Git supports two [[ https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols | transfer protocols ]] to exchange data between two repositories: the **dumb** protocol and the **smart** protocol.
Nowadays, the smart protocol is the most common way to transfer data because it is more efficient: the server only sends the objects the client is missing, bundled in a single pack file.
Support for the smart protocol can be checked by sending a `GET` request to the `/info/refs?service=git-upload-pack` endpoint of the git server and inspecting the `Content-Type` HTTP header of the response. For a smart server, the content type starts with `application/x-git-`.
```
antoine@guggenheim:/tmp$ curl -i https://forge.softwareheritage.org/source/swh-loader-git/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Date: Thu, 09 Jul 2020 17:54:57 GMT
Server: Apache
X-Frame-Options: Deny
Strict-Transport-Security: max-age=0; includeSubdomains; preload
Content-Security-Policy: default-src 'self' https://forge.softwareheritage.org; img-src 'self' https://forge.softwareheritage.org data:; style-src 'self' https://forge.softwareheritage.org 'unsafe-inline'; script-src 'self' https://forge.softwareheritage.org; connect-src 'self'; frame-src 'self'; frame-ancestors 'none'; object-src 'none'; form-action 'self'; base-uri 'none'
Referrer-Policy: no-referrer
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Content-Type: application/x-git-upload-pack-advertisement
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
```
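The check above can be sketched in Python with only the standard library. The function names are illustrative and not part of swh-loader-git or dulwich. The sketch sends the same request as the `curl` command and applies the `application/x-git-` prefix test. The Git HTTP protocol documentation is stricter and expects exactly `application/x-git-upload-pack-advertisement` for this service.

```python
import urllib.request
from typing import Optional

# Prefix used in the text above to recognize a smart HTTP server.
SMART_CONTENT_TYPE_PREFIX = "application/x-git-"


def is_smart_content_type(content_type: Optional[str]) -> bool:
    """Return True if an /info/refs Content-Type denotes a smart server."""
    if content_type is None:
        return False
    return content_type.strip().lower().startswith(SMART_CONTENT_TYPE_PREFIX)


def supports_smart_protocol(repo_url: str, timeout: float = 10.0) -> bool:
    """Probe a git HTTP(S) server like the curl command above.

    urlopen raises urllib.error.HTTPError for non-2xx answers (e.g. 404),
    which callers should handle separately.
    """
    url = repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_smart_content_type(response.headers.get("Content-Type"))
```

With the two servers from this task, `supports_smart_protocol` would be expected to return `True` for the forge URL above and `False` for the server below, which answers with `text/plain`.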
Nevertheless, some git servers do not seem to support the smart protocol. In that case, the official git client falls back to the dumb protocol. This is notably the case for numerous cgit instances in the wild (for instance [[ https://git.ti.com/cgit/ | git.ti.com ]] or [[ hdiff.luite.com/cgit/ | hdiff.luite.com ]]). For example, the following server answers with a `text/plain` list of refs:
```
antoine@guggenheim:/tmp$ curl -i https://git.systemreboot.net/guile-xapian/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Thu, 09 Jul 2020 18:04:56 GMT
Content-Type: text/plain; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Content-Disposition: inline; filename="info/refs"
Last-Modified: Thu, 09 Jul 2020 18:04:56 GMT
Expires: Thu, 09 Jul 2020 18:09:56 GMT
Strict-Transport-Security: max-age=63072000; includeSubdomains; preload
612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/heads/master
59b340eb869c4966c5eb92309df6293ceb4db6ea refs/tags/v0.1.0
612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/tags/v0.1.0^{}
```
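With the dumb protocol, `info/refs` is a static file with one `<sha1> <refname>` pair per line. The two fields are separated by whitespace (a tab in files written by `git update-server-info`). A `^{}` suffix gives the object that an annotated tag peels to. Here is a minimal parsing sketch; `parse_dumb_info_refs` is a hypothetical helper, not taken from dulwich or swh-loader-git.

```python
from typing import Dict, Tuple


def parse_dumb_info_refs(body: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Parse a dumb-protocol info/refs body into (refs, peeled).

    refs maps each ref name to the SHA-1 it points to; peeled maps an
    annotated tag name to the SHA-1 of the object it dereferences to.
    """
    refs: Dict[str, str] = {}
    peeled: Dict[str, str] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        # Ref names cannot contain whitespace, so one split is enough.
        sha, name = line.split(None, 1)
        if name.endswith("^{}"):
            peeled[name[: -len("^{}")]] = sha
        else:
            refs[name] = sha
    return refs, peeled
```

Applied to the response above, `refs` would contain `refs/heads/master` and `refs/tags/v0.1.0`, and `peeled` would map `refs/tags/v0.1.0` to commit `612317c1…`.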
The git loader relies on the [[ https://github.com/dulwich/dulwich/ | dulwich ]] package to fetch git pack data. Unfortunately, the [[ https://github.com/dulwich/dulwich/blob/9b12d180b3dd2e6e19c290cbe4c8fa1635eb3e54/dulwich/client.py#L1793-L1794 | current dulwich implementation ]] only supports the smart protocol for that operation.
Related Sentry bug report: https://sentry.softwareheritage.org/organizations/swh/issues/2343/events/latest/?project=8
As a workaround to load git origins whose servers do not support the smart transfer protocol, we could fetch the git objects through the dumb transfer protocol and build the pack file client side using the [[ https://github.com/dulwich/dulwich/blob/master/dulwich/pack.py | dulwich pack API ]].