In the dogfooding category, it would be nice if we could ingest our self-hosted Git repositories without relying on the fact that they are also on GitHub :-)
Unfortunately, trying to run the git loader on, e.g., the hello world repo, fails like this:
2018-09-14 17:00:54,400 25707 Creating git origin for https://forge.softwareheritage.org/source/helloworld.git
2018-09-14 17:00:54,404 25707 Starting new HTTP connection (1): localhost
2018-09-14 17:00:54,408 25707 http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
2018-09-14 17:00:54,408 25707 Done creating git origin for https://forge.softwareheritage.org/source/helloworld.git
2018-09-14 17:00:54,409 25707 Creating origin_visit for origin 2 at time 2018-09-14 15:00:54.400801+00:00
2018-09-14 17:00:54,411 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,415 25707 http://localhost:5002 "POST /origin/visit/add HTTP/1.1" 200 16
2018-09-14 17:00:54,415 25707 Done Creating origin_visit for origin 2 at time 2018-09-14 15:00:54.400801+00:00
2018-09-14 17:00:54,417 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,420 25707 http://localhost:5002 "POST /fetch_history/start HTTP/1.1" 200 1
2018-09-14 17:00:54,422 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,425 25707 http://localhost:5002 "POST /snapshot/latest HTTP/1.1" 200 1
2018-09-14 17:00:54,427 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,431 25707 http://localhost:5002 "POST /snapshot/latest HTTP/1.1" 200 1
2018-09-14 17:00:54,432 25707 Starting new HTTPS connection (1): forge.softwareheritage.org
2018-09-14 17:00:54,760 25707 https://forge.softwareheritage.org:443 "GET /source/helloworld.git/info/refs?service=git-upload-pack HTTP/1.1" 200 None
2018-09-14 17:00:54,762 25707 Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 200, in read_pkt_line
    sizestr = read(4)
  File "/usr/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 889, in load
    more_data_to_fetch = self.fetch_data()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-git/swh/loader/git/updater.py", line 260, in fetch_data
    do_progress)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-git/swh/loader/git/updater.py", line 202, in fetch_pack_from_origin
    progress=do_activity)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1544, in fetch_pack
    b"git-upload-pack", url)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1449, in _discover_references
    [pkt] = list(proto.read_pkt_seq())
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 254, in read_pkt_seq
    pkt = self.read_pkt_line()
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 212, in read_pkt_line
    raise GitProtocolError(e)
dulwich.errors.GitProtocolError: Not a gzipped file (b'00')
2018-09-14 17:00:54,771 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,779 25707 http://localhost:5002 "POST /fetch_history/end HTTP/1.1" 200 1
2018-09-14 17:00:54,781 25707 Updating origin_visit for origin 2 with status partial
2018-09-14 17:00:54,785 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,793 25707 http://localhost:5002 "POST /origin/visit/update HTTP/1.1" 200 1
2018-09-14 17:00:54,795 25707 Done updating origin_visit for origin 2 with status partial
Running git clone on the same URL works just fine. I suspect this affects all our repos hosted on forge.softwareheritage.org, but I haven't checked.
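
One possible reading of the traceback, offered as a guess rather than a diagnosis: the b'00' in the error matches the first two bytes of a plain pkt-line length prefix (a smart HTTP ref advertisement normally starts with "001e# service=git-upload-pack"), which would suggest the response body was not gzip-compressed by the time it reached the gzip reader. The standard-library-only sketch below reproduces the same error message under that assumption. It does not use dulwich or the network, and read_size_prefix is a hypothetical helper written for this illustration.

```python
import gzip
import io

# First pkt-line of a smart HTTP ref advertisement: 4 hex digits giving the
# total line length (prefix included), followed by the payload.
payload = b"# service=git-upload-pack\n"
pkt_line = b"%04x" % (len(payload) + 4) + payload
body = pkt_line + b"0000"  # "0000" is the flush-pkt that ends the section


def read_size_prefix(raw: bytes, treat_as_gzip: bool) -> bytes:
    """Read the 4-byte pkt-line length prefix (hypothetical helper).

    With treat_as_gzip=True the bytes are wrapped in a gzip reader first,
    which is the assumed failure mode: plain bytes handled as if compressed.
    """
    stream = io.BytesIO(raw)
    if treat_as_gzip:
        stream = gzip.GzipFile(fileobj=stream)
    return stream.read(4)


# Reading the plain body directly yields the length prefix.
print(read_size_prefix(body, treat_as_gzip=False))

# Reading the same plain body through gzip fails on the magic-number check.
try:
    read_size_prefix(body, treat_as_gzip=True)
except OSError as exc:
    print(exc)
```

If this guess is right, the thing to check would be whether the Content-Encoding header sent by forge.softwareheritage.org matches what the body actually contains, or whether the body is being decompressed twice somewhere along the way.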