stuff related to https://forge.softwareheritage.org/diffusion/DLDG/
Feb 5 2019
That's a fairly large repo (as seen with how the content bundles get spread out to limit their size). It looks like it has some large directories (e.g. the .bugs directory looks like it has a lot of entries) so I'm not too surprised.
Jan 21 2019
Errors of type dulwich.errors.NotGitRepository are likely caused by a bug in dulwich: redirected repository URLs are not handled correctly.
A pull request has been submitted to fix that issue.
Dec 17 2018
Up to 85% now.
Dec 4 2018
Nov 27 2018
Nov 16 2018
Nov 14 2018
Errors of type ValueError: year is out of range are related to commit dates that cannot be represented using the standard datetime.datetime Python object (MINYEAR = 1, MAXYEAR = 9999).
See for instance:
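The failure can be reproduced in isolation with a sketch like the one below. The helper name is_representable is hypothetical and is not taken from the loader; it only checks whether a Unix timestamp fits in datetime.datetime.

```python
from datetime import datetime, timezone


def is_representable(timestamp):
    """Return True if a Unix timestamp can be held by datetime.datetime.

    Hypothetical helper for illustration only.
    """
    try:
        datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # ValueError: the resulting year falls outside MINYEAR..MAXYEAR
        # OverflowError / OSError: the value does not fit the platform's time_t
        return False
    return True


print(is_representable(0))        # the Unix epoch
print(is_representable(10**12))   # a date far beyond year 9999
```

A loader could use such a check to decide whether to fall back to storing the raw timestamp instead of a datetime object.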
The errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" are all SHA-1 collisions, mainly from repositories testing the attack uncovered by SHAttered.
Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.
The simplest solution would be to check whether the computed timezone offset lies within the bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
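The bounds check proposed in this comment could look like the following sketch. The function name and the choice to express offsets in minutes are assumptions made for illustration, not the loader's actual code.

```python
# Bounds from the comment above: UTC-14:00 to UTC+14:00, expressed in minutes.
MAX_OFFSET_MINUTES = 14 * 60


def normalize_offset(offset_minutes):
    """Return the offset unchanged if it is within bounds, else 0.

    Hypothetical helper: name and unit (minutes) are assumptions.
    """
    if -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0


print(normalize_offset(120))    # a plausible offset is kept
print(normalize_offset(5000))   # an out-of-range offset is reset to 0
```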
Nov 9 2018
Resolved by D622.
Oct 30 2018
Oct 20 2018
Heads up, I amended the description.
I did it that way to not destroy the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).
Oct 19 2018
"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664
Thanks for the analysis!
Oct 11 2018
Oct 1 2018
Sep 27 2018
Sep 21 2018
That's deployed now.
Sep 20 2018
Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.
Sep 17 2018
Sep 14 2018
Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
I've compared traces from git cloning on our forge and on other repositories with actual git clone.
Jun 26 2018
---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/