stuff related to https://forge.softwareheritage.org/diffusion/DLDG/
Tue, Dec 4
Tue, Nov 27
Fri, Nov 16
Wed, Nov 14
Errors of type ValueError: year is out of range are caused by commit dates that cannot be represented by a standard Python datetime.datetime object (datetime.MINYEAR = 1, datetime.MAXYEAR = 9999).
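A minimal sketch of guarding the conversion, assuming the loader has the commit date as integer seconds since the epoch plus an offset in minutes. commit_datetime is a hypothetical helper, not part of swh-loader-git or dulwich; it returns None instead of raising when the date falls outside datetime's range, so the caller can fall back to the raw integer timestamp.

```python
import datetime


def commit_datetime(seconds: int, offset_minutes: int):
    """Hypothetical helper: git commit timestamp -> aware datetime, or None.

    datetime only covers years datetime.MINYEAR (1) to datetime.MAXYEAR (9999),
    so bogus commit dates outside that range cannot be converted.
    """
    try:
        # timezone() itself rejects offsets of 24 hours or more.
        tz = datetime.timezone(datetime.timedelta(minutes=offset_minutes))
        return datetime.datetime.fromtimestamp(seconds, tz=tz)
    except (ValueError, OverflowError, OSError):
        # Year out of range, or outside the platform's time_t range:
        # keep the raw integer timestamp instead.
        return None
```

For example, commit_datetime(10**12, 0) lands in year 33658 and yields None rather than a ValueError.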
See for instance:
All the errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" are sha1 collisions, mostly from repositories testing the SHA-1 collision attack published as SHAttered.
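Assuming content_pkey is keyed on sha1 (as the error message suggests), two different blobs with the same sha1 trip the constraint. A sketch of telling a genuine collision apart from a plain duplicate by also comparing sha256; is_sha1_collision is a hypothetical helper, not an existing swh API.

```python
import hashlib


def is_sha1_collision(a: bytes, b: bytes) -> bool:
    """True when two distinct blobs share the same plain sha1.

    A second, stronger hash (sha256) distinguishes a real collision
    from the same content being inserted twice.
    """
    same_sha1 = hashlib.sha1(a).digest() == hashlib.sha1(b).digest()
    same_sha256 = hashlib.sha256(a).digest() == hashlib.sha256(b).digest()
    return same_sha1 and not same_sha256
```

Fed the two SHAttered PDFs, this would return True; for an ordinary duplicate insert it returns False.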
Indeed, you're right: the timezone offset is used to compute the revision identifier, so even if its value is incorrect, it should be stored anyway.
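For git commits, the identifier is the sha1 of the raw commit bytes, and the author and committer lines end with the offset exactly as written. The sketch below builds a minimal hypothetical commit (made-up author and date, empty tree) to show that changing only the offset changes the identifier.

```python
import hashlib


def git_object_id(obj_type: bytes, payload: bytes) -> str:
    """Git object id: sha1 over the header b'<type> <size>', a NUL byte, then the payload."""
    return hashlib.sha1(b"%s %d\x00" % (obj_type, len(payload)) + payload).hexdigest()


def commit_payload(offset: bytes) -> bytes:
    """Hypothetical minimal commit where only the timezone offset varies."""
    tree = b"4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
    who = b"Jane Doe <jane@example.org> 1500000000 " + offset
    return (b"tree " + tree + b"\n"
            b"author " + who + b"\n"
            b"committer " + who + b"\n"
            b"\n"
            b"message\n")
```

Comparing git_object_id(b"commit", commit_payload(b"+0200")) with the same call on b"+0000" gives two different identifiers, so normalizing a bogus offset before hashing would break the identifier.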
The simplest solution would be to check whether the computed timezone offset lies within the valid bounds [UTC−14:00, UTC+14:00] and set it to 0 otherwise.
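A sketch of that bounds check, assuming the offset arrives in git's raw sign-plus-HHMM form. parse_git_offset and normalize_offset are hypothetical helpers. Since the raw offset feeds the revision identifier, this only produces a normalized copy; the raw bytes are left untouched.

```python
MAX_OFFSET_MINUTES = 14 * 60  # UTC+14:00, the largest offset in use


def parse_git_offset(raw: bytes) -> int:
    """Convert a git '+HHMM' / '-HHMM' offset into minutes (value may be bogus).

    Note that b'-0000' parses to 0, so the raw bytes still need to be kept.
    """
    sign = -1 if raw.startswith(b"-") else 1
    hours, minutes = divmod(int(raw.lstrip(b"+-").decode("ascii")), 100)
    return sign * (hours * 60 + minutes)


def normalize_offset(offset_minutes: int) -> int:
    """Keep offsets within [UTC-14:00, UTC+14:00]; replace anything else by 0."""
    if -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0
```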
Nov 9 2018
Resolved by D622.
Oct 30 2018
Oct 20 2018
Heads up, I amended the description.
I did it that way so as not to break up the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).
Oct 19 2018
"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664
Thanks for the analysis!
Oct 11 2018
Oct 1 2018
Sep 27 2018
Sep 21 2018
That's deployed now.
Sep 20 2018
Turns out that commit eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.
Sep 17 2018
Sep 14 2018
Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
Using an actual git clone, I compared the traces of cloning from our forge with those of cloning other repositories.
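A small sketch for collecting such traces from Python with the same three variables set; traced_env and traced_clone are hypothetical helpers. git writes these traces to stderr, which is captured so two runs (our forge vs. another host) can be saved and diffed.

```python
import os
import subprocess

# The three tracing variables from the shell snippet above.
TRACE_VARS = {"GIT_TRACE_PACKET": "1", "GIT_TRACE": "1", "GIT_CURL_VERBOSE": "1"}


def traced_env(base=None):
    """Copy of the environment (os.environ by default) with git tracing enabled."""
    env = dict(os.environ if base is None else base)
    env.update(TRACE_VARS)
    return env


def traced_clone(url, dest):
    """Run an actual git clone with tracing on; return (exit code, trace text)."""
    result = subprocess.run(
        ["git", "clone", url, dest],
        env=traced_env(),
        capture_output=True,  # traces go to stderr
        text=True,
    )
    return result.returncode, result.stderr
```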
Jun 26 2018
---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/
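The same configuration expressed as a Python dict, plus a hypothetical helper illustrating one plausible reading of content_size_limit: a byte threshold above which a blob is not stored in full. That reading, and whether the limit is inclusive, are assumptions not confirmed by this thread.

```python
# In-memory equivalent of the YAML configuration above.
CONFIG = {
    "content_size_limit": 1000000,
    "save_data": True,
    "save_data_path": "/home/ndandrim/.cache/swh/packfiles",
    "storage": {"cls": "remote", "args": {"url": "http://localhost:5002/"}},
}


def exceeds_size_limit(data: bytes, config: dict = CONFIG) -> bool:
    """Hypothetical check (assumed semantics): is this blob over content_size_limit bytes?"""
    return len(data) > config["content_size_limit"]
```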
Jun 13 2018
In my opinion, the release notes shouldn't be stored in the release objects, for the following reasons:
- they depend on the origin (a clone of this exact same repository on another git hosting platform won't have that information)
- they aren't part of the data used to compute the release identifier
- they can be modified after the fact
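To illustrate the second point for git: an annotated tag's identifier is the sha1 of the tag object's own bytes (target, type, name, tagger, tag message). Release notes kept by a hosting platform are simply not an input. release_id below is a hypothetical helper building a minimal tag object.

```python
import hashlib


def git_object_id(obj_type: bytes, payload: bytes) -> str:
    """Git object id: sha1 over the header b'<type> <size>', a NUL byte, then the payload."""
    return hashlib.sha1(b"%s %d\x00" % (obj_type, len(payload)) + payload).hexdigest()


def release_id(target: str, name: str, tagger: str, message: str) -> str:
    """Hypothetical helper: id of an annotated tag pointing at a commit.

    Only the tag object's fields are hashed; platform release notes never are.
    """
    payload = (
        f"object {target}\n"
        f"type commit\n"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        f"\n"
        f"{message}"
    ).encode("utf-8")
    return git_object_id(b"tag", payload)
```

Editing the tag message changes the identifier; editing a hosting platform's release notes cannot, because they never reach this function.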