stuff related to https://forge.softwareheritage.org/diffusion/DLDG/
Jan 22 2020
I agree that this may be a useful optimization for some upstreams where getting the state of the remote repository is expensive.
Jan 21 2020
Nov 19 2019
This has been fixed by cb42fea77070
Nov 15 2019
Nov 5 2019
Note that this doesn't solve the question of pulling release notes from e.g. GitHub release pages, which is something that would need to be done by some other component (T17 comes to mind).
Oct 1 2019
Sep 30 2019
To ease the analysis, here is an aggregate of the 09/2019 latest failures:
New dashboards with latest errors as of 09/2019 
Sep 10 2019
I've backported dulwich 0.19.13-1 to our stretch repo, upgraded all workers and they're restarting.
Sep 7 2019
And nice work on the investigation and the fix within dulwich ;)
Sep 6 2019
May 25 2019
This is done, I've forked off the part about consistently documenting configuration options to T1758.
Feb 5 2019
That's a fairly large repo (as seen with how the content bundles get spread out to limit their size). It looks like it has some large directories (e.g. the .bugs directory looks like it has a lot of entries) so I'm not too surprised.
Jan 21 2019
Errors of type dulwich.errors.NotGitRepository are likely related to a bug in dulwich where redirected repository URLs are not handled correctly.
A pull request has been submitted to fix that issue.
Dec 17 2018
Up to 85% now.
Dec 4 2018
Nov 27 2018
Nov 16 2018
Nov 14 2018
Errors of type ValueError: year is out of range are related to commit dates that cannot be represented using the standard Python datetime.datetime object (MINYEAR = 1, MAXYEAR = 9999).
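A minimal sketch of how such dates could be detected before they reach the rest of the loader, assuming the commit date arrives as seconds since the epoch. The helper name commit_datetime is hypothetical and is not part of the loader's code.

```python
from datetime import datetime, timezone
from typing import Optional


def commit_datetime(timestamp: int) -> Optional[datetime]:
    """Return the UTC datetime for a commit timestamp.

    Returns None when the timestamp falls outside what datetime.datetime
    can represent (years MINYEAR..MAXYEAR, i.e. 1..9999).
    Hypothetical helper, for illustration only.
    """
    try:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # Depending on the platform and the magnitude of the value,
        # Python reports an unrepresentable date with any of these.
        return None
```

A caller could then decide whether to keep the raw timestamp alongside a missing datetime, rather than failing the whole load.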
See for instance:
The errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" are all about SHA-1 collisions, mainly from repositories
testing the attack uncovered by SHAttered.
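As a rough sketch, a genuine collision can be told apart from a plain duplicate by comparing a second, stronger hash of the same content. The function names and the dictionary layout below are assumptions for illustration, not the storage schema.

```python
import hashlib


def content_hashes(data: bytes) -> dict:
    """Compute the SHA-1 and SHA-256 hex digests of a blob."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def is_sha1_collision(existing: dict, incoming: dict) -> bool:
    """True when two blobs share a SHA-1 but differ in SHA-256.

    Equal SHA-1 and equal SHA-256 means the same content was seen twice,
    which is an ordinary duplicate rather than a collision.
    """
    return (
        existing["sha1"] == incoming["sha1"]
        and existing["sha256"] != incoming["sha256"]
    )
```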
Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.
The simplest solution would be to check if the computed timezone offset lies in the adequate bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
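The check proposed above could look roughly like this, assuming the offset is expressed in minutes from UTC. The name normalize_offset and the constant are hypothetical.

```python
# Bound taken from the proposal above: 14 hours either side of UTC.
MAX_OFFSET_MINUTES = 14 * 60


def normalize_offset(offset_minutes: int) -> int:
    """Return the offset unchanged if it lies within [-14:00, +14:00].

    Any value outside those bounds is replaced by 0 (UTC).
    Hypothetical helper, for illustration only.
    """
    if -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0
```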