All errors reported by the git loader of type psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block [1] are caused by the processing of malformed dates.
This usually happens when a revision's author or commit date lies far in the future. See for instance:
- https://github.com/cristiansteib/unAventon/commit/1d05195322ec802238fb7c4608b8db614b4d75c7
- https://github.com/archlinuxarm/PKGBUILDs/commit/509419e3280b66026c55787ccc8ee97e53ca690f
- https://github.com/ska-sa/PySPEAD/commit/0ee7c7d41b57b471af00b4e2869cede57706b9fd
- https://github.com/samthiriot/openmole/commit/b558e2f43067c6471ac656fcadff4559958abbcf
Such a date yields an invalid computed timezone offset whose value overflows the PostgreSQL smallint type,
which causes the following exception to be raised in swh-storage:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/antoine/swh/swh-environment/swh-storage/swh/storage/db.py", line 201, in writer
    tblname, ', '.join(columns)), f)
psycopg2.DataError: ERROR:  value "24193125" is out of range for type smallint
CONTEXT:  COPY tmp_revision, line 19448, column date_offset: "24193125"
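For reference, PostgreSQL's smallint is a 2-byte signed integer, so it only holds values from -32768 to 32767. The snippet below is a minimal sketch of that range check applied to the value from the error message. The helper name fits_smallint is made up for illustration and is not part of swh-storage. If the offset is expressed in minutes (an assumption here), 24193125 would correspond to roughly 46 years.

```python
# Range of PostgreSQL's smallint type (2-byte signed integer).
SMALLINT_MIN = -2**15      # -32768
SMALLINT_MAX = 2**15 - 1   # 32767


def fits_smallint(value):
    """Hypothetical helper: tell whether value can be stored in a smallint column."""
    return SMALLINT_MIN <= value <= SMALLINT_MAX


# Offset taken from the error message above.
bad_offset = 24193125
print(fits_smallint(bad_offset))
```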
We should handle these corner cases. The simplest solution would be to check whether the computed timezone offset lies within the valid bounds [UTC−14:00, UTC+14:00]
and set it to 0 otherwise. This could be handled directly in swh-storage [2], in case other loaders encounter a similar issue.
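The proposed check could look roughly like the sketch below. It assumes the offset is expressed in minutes, and the function and constant names are hypothetical. This is not the actual code of swh/storage/converters.py, only an illustration of the bounds check described above.

```python
# Assumption: timezone offsets are expressed in minutes.
MIN_OFFSET_MINUTES = -14 * 60  # UTC-14:00
MAX_OFFSET_MINUTES = 14 * 60   # UTC+14:00


def sanitize_offset(offset_minutes):
    """Hypothetical helper: return the offset if it lies within
    [UTC-14:00, UTC+14:00], and 0 otherwise."""
    if MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    # Out-of-range offset, most likely coming from a malformed date.
    return 0


# A valid offset is kept, the one from the traceback is reset.
print(sanitize_offset(120))
print(sanitize_offset(24193125))
```

Resetting to 0 silently discards the original value, so it may be worth logging the revision identifier whenever the fallback is taken.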
[2] https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/converters.py$125