Some listers (e.g. cgit, launchpad, so far ...) seem to write their listing results only at the end of their listing computation.
Making them flush their current state at regular intervals sounds like better behavior.
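A minimal sketch of the "flush at regular intervals" idea, assuming a lister that produces origins page by page and a `send_origins` callback that records a batch in the scheduler. `ListedOrigin`, `run_listing` and `flush_every` are hypothetical names for illustration only, not the swh-lister API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ListedOrigin:
    # Hypothetical minimal stand-in for the scheduler's listed origin model
    url: str
    visit_type: str


def run_listing(
    pages: Iterable[List[ListedOrigin]],
    send_origins: Callable[[List[ListedOrigin]], int],
    flush_every: int = 1000,
) -> int:
    """Send origins in batches of ``flush_every`` while pages are produced,
    instead of sending everything once the whole listing is done."""
    buffer: List[ListedOrigin] = []
    sent = 0
    for page in pages:
        buffer.extend(page)
        # Flush full batches as soon as they are available
        while len(buffer) >= flush_every:
            sent += send_origins(buffer[:flush_every])
            del buffer[:flush_every]
    # Flush whatever remains at the end of the listing
    if buffer:
        sent += send_origins(buffer)
    return sent
```

With this shape, a crash after six minutes only loses the origins still sitting in the buffer, not everything listed so far. The same place in the loop is where the lister state could be persisted after each successful flush.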
Related revisions (rDLS Listers):
D4967 | rDLS5aa7c8f2b21f | launchpad: Remove call to dataclasses.asdict on lister state
D4966 | rDLS46f5a50099f9 | launchpad: Prevent error due to origin listed twice
D4965 | rDLS0ad37740d9d7 | pattern: Make lister flush regularly origins to scheduler
> Making them flush their current state at regular intervals sounds like better behavior.
Case in point.
The launchpad lister just failed after about 6 minutes of running, with the error [2], and nothing got written to listed_origins [1].
[1]
```
swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister launchpad incremental=True
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 123, in run
    full_stats.origins += self.send_origins(origins)
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 224, in send_origins
    ret = self.scheduler.record_listed_origins(origins)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
    self.raise_for_status(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
    raise exception from None
swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>
...
    y = func(*args)
TypeError: __init__() missing 1 required positional argument: 'tz_string'

real    6m17.047s
user    0m22.060s
sys     0m0.644s
```
[2] https://sentry.softwareheritage.org/share/issue/c6e26fcfe55348c6a7f4d5917f266c10/
I thought I had fixed that bug (I also ran into it while developing the lister).
This means that a repository appears twice in the listing.
Currently testing the lister locally.
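For context on the CardinalityViolation: PostgreSQL rejects a single `INSERT ... ON CONFLICT DO UPDATE` statement in which two proposed rows share the same conflict key. A minimal sketch of one way to avoid it, assuming origins are keyed by `(url, visit_type)` and that each batch is sent as one statement; this is an illustration only, not necessarily what D4966 does, and `ListedOrigin`/`dedup_origins` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ListedOrigin:
    # Hypothetical minimal stand-in for the scheduler's listed origin model
    url: str
    visit_type: str
    last_update: str = ""


def dedup_origins(origins: Iterable[ListedOrigin]) -> List[ListedOrigin]:
    """Keep only the last occurrence of each (url, visit_type) key, in the
    order in which keys were first seen (dicts preserve insertion order)."""
    by_key: Dict[Tuple[str, str], ListedOrigin] = {}
    for origin in origins:
        by_key[(origin.url, origin.visit_type)] = origin
    return list(by_key.values())
```

Deduplicating per batch is enough for this particular error: the same origin showing up again in a later batch goes through a separate statement and simply updates the existing row.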
> I thought I had fixed that bug (I also ran into it while developing the lister).
The error above was only meant to illustrate the case, not to report that bug per se ;)