Software Heritage

next gen lister: Make listers flush their visit state regularly
Closed, Resolved, Public

Description

Some listers (e.g. cgit, launchpad, so far ...) seem to write their visit listing only once, at the end of their listing computation.

Making them flush their current state at regular intervals sounds like better behavior.
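A minimal sketch of the proposed behavior, in Python since the listers are Python code. Everything here is hypothetical: `InMemoryScheduler`, `PageLister`, `get_pages` and `set_state` are illustrative stand-ins, not the actual swh.lister or swh.scheduler API. The point is only that origins and the incremental state are flushed after each page, so a failure mid-run loses at most the page being processed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class InMemoryScheduler:
    """Hypothetical in-memory stand-in for the scheduler backend."""
    origins: List[str] = field(default_factory=list)
    state: Dict[str, str] = field(default_factory=dict)

    def record_listed_origins(self, urls: List[str]) -> None:
        self.origins.extend(urls)

    def set_state(self, state: Dict[str, str]) -> None:
        self.state = dict(state)


class PageLister:
    """Hypothetical lister that flushes origins and state after every page."""

    def __init__(self, scheduler: InMemoryScheduler, pages: Iterable[List[str]]):
        self.scheduler = scheduler
        self.pages = pages

    def get_pages(self) -> Iterator[List[str]]:
        # Stand-in for paginated requests to the forge API.
        yield from self.pages

    def run(self) -> int:
        total = 0
        for page_number, urls in enumerate(self.get_pages()):
            # Flush this page's origins immediately instead of
            # accumulating everything until the end of the run.
            self.scheduler.record_listed_origins(urls)
            # Persist the incremental state too, so a rerun can resume here.
            self.scheduler.set_state({"last_page": str(page_number)})
            total += len(urls)
        return total


def flaky_pages() -> Iterator[List[str]]:
    """Hypothetical page source whose third request fails."""
    yield ["https://example.org/a", "https://example.org/b"]
    yield ["https://example.org/c"]
    raise RuntimeError("simulated API failure")
```

With `flaky_pages()` as the page source, `run()` raises on the third request, but the origins from the first two pages and a state pointing at page 1 are already recorded by then.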

Event Timeline

ardumont triaged this task as Normal priority. Jan 28 2021, 4:05 PM
ardumont created this task.

> Making them flush their current state at regular intervals sounds like better behavior.

Case in point:

The launchpad lister just failed after 6 minutes of running, and nothing was written to the listed_origins table [1]; see the error in [2].

[1]

swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister launchpad incremental=True
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 123, in run
    full_stats.origins += self.send_origins(origins)
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 224, in send_origins
    ret = self.scheduler.record_listed_origins(origins)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
    self.raise_for_status(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
    raise exception from None
swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>

...
    y = func(*args)
TypeError: __init__() missing 1 required positional argument: 'tz_string'

real    6m17.047s
user    0m22.060s
sys     0m0.644s

[2] https://sentry.softwareheritage.org/share/issue/c6e26fcfe55348c6a7f4d5917f266c10/

I thought I had fixed that bug (I also ran into it while developing the lister).

This means that the same repository appears twice in a single batch of listed origins.
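PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` statement in which two proposed rows share the same conflict key, which is what the `CardinalityViolation` in the traceback reports. One possible mitigation, sketched below, is to deduplicate each batch before sending it. This assumes `listed_origins` rows are keyed on `(lister_id, url, visit_type)`; the `ListedOrigin` tuple and `dedup_batch` helper are simplified hypothetical stand-ins, not the swh.scheduler model.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ListedOrigin(NamedTuple):
    """Simplified, hypothetical stand-in for a listed origin row."""
    lister_id: str
    url: str
    visit_type: str
    last_update: str = ""


def dedup_batch(origins: Iterable[ListedOrigin]) -> List[ListedOrigin]:
    """Keep one row per (lister_id, url, visit_type).

    Rows stay in first-seen order, but each key keeps the last value seen,
    so a single upsert of the result never targets the same row twice.
    """
    by_key: Dict[Tuple[str, str, str], ListedOrigin] = {}
    for origin in origins:
        # Reassigning an existing key updates its value in place
        # without moving it in the dict's insertion order.
        by_key[(origin.lister_id, origin.url, origin.visit_type)] = origin
    return list(by_key.values())
```

Keeping the last occurrence is an arbitrary choice here; whichever row wins, the batch sent in one statement then contains each key at most once.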

Currently testing the lister locally.

> I thought I had fixed that bug (I also ran into it while developing the lister).

The bug reported here was merely meant to demonstrate the case (it is not the point per se ;)

ardumont claimed this task.

Landed and deployed in staging.