
deposit: loader instantiation is failing with an error "unexpected keyword argument 'extraction_dir'"
Closed, Migrated

Description

The Icinga probe is red and an error is raised in Sentry: https://sentry.softwareheritage.org/share/issue/fa106cb39d304c98a5a784871b0dcc3c/

Feb 19 05:33:16 worker01 python3[1755971]: [2021-02-19 05:33:16,548: ERROR/ForkPoolWorker-1] Task swh.loader.package.deposit.tasks.LoadDeposit[86e6a277-c650-4a6d-988e-e24035bc7289] raised unexpected: TypeError("__init__() got an unexpected keyword argument 'extraction_dir'")
                                           Traceback (most recent call last):
                                             File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 385, in trace_task
                                               R = retval = fun(*args, **kwargs)
                                             File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 51, in __call__
                                               result = super().__call__(*args, **kwargs)
                                             File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 650, in __protected_call__
                                               return self.run(*args, **kwargs)
                                             File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
                                               reraise(*exc_info)
                                             File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
                                               raise value
                                             File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
                                               return f(*args, **kwargs)
                                             File "/usr/lib/python3/dist-packages/swh/loader/package/deposit/tasks.py", line 14, in load_deposit
                                               return DepositLoader.from_configfile(url=url, deposit_id=deposit_id).load()
                                             File "/usr/lib/python3/dist-packages/swh/loader/package/deposit/loader.py", line 147, in from_configfile
                                               return cls.from_config(deposit_client=deposit_client, **config)
                                             File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 118, in from_config
                                               return cls(storage=storage_instance, **config)
                                           TypeError: __init__() got an unexpected keyword argument 'extraction_dir'
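The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not the actual swh code or the fix that was deployed: `Loader`, `split_config`, and the sample configuration values are hypothetical stand-ins. It shows how forwarding every configuration key as a keyword argument fails as soon as the configuration contains a key the constructor does not accept, and one possible way to detect such keys before instantiation.

```python
import inspect


class Loader:
    """Hypothetical stand-in for a loader whose constructor has no 'extraction_dir' parameter."""

    def __init__(self, storage, url, deposit_id):
        self.storage = storage
        self.url = url
        self.deposit_id = deposit_id

    @classmethod
    def from_config(cls, storage, **config):
        # Same pattern as the failing call: every config key becomes a keyword argument.
        return cls(storage=storage, **config)


def split_config(cls, config):
    """Separate the keys accepted by cls.__init__ from the unexpected ones."""
    params = inspect.signature(cls.__init__).parameters
    accepted = {key: value for key, value in config.items() if key in params}
    unexpected = sorted(set(config) - set(accepted))
    return accepted, unexpected


# Assumed sample configuration; only the 'extraction_dir' key name comes from the error above.
config = {"url": "https://example.org/deposit", "deposit_id": 1, "extraction_dir": "/tmp/extract"}

error_message = None
try:
    Loader.from_config(storage=None, **config)
except TypeError as exc:
    # Reproduces the "unexpected keyword argument" error.
    error_message = str(exc)

# Dropping the unexpected keys lets the instantiation succeed.
accepted, unexpected = split_config(Loader, config)
loader = Loader.from_config(storage=None, **accepted)
```

Silently filtering keys can hide real misconfigurations, so in practice it may be preferable to report the `unexpected` list and align the configuration file with the constructor signature.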

Event Timeline

vsellier triaged this task as Unbreak Now! priority.Feb 19 2021, 4:50 PM
vsellier created this task.

Reschedule the deposits that went stale because of this issue.

Staging:

$ ssh -L 5008:127.0.0.1:5008 scheduler0.internal.staging.swh.network  # tunnel ssh from machine
...
$ psql -t service=staging-swh-deposit -c "select load_task_id from deposit where status='verified' and load_task_id is not null" | xargs swh scheduler --url http://localhost:5008/ task respawn
WARNING:swh.core.cli:Could not load subcommand graph: module 'swh.graph.cli' has no attribute 'cli'
WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'pyorc'
Respawn tasks ('18386154', '18310771', '18313731', '18312284', '18315219')

Prod:

$ psql -t service=swh-deposit -c "select load_task_id from deposit where status='verified' and load_task_id is not null" | xargs swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ task respawn
WARNING:swh.core.cli:Could not load subcommand graph: module 'swh.graph.cli' has no attribute 'cli'
WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'pyorc'
Respawn tasks ('367650126', '367634664', '367651903', '367691151', '367692039', '367692941', '367693845', '367694739', '367737774', '349316656', '349316655', '349342892', '349342954', '349341966', '349386174', '349386115', '337340251', '337340272', '337340290', '167875974', '337370446', '316402903', '337352069', '337340249')

(Some of those ^ are very old and did not need to be respawned; the select query could be improved to filter on date as well.)
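A date-filtered variant of the select query could look like the sketch below. It is only an illustration: the helper name `stale_deposit_query` is hypothetical, the `reception_date` column is taken from the verification query used on prod, and the `%s` placeholder assumes a driver with psycopg2-style parameter binding.

```python
from datetime import date


def stale_deposit_query(since):
    """Return (sql, params) selecting the load tasks of verified deposits received on or after `since`."""
    sql = (
        "select load_task_id from deposit "
        "where status = 'verified' "
        "and load_task_id is not null "
        "and reception_date >= %s"
    )
    # The cutoff is passed as a bound parameter rather than interpolated into the SQL text.
    return sql, (since.isoformat(),)


# Assumed cutoff, reusing the date from the prod verification query.
sql, params = stale_deposit_query(date(2020, 2, 15))
```

The resulting pair can be passed to a cursor's `execute(sql, params)`; the returned task ids would then be handed to `swh scheduler task respawn` as in the commands above.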

Check that the deposits have been properly ingested.

Staging:

swh-deposit=> select count(*) from deposit where status='verified' and load_task_id is not null;
 count
-------
     0
(1 row)

Prod:

softwareheritage-deposit=> select count(*) from deposit where status='verified' and load_task_id is not null and reception_date >= '2020-02-15' and status_detail is null;
 count
-------
     0
(1 row)

(Extra filtering because some old invalid deposits got rescheduled as well).

ardumont claimed this task.
Unknown Object (User) added a subscriber: Unknown Object (User).Jun 18 2021, 2:16 PM
This comment was removed by ardumont.
Unknown Object (User) added a subscriber: Unknown Object (User).Aug 4 2021, 5:02 AM
This comment was removed by ardumont.

(/me cleaned up the last 2 spam comments above and removed the spam users ^)