Page MenuHomeSoftware Heritage

Errors from vault workers are not logged to Sentry
Closed, MigratedEdits Locked

Description

After reporting an error to the vault backend, workers call logging.exception: https://forge.softwareheritage.org/source/swh-vault/browse/master/swh/vault/cookers/base.py$154

and this should be captured by Sentry, but it isn't.

I don't understand why, as vault workers use the same plumbling as loader workers (ie. https://forge.softwareheritage.org/source/swh-scheduler/browse/master/swh/scheduler/celery_backend/config.py$157-167)

Event Timeline

vlorentz triaged this task as High priority.Feb 8 2022, 2:55 PM
vlorentz created this task.

Looking at puppet configuration, my guess is that the sentry_dsn is not set for the vault cookers.

All other components running in celery workers have sentry_dsn configuration set (see loader-git for instance).

@ardumont, @vsellier , do you confirm ?

Yes, i confirm (from #swh-sysadm discussion)

14:55 <+vlorentz> T3920
14:55 -- Notice(swhbot): T3920 (submitter: vlorentz, priority High, status: Open): Errors from vault workers are not logged to Sentry <https://forge.softwareheritage.org/T3920>
14:58 <+ardumont> T3920, they do not seem to have the sentry configuration set in puppet, i'll attend to that later
ardumont changed the task status from Open to Work in Progress.Feb 8 2022, 5:15 PM
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
ardumont moved this task from Weekly backlog to in-progress on the System administration board.

So i was wrong, it is correctly set [1].
And there are sentry issues about workers [2].

[1] D7123#185431

[2] https://sentry.softwareheritage.org/share/issue/ec2b1220736b423aae13b1e8ef86785b/

So i don't currently know what's wrong (if anything is).

so I'm guessing Celery is eating logs somehow, so Sentry doesn't see them

Something is bothering me, isn't there some catchall exceptions happening in the vault source code somewhere already?

Because that's currently working for other modules, so i don't get the difference here.

It might be worth double checking the vault code with this ^.


Anyway, nice finding. For your information, I've stopped the secondary cooker worker and adapted the other one with what you suggested:

#@worker_init.connect
@worker_process_init.connect
@_print_errors
def on_worker_init(*args, **kwargs):
    try:
        from sentry_sdk.integrations.celery import CeleryIntegration
    except ImportError:
        integrations = []
    else:
        integrations = [CeleryIntegration()]
    sentry_dsn = None  # will be set in `init_sentry` function
    init_sentry(sentry_dsn, integrations=integrations, debug=True)

The process starts ok with the change [1] (with debug=True as proposed by the doc, dropped after the tryout).

I'll let you trigger some cooking and reports here your finding.

If that's satisfying enough for you, we can try and adapt this in the scheduler.

What do you think?

[1]

Feb 09 08:48:17 worker11 python3[13674]:   . swh.vault.cooking_tasks.SWHBatchCookingTask                                                                                                       Feb 09 08:48:17 worker11 python3[13674]:   . swh.vault.cooking_tasks.SWHCookingTask
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up integrations (with default = True)
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,280: DEBUG/ForkPoolWorker-1] Setting up integrations (with default = True)
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration celery                                                                                 Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,285: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration celery
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration logging
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,286: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration logging                                               Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration stdlib
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,287: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration stdlib                                                Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration excepthook
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,288: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration excepthook                                            Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration dedupe
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,288: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration dedupe
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration atexit
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,289: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration atexit
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration modules
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,290: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration modules
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration argv
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,291: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration argv
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Setting up previously not enabled integration threading
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,292: DEBUG/ForkPoolWorker-1] Setting up previously not enabled integration threading
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration celery
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,292: DEBUG/ForkPoolWorker-1] Enabling integration celery
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration logging
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration stdlib
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration excepthook
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration dedupe
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration atexit
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration modules
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration argv
Feb 09 08:48:17 worker11 python3[13680]:  [sentry] DEBUG: Enabling integration threading
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,293: DEBUG/ForkPoolWorker-1] Enabling integration logging
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration stdlib
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration excepthook
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration dedupe
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration atexit
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration modules
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration argv
Feb 09 08:48:17 worker11 python3[13680]: [2022-02-09 08:48:17,294: DEBUG/ForkPoolWorker-1] Enabling integration threading

...

I'll let you trigger some cooking and reports here your finding.

I don't know how to trigger an error in the vault, currently; you'd have to change the code to manually do that :|

Also that's a cooker worker issue from sentry from 18h ago or so (as of the moment of this comment).

So i really don't think there is something wrong?

[1] https://sentry.softwareheritage.org/organizations/swh/issues/7860/?project=11&query=is%3Aunresolved

I don't know how to trigger an error in the vault, currently; you'd have to change the code to manually do that :|

I don't know how to trigger either.
I'm a bit dubious for the modification part.

Can't you trigger some cookings to your heart's content?
At some point, some error should happen?

ardumont claimed this task.

It seems everything is already ok (another cooking task issue reported [1]) in the end so closing this.

[1] https://sentry.softwareheritage.org/share/issue/5329519610794efbbf45d2ef91bb1458/

ardumont changed the task status from Invalid to Resolved.Feb 9 2022, 3:03 PM