Event Timeline
Trying to reproduce:
softwareheritage-scheduler=> begin;
softwareheritage-scheduler=> select * from swh_scheduler_end_task_run('5924ead8-55ec-4212-ad70-4237a29cd7c8', 'failed', '{}'::jsonb, '2018-03-08T10:04:01.193193+00:00');
ERROR: null value in column "next_run" violates not-null constraint
DETAIL: Failing row contains (8599, origin-update-git, {"args": ["https://github.com/torvalds/linux"], "kwargs": {}}, null, 12:00:00, next_run_not_scheduled, recurring, 4).
CONTEXT: SQL statement "update task set status = 'next_run_not_scheduled', next_run = now() + cur_task_type.retry_delay, retries_left = cur_task.retries_left - 1 where id = cur_task.id"
PL/pgSQL function swh_scheduler_update_task_on_task_end() line 42 at SQL statement
SQL function "swh_scheduler_end_task_run" statement 1
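The failing statement computes next_run as now() + cur_task_type.retry_delay. Since now() is never NULL, the NULL next_run most likely comes from a NULL retry_delay on that task type. This is an inference from the error context, not something checked against the task_type table. A minimal sketch that shows the NULL propagation in any psql session, without touching the scheduler tables:

```sql
-- Adding a NULL interval to a timestamp yields NULL in PostgreSQL,
-- which is what would violate the not-null constraint on next_run.
select now() + null::interval as next_run;

-- Assumption for illustration only: a coalesce fallback keeps the result
-- non-NULL. The '1 hour' value is hypothetical, not the scheduler's default.
select now() + coalesce(null::interval, interval '1 hour') as next_run;
```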
IRC discussion:
11:45:30 +olasd | ardumont: I'm not sure this fix is correct, recurring tasks shouldn't be retried like that
11:49:12 +olasd | (they'll be retried eventually, there's no obvious need to make them jump the queue)
12:03:33 +ardumont | i was mainly trying to unstuck the listener, now it is and yes, if a better solution exists, i'm all for ti
12:03:34 +ardumont | it
12:04:12 +ardumont | does that mean that we need, in that function to discriminate with the policy as well
12:04:12 +ardumont | ?
12:06:40 +ardumont | (also, btw, that must not have happened often, since it's the first time we have this; the way i read the db, only the debian loader would have been ok, all other task types would have broken the same way)
12:07:20 +olasd | ardumont: I don't know why that task had a number of retries, it shouldn't have
And indeed:
softwareheritage-scheduler=> select type, count(*) from task where policy='recurring' and retries_left != 0 group by type;
       type        | count
-------------------+-------
 origin-update-git |     1
(1 row)
That's the only one!
12:13:40 +ardumont | olasd: so the real fix is, revert the default retry_delay in the trigger updated stored proc, and update that specific task with retries_left to 0
12:13:59 +olasd | I don't know off hand
So, a posteriori, I was on the right track.
Since my initial fix seems to hold up anyway, I am keeping it as is for now.
I am referencing this discussion here for later.
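For later reference, the data half of the alternative raised on IRC (setting retries_left to 0 on the one offending recurring task) could be sketched as below. This has not been applied, and olasd did not confirm it is the right fix. It reuses the same filter as the count query above and only assumes the task table columns that appear in this thread.

```sql
begin;

-- Reset the leftover retry counter on recurring tasks.
-- Per the count query above, this should match a single origin-update-git row.
update task
   set retries_left = 0
 where policy = 'recurring'
   and retries_left != 0;

-- Re-run the check; it should return no rows before committing.
select type, count(*)
  from task
 where policy = 'recurring'
   and retries_left != 0
 group by type;

commit;
```

Running it inside a transaction leaves room to roll back if the update touches more rows than expected.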