Page MenuHomeSoftware Heritage

backend: Open endpoints to peek/grab tasks with any priority
ClosedPublic

Authored by ardumont on Apr 13 2021, 6:03 PM.

Details

Summary

The priority notion becomes a blur. Any tasks with a non null priority is considered for
reading or grabbing providing the task is ready to be scheduled.

In a future commit, this should allow to make the runner evolve to reroute tasks with
priority to other queues.

Related to T3084

Test Plan

tox

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5503 (id=19655)

Rebasing onto ecab745a5f...

Current branch diff-target is up to date.
Changes applied before test
commit c242a1db96a67a3efdc77a53f15158261039039e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Apr 13 17:16:37 2021 +0200

    backend: Open endpoints to peek/grab tasks with any priority
    
    The priority notion becomes a blur. Any tasks with a non null priority is considered for
    reading or grabbing.
    
    In a future commit, this should allow to make the runner evolve to reroute tasks with
    priority to other queues.
    
    Related to T3084

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/305/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/305/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 13 2021, 6:05 PM
Harbormaster failed remote builds in B20682: Diff 19655!

Build is green

Patch application report for D5503 (id=19656)

Rebasing onto ecab745a5f...

Current branch diff-target is up to date.
Changes applied before test
commit 3e2ae3d46d61122d63ade5a908b1a766cc2731f7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Apr 13 17:16:37 2021 +0200

    backend: Open endpoints to peek/grab tasks with any priority
    
    The priority notion becomes a blur. Any tasks with a non null priority is considered for
    reading or grabbing.
    
    In a future commit, this should allow to make the runner evolve to reroute tasks with
    priority to other queues.
    
    Related to T3084

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/306/ for more details.

Could you deduplicate swh_scheduler_peek_any_ready_priority_tasks and swh_scheduler_peek_tasks_with_priority?

In the existing function, if you pass a null as priority value, and replace the filtering predicate with and (task_priority is null or t.priority = task_priority) (in place of and t.priority = task.priority) the functionality should be equivalent.

Apart from that, this is a step towards what we've discussed, so this looks fine.

This revision is now accepted and ready to land.Apr 13 2021, 6:52 PM

To clarify my initial trail of thought of why the existing code is untouched. Those
priority functions should be cleaned up soon, I avoided touching the old ones so as to
simplify the ulterior cleaning up.

Now for your suggestion.

In the existing function, if you pass a null as priority value, and replace the
filtering predicate with and (task_priority is null or t.priority = task_priority) (in
place of and t.priority = task.priority) the functionality should be equivalent.

IIUYR, I don't think it's equivalent. I want to grab only task whose priority column is
filled with non null value. I understood your suggestion as the opposite for the null
case... (see details below)

Details

I understood that replacing in the right function:

where ... and task.priority is not null

by:

where ... and (task_priority is null or t.priority = task_priority)

for the case as task_priority null

(so: where ... and (null is null or t.priority = null) in the null case)

should yield the same result so rows with any priority with non null values.

That's not the case, tried it out in our scheduler db with [1]

That returned rows with priority null. The rows wanted are the complementary ones
though, all rows with any 'non-null' priority.

[1]

softwareheritage-scheduler=> select priority, id, type, next_run, status
  from task t
  where t.next_run <= now() - interval '10 days'
        and t.type = 'load-git'
        and t.status = 'next_run_not_scheduled'
        and (null is null or t.priority = null)
  order by t.next_run
  limit 10;
 priority |    id     |   type   |           next_run            |         status
----------+-----------+----------+-------------------------------+------------------------
          | 320406498 | load-git | 2020-03-22 23:20:07.948464+00 | next_run_not_scheduled
          | 320406499 | load-git | 2020-03-22 23:20:07.948488+00 | next_run_not_scheduled
...

(I inlined values for the query to be executable)

I don't see an immediate implementation to keep existing code and just do that
(parameter task_priority is an enum so any non enum values like 'any' would be
rejected). I don't want to touch that signature too much either.

So i'm in a mind to keep the current implementation if you don't mind.

Sure, you can keep this implementation, that's why the diff was accepted in the first place.