Page MenuHomeSoftware Heritage

scheduler.backend: Make filter_task_to_archive a paginated endpoint
ClosedPublic

Authored by ardumont on Dec 14 2019, 11:25 AM.

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
into-paginated-endpoint
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9792
Build 14459: tox-on-jenkinsJenkins
Build 14458: arc lint + arc unit

Event Timeline

ardumont created this revision.Dec 14 2019, 11:25 AM
ardumont updated this revision to Diff 8673.Dec 14 2019, 2:29 PM
  • backend: Filter properly archive within the defined range

Now the behavior is enforced server side. It is already dealt with within the
client cli.

vlorentz requested changes to this revision.Dec 14 2019, 4:59 PM
vlorentz added a subscriber: vlorentz.

Please use the same pagination mechanism as the other services, for consistency. eg. https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/storage/__init__.py$791-813 and https://forge.softwareheritage.org/source/swh-search/browse/master/swh/search/elasticsearch.py$123-140

also, rename limit to count, also for consistency with them.

(I'll review the rest of the diff after you're done)

This revision now requires changes to proceed.Dec 14 2019, 4:59 PM

Please use the same pagination mechanism as the other services, for consistency. eg. https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/storage/__init__.py$791-813 and https://forge.softwareheritage.org/source/swh-search/browse/master/swh/search/elasticsearch.py$123-140

Thanks for the links

also, rename limit to count, also for consistency with them.

sure

note that the storage one stills uses limit

ardumont updated this revision to Diff 8675.Dec 14 2019, 6:28 PM
  • backend: Align paginated endpoint consistently with others

(I initially kept the original signature so it did not touch the client part)

This also:

  • checks consistently the tasks in paginated calls
  • improve the backend's docstring method
ardumont updated this revision to Diff 8676.Dec 14 2019, 6:34 PM

Fix docstring

swh/scheduler/sql/40-swh-func.sql
309

Prior to this, the actual filtering ended up listing tasks completely out of the input range.
Because we have tasks with started column left null (they never started).
Those have the scheduled entry set so we use it for filtering (which was already done client side in the task cli).

Now we are consistent.
And we will finally be able to filter those out of the scheduler db.

vlorentz requested changes to this revision.Dec 16 2019, 1:49 PM
vlorentz added inline comments.
swh/scheduler/backend.py
422

page_token should be an optional string (both for consistency, and be cause we want it to be an opaque token we can change at any time)

swh/scheduler/cli/task.py
538–539

The last_id as a loop variable should be renamed page_token, because it should be an opaque identifier.

As for its use as argument of index_data, I don't know what to do with it. Maybe add an extra parameter to filter_task_to_archive, that is ignored if page_token is not None? Or just remove it altogether if we don't actually need it?

tox.ini
10

why?

This revision now requires changes to proceed.Dec 16 2019, 1:49 PM
ardumont added inline comments.Dec 16 2019, 3:39 PM
swh/scheduler/backend.py
422

yes, i saw it this morning when reading back your other diffs.

I'm ultimately trying to add tests on the other diff and it takes me way too long.
i'll get back to that after i'm done with those tests.

swh/scheduler/cli/task.py
538–539

we do need it, i'm trying to avoid repeating the while truc loop.
Take a look below

gen = index_data...
if cleanup:
    # do something with gen
else:
    # do something else with gen
tox.ini
10

Because i'm using tox and ipdb to try and debug the hell of what's happening in there ;)

tox -e py3-dev -- -x -s --no-cov --log-level=DEBUG -k test_backend_setup
ardumont added inline comments.Dec 16 2019, 4:14 PM
swh/scheduler/api/client.py
102

revert back to limit now.

ardumont added inline comments.Dec 16 2019, 4:31 PM
swh/scheduler/cli/task.py
538–539

i meant while True ;)

ardumont updated this revision to Diff 8705.Dec 16 2019, 4:42 PM
  • backend: Align paginated endpoint consistently with others:
    • Revert to use limit (and not count)
    • use optional string type as opaque identifier
vlorentz accepted this revision.Dec 16 2019, 4:45 PM
vlorentz added inline comments.
swh/scheduler/cli/task.py
558

You're still not using it as an opaque identifier if it comes from a CLI option documented as "task id to start from". But I guess there is no easy way other than this one

This revision is now accepted and ready to land.Dec 16 2019, 4:45 PM
ardumont updated this revision to Diff 8707.Dec 16 2019, 4:54 PM

Plug to master branch