This will allow us to easily plug new scheduling policies in that
function.
Details
- Reviewers
  olasd
  ardumont
- Group Reviewers
  Reviewers
- Commits
  rDSCHb641ac83ebbf: Make the grab_next_visits sql query modular
Diff Detail
- Repository
  rDSCH Scheduling utilities
- Lint
  No Linters Available
- Unit
  No Unit Test Coverage
- Build Status
  Buildable 18593
  Build 28756: Phabricator diff pipeline on jenkins
  Build 28755: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4896 (id=17408)
Could not rebase; Attempt merge onto 7905a6bea4...
Updating 7905a6b..8bab1ba
Fast-forward
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  65 +++++++++++
 mypy.ini                                    |   6 +
 requirements-simulator.txt                  |   2 +
 setup.py                                    |  34 +++---
 swh/scheduler/backend.py                    |  48 +++++---
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/simulator.py              |  68 ++++++++++++
 swh/scheduler/simulator/__init__.py         | 163 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 132 ++++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  68 ++++++++++++
 swh/scheduler/simulator/origins.py          | 128 ++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  76 +++++++++++++
 swh/scheduler/tests/test_simulator.py       |  53 +++++++++
 15 files changed, 812 insertions(+), 35 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 8bab1ba37aebbb9921e73ffbb17a9cb25a94c264
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100
Make the grab_next_visits sql query modular
This will allow us to easily plug new scheduling policies in that
function.
commit 898820fac52cf6fcfb5d2770aad49f131370a5a6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 12:11:05 2021 +0100
simulator: collect and plot scheduler metrics over time
For now, only plot the known_origins and origins_never_visited metrics.
commit 9ce68f8d0e0ea69bd6672a50687079b5b1ea460c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 18:36:53 2021 +0100
simulator: stop using get_scheduler directly
This reuses the scheduler instantiated by the cli instead of hardcoding
our own using the PG* variables.
commit 88e0b42805011bc3886f77ce5c91b3450351a16f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:32:27 2021 +0100
simulator: Add documentation.
commit 62c6d90867bccb17ae076e1b5ee4db6fd350ad1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:17:24 2021 +0100
simulator: Make min_batch_size a parameter defined in the setup.
commit 9468bb9384f14e5fa0548b7d985f66fb3e36c85a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100
simulator: add basic tests for fill_test_data and run
commit ead7b347db9d8852b4c347729d7e6d32b72d9058
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100
simulator: implement a simulator for the "old" task-based scheduler
We extend the Task object with an autogenerated uuid allowing us to
track the task lifetime between its creation and the generation of visit
statuses, as the task-based scheduler does.
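As an illustration of the idea in this commit message, here is a minimal sketch of a task object carrying an autogenerated uuid, used to measure the time between task creation and the generation of its visit status. The field name `backend_id` and the `LifetimeTracker` helper are hypothetical names chosen for this example; they are not taken from the diff.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class Task:
    """Simplified stand-in for a simulated task (hypothetical fields)."""

    visit_type: str
    origin: str
    # Each instance gets its own random uuid, generated at creation time.
    backend_id: uuid.UUID = field(default_factory=uuid.uuid4)


class LifetimeTracker:
    """Hypothetical helper: remembers when each task was created."""

    def __init__(self) -> None:
        self._created: Dict[uuid.UUID, datetime] = {}

    def task_created(self, task: Task, now: datetime) -> None:
        self._created[task.backend_id] = now

    def status_generated(self, task: Task, now: datetime) -> timedelta:
        # The uuid links the visit status back to the task that caused it.
        return now - self._created.pop(task.backend_id)
```

Because the identifier is generated by a default factory, two tasks for the same origin remain distinguishable without any extra bookkeeping.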
commit aecd27eee06aaa46d350e9d5b3f86ccc36a5446c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100
Move the simulator cli to the main cli module
commit 05067e3ecc888271507505112b48ebc9f755f5e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100
simulator: Replace attrs with dataclasses for consistency
commit 24922fe2d995ca3ffa6c3c5a19c1f5f5531db4c8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100
simulator: wrap tasks and task events in typechecked objects
This allows us to extend these objects without redefining a bunch of
type annotations.
commit d5318aea0a93a94c80f8d743ce1de63592161f5a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100
simulator: also fill data for the task-based scheduler
commit 22ebb7a9a4bc6639e6f52d71c2b727537baf5019
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100
simulator: Split into smaller files in the same package
commit ad7bfbe731da64cc6d1ddaa3f5ae1ef1e3350f60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100
simulator: Make the run time a CLI argument
commit df34db0bfc61df418f00338345b4b46a86340f62
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100
simulator: tweak simulation environment constants
commit 21ce2c88dddce081bfd525d08454ca09bbf521c6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100
simulator: generate more origins in fill_data
commit 29204199774b40bea4d3d23ffe9407a5d090f8fa
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100
simulator: add typing for Environment.scheduler
commit 6433266106dda007d1e5304a0dcb01706c8acb42
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100
simulator: add support for a basic SimulationReport
For now, this collects the runtime of tasks that have run, and gets
printed at the end of the simulation.
commit c474a825336a4e4132e83982e180451b02d8f54d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100
simulator: refine origin model to follow an exponential distribution
This models origins using a consistent characteristic "time between
commits" that follows an exponential distribution between 1 second and
10 years.
From this characteristic time, and feedback from the OriginVisitStats,
we can generate the expected run time and output status of the next
visit of that origin.
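One possible reading of this model, sketched below under stated assumptions: each origin is mapped deterministically (here by hashing its URL) to a bucket, and the characteristic time grows exponentially with that bucket, from 1 second to 10 years. Strictly speaking this gives a log-uniform spread of values. The function names, the use of MD5, and the bucket count are assumptions made for this example, not details confirmed by the commit.

```python
import hashlib

# Upper bound of the characteristic time: 10 years, in seconds.
TEN_YEARS = 10 * 365 * 24 * 3600


def characteristic_time(bucket: int, num_buckets: int) -> float:
    """Map a bucket in [0, num_buckets - 1] to a time in [1 s, 10 years].

    The time grows exponentially with the bucket index.
    """
    return TEN_YEARS ** (bucket / (num_buckets - 1))


def seconds_between_commits(origin_url: str, n_bytes: int = 2) -> float:
    """Derive a stable "time between commits" from the origin URL (assumed scheme)."""
    num_buckets = 2 ** (8 * n_bytes)
    digest = hashlib.md5(origin_url.encode()).digest()
    # The first n_bytes of the digest select the bucket, so the same
    # origin always gets the same characteristic time.
    bucket = int.from_bytes(digest[:n_bytes], "little")
    return characteristic_time(bucket, num_buckets)
```

Deriving the value from the URL rather than from a random draw keeps the characteristic time consistent across visits of the same origin, which matches the "consistent characteristic" wording above.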
commit 2459badf0c05bf2cb663e66b9deabf1150638bb1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100
simulator: Remove some debug statements and lower log level
commit cb12449e8f57e59ec4c7953a3c4a52c9193d202e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100
simulator: simulate the scheduler journal client
commit 20b7f9c68f831839f4be1cae4b9ae2dce0fc2d96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100
simulator: generate OriginVisitStatus objects in modeled visits
To be able to generate uneventful visits, we would need to store
the last snapshot seen for a given origin. Instead of storing this
within the simulator, which would be a concern for large scale
simulations, we use the scheduler visit cache directly.
commit 39ad47de2e753033c4b7114a64b5c3144b6ea821
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100
simulator: Move scheduler into the simulation environment object
The scheduler is used by a lot of the simulated actors, it makes sense
to share it all the time.
commit 31967fa850c3afe29fc37e41cfcd53ff5408e7b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100
simulator: Use datetimes instead of a floating point simulated time
commit fc3f06bd1d77c76bfba4c05efcd62abcb5c46eea
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100
Introduce scaffolding for a scheduler simulator
This simulator will allow us to compare the behavior of the old and new
schedulers, as well as to test the impact of scheduler policies and their
parameters on the performance of the Software Heritage archival
infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/195/ for more details.
1 question about 'enabled' inlined there
but otherwise, lgtm.
| swh/scheduler/backend.py | |
|---|---|
| 331 | What does 'enabled' mean here? I gather that it ends up in the query like "WHERE enabled AND visit_type=%s". |

| swh/scheduler/backend.py | |
|---|---|
| 331 | Whether this origin has been seen during the last listing, and visits should be scheduled. |

| swh/scheduler/backend.py | |
|---|---|
| 331 | I don't think the enabled field is ever updated currently. But it will be, eventually. Obviously this would deserve a comment rather than being snuck in. |
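To make the discussion concrete, here is a minimal sketch of how a query like the one quoted above could be assembled from separate pieces, so that a new scheduling policy only has to contribute its own ordering. This is not the code from the diff: the function name, the `ORDER_BY_POLICIES` table, the policy name, and the table and column names are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical policy table: policy name -> ORDER BY expression.
ORDER_BY_POLICIES: Dict[str, str] = {
    "oldest_scheduled_first": "last_scheduled NULLS FIRST",
}


def build_next_visits_query(
    visit_type: str, count: int, policy: str
) -> Tuple[str, List[Any]]:
    """Return a parameterized SQL query and its arguments."""
    if policy not in ORDER_BY_POLICIES:
        raise ValueError(f"Unknown scheduling policy: {policy}")

    # "enabled" is the flag discussed above: the origin was seen during
    # the last listing, so visits should be scheduled for it.
    where_clauses = ["enabled", "visit_type = %s"]
    query_args: List[Any] = [visit_type]

    order_by = ORDER_BY_POLICIES[policy]

    query = (
        "SELECT url, visit_type FROM listed_origins "
        f"WHERE {' AND '.join(where_clauses)} "
        f"ORDER BY {order_by} LIMIT %s"
    )
    query_args.append(count)
    return query, query_args
```

Only the clause fragments are interpolated into the SQL text; user-supplied values such as the visit type stay in the argument list and are passed to the database driver as parameters.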
Build is green
Patch application report for D4896 (id=17453)
Rebasing onto 9fb0dd6c7c...
First, rewinding head to replay your work on top of it...
Applying: Make the grab_next_visits sql query modular
Changes applied before test
commit f82680a448910a059878ea91e71715a6b9697be9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100
Make the grab_next_visits sql query modular
This will allow us to easily plug new scheduling policies in that
function.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/209/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/210/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/210/console
Build is green
Patch application report for D4896 (id=17457)
Rebasing onto 9fb0dd6c7c...
Current branch diff-target is up to date.
Changes applied before test
commit b641ac83ebbf0b4d4166034467efa7c591793d50
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100
Make the grab_next_visits sql query modular
This will allow us to easily plug new scheduling policies in that
function.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/213/ for more details.