This will allow us to easily plug new scheduling policies in that
function.
Details
- Reviewers: olasd, ardumont
- Group Reviewers: Reviewers
- Commits: rDSCHb641ac83ebbf: Make the grab_next_visits sql query modular
Diff Detail
- Repository: rDSCH Scheduling utilities
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 18593
  - Build 28756: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
  - Build 28755: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4896 (id=17408)
Could not rebase; Attempt merge onto 7905a6bea4...
Updating 7905a6b..8bab1ba
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 +++++++++++
 mypy.ini | 6 +
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 swh/scheduler/backend.py | 48 +++++---
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 68 ++++++++++++
 swh/scheduler/simulator/__init__.py | 163 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 132 ++++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 ++++++++++++
 swh/scheduler/simulator/origins.py | 128 ++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 +++++++++++++
 swh/scheduler/tests/test_simulator.py | 53 +++++++++
 15 files changed, 812 insertions(+), 35 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 8bab1ba37aebbb9921e73ffbb17a9cb25a94c264
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100

    Make the grab_next_visits sql query modular

    This will allow us to easily plug new scheduling policies in that function.

commit 898820fac52cf6fcfb5d2770aad49f131370a5a6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 12:11:05 2021 +0100

    simulator: collect and plot scheduler metrics over time

    For now, only plot the known_origins and origins_never_visited metrics.

commit 9ce68f8d0e0ea69bd6672a50687079b5b1ea460c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly

    This reuses the scheduler instantiated by the cli instead of hardcoding our own using the PG* variables.

commit 88e0b42805011bc3886f77ce5c91b3450351a16f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 62c6d90867bccb17ae076e1b5ee4db6fd350ad1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 9468bb9384f14e5fa0548b7d985f66fb3e36c85a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit ead7b347db9d8852b4c347729d7e6d32b72d9058
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to track the task lifetime between its creation and the generation of visit statuses, as the task-based scheduler does.

commit aecd27eee06aaa46d350e9d5b3f86ccc36a5446c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit 05067e3ecc888271507505112b48ebc9f755f5e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 24922fe2d995ca3ffa6c3c5a19c1f5f5531db4c8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of type annotations.

commit d5318aea0a93a94c80f8d743ce1de63592161f5a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 22ebb7a9a4bc6639e6f52d71c2b727537baf5019
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit ad7bfbe731da64cc6d1ddaa3f5ae1ef1e3350f60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit df34db0bfc61df418f00338345b4b46a86340f62
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 21ce2c88dddce081bfd525d08454ca09bbf521c6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit 29204199774b40bea4d3d23ffe9407a5d090f8fa
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6433266106dda007d1e5304a0dcb01706c8acb42
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets printed at the end of the simulation.

commit c474a825336a4e4132e83982e180451b02d8f54d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between commits" that follows an exponential distribution between 1 second and 10 years. From this characteristic time, and feedback from the OriginVisitStats, we can generate the expected run time and output status of the next visit of that origin.

commit 2459badf0c05bf2cb663e66b9deabf1150638bb1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit cb12449e8f57e59ec4c7953a3c4a52c9193d202e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 20b7f9c68f831839f4be1cae4b9ae2dce0fc2d96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the last snapshot seen for a given origin. Instead of storing this within the simulator, which would be a concern for large scale simulations, we use the scheduler visit cache directly.

commit 39ad47de2e753033c4b7114a64b5c3144b6ea821
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense to share it all the time.

commit 31967fa850c3afe29fc37e41cfcd53ff5408e7b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit fc3f06bd1d77c76bfba4c05efcd62abcb5c46eea
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new schedulers, as well as to test the impact of scheduler policies and their parameters on the performance of the Software Heritage archival infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/195/ for more details.
One question about 'enabled' inlined there,
but otherwise, lgtm.
| swh/scheduler/backend.py | |
|---|---|
| 331 | What does 'enabled' mean here? I gather that it ends up in the query like "WHERE enabled AND visit_type=%s". |

| swh/scheduler/backend.py | |
|---|---|
| 331 | Whether this origin has been seen during the last listing, and visits should be scheduled. |

| swh/scheduler/backend.py | |
|---|---|
| 331 | I don't think the enabled field is ever updated currently. But it will be, eventually. Obviously this would deserve a comment rather than being snuck in. |
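
To make the discussion of line 331 concrete, here is a minimal sketch of how a modular query builder can keep the 'enabled' filter as an explicit, commented clause. It is not the code from this diff: the function name `build_next_visits_query`, the `ORDER_BY_POLICY` table, the policy name, the selected columns, and the table names are assumptions made for illustration. Only the `enabled` and `visit_type = %s` conditions come from the thread above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical policy table: maps a policy name to an ORDER BY fragment.
# Plugging in a new policy would mean adding one entry here.
ORDER_BY_POLICY: Dict[str, str] = {
    "oldest_scheduled_first": "origin_visit_stats.last_scheduled NULLS FIRST",
}


def build_next_visits_query(
    visit_type: str, count: int, policy: str
) -> Tuple[str, List[Any]]:
    """Assemble the query text and its parameters from independent pieces."""
    where_clauses: List[str] = []
    query_args: List[Any] = []

    # "enabled": the origin was seen during the last listing,
    # so visits should be scheduled for it.
    where_clauses.append("enabled")

    # Only consider origins of the requested visit type.
    where_clauses.append("visit_type = %s")
    query_args.append(visit_type)

    if policy not in ORDER_BY_POLICY:
        raise ValueError(f"Unknown scheduling policy {policy!r}")
    order_by = ORDER_BY_POLICY[policy]

    # Parenthesize each clause so that adding one cannot change precedence.
    where = " AND ".join(f"({clause})" for clause in where_clauses)
    query = (
        "SELECT url, visit_type FROM listed_origins "  # assumed table/columns
        "LEFT JOIN origin_visit_stats USING (url, visit_type) "
        f"WHERE {where} ORDER BY {order_by} LIMIT %s"
    )
    query_args.append(count)
    return query, query_args
```

With this shape, a policy only contributes clauses and an ordering, and values such as the visit type stay in the parameter list rather than being interpolated into the SQL text.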
Build is green
Patch application report for D4896 (id=17453)
Rebasing onto 9fb0dd6c7c...
First, rewinding head to replay your work on top of it...
Applying: Make the grab_next_visits sql query modular
Changes applied before test
commit f82680a448910a059878ea91e71715a6b9697be9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100

    Make the grab_next_visits sql query modular

    This will allow us to easily plug new scheduling policies in that function.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/209/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/210/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/210/console
Build is green
Patch application report for D4896 (id=17457)
Rebasing onto 9fb0dd6c7c...
Current branch diff-target is up to date.
Changes applied before test
commit b641ac83ebbf0b4d4166034467efa7c591793d50
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:17:17 2021 +0100

    Make the grab_next_visits sql query modular

    This will allow us to easily plug new scheduling policies in that function.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/213/ for more details.