Metrics are computed and cached database-side by the update_metrics
function. The get_metrics function only retrieves the cached data.
Details
- Reviewers: olasd, douardda
- Group Reviewers: Reviewers
- Commits: rDSCH114ed952e513: Implement some basic aggregated metrics on listed origins
basic tests added for each metric
Diff Detail
- Repository: rDSCH Scheduling utilities
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 18494
  - Build 28601: Phabricator diff pipeline on jenkins
  - Build 28600: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4880 (id=17337)
Could not rebase; Attempt merge onto 5e609d5205...
Updating 5e609d5..826094b
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 55 ++++++++++
 mypy.ini | 6 ++
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 sql/updates/24.sql | 56 ++++++++++
 swh/scheduler/backend.py | 62 +++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 57 ++++++++++
 swh/scheduler/interface.py | 31 ++++++
 swh/scheduler/model.py | 32 ++++++
 swh/scheduler/simulator/__init__.py | 126 ++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 ++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 69 +++++++++++++
 swh/scheduler/simulator/origins.py | 119 +++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 77 ++++++++++++++
 swh/scheduler/sql/30-schema.sql | 24 ++++-
 swh/scheduler/sql/40-func.sql | 33 ++++++
 swh/scheduler/tests/test_api_client.py | 2 +
 swh/scheduler/tests/test_scheduler.py | 155 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py | 45 ++++++++
 22 files changed, 1071 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/24.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 826094bf8a4be4b4295a6b024f6b85aafad871bb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this within
    the simulator, which would be a concern for large scale simulations, we
    use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and
    their parameters on the performance of the Software Heritage archival
    infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/156/ for more details.
Build is green
Patch application report for D4880 (id=17357)
Rebasing onto 0a32a31195...
First, rewinding head to replay your work on top of it...
Applying: Import the journal subcommand in the main swh.scheduler cli
Applying: Introduce scaffolding for a scheduler simulator
Applying: simulator: Use datetimes instead of a floating point simulated time
Applying: simulator: Move scheduler into the simulation environment object
Applying: simulator: generate OriginVisitStatus objects in modeled visits
Applying: simulator: simulate the scheduler journal client
Applying: simulator: Remove some debug statements and lower log level
Applying: simulator: refine origin model to follow an exponential distribution
Applying: simulator: add support for a basic SimulationReport
Applying: simulator: add typing for Environment.scheduler
Applying: simulator: generate more origins in fill_data
Applying: simulator: tweak simulation environment constants
Applying: simulator: Make the run time a CLI argument
Applying: simulator: Split into smaller files in the same package
Applying: simulator: also fill data for the task-based scheduler
Applying: simulator: wrap tasks and task events in typechecked objects
Applying: simulator: Replace attrs with dataclasses for consistency
Applying: Move the simulator cli to the main cli module
Applying: simulator: implement a simulator for the "old" task-based scheduler
Applying: simulator: add basic tests for fill_test_data and run
Applying: simulator: Make min_batch_size a parameter defined in the setup.
Applying: simulator: Add documentation.
Applying: Implement some basic aggregated metrics on listed origins
Changes applied before test
commit dc1591850962e6ffe995bcc9bc4f7001f244c4a2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit 20eac21568d723c0ae724f4bf440057de3a3ab65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 2d65ccfdd75a5f6e11e2d1fdee747c84703686a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 4c43a30903ff32fbfd8c51f2c6bd7701a7b548b9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit ca0a9f4ccee9bc3305b476b13fcdaaf00e763d6d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 643b4d56d1a6acdcad753747a0fa7275456753b2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit 7895d2d7deb930baea73a44f5102c260f4aba0ea
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 0007537c8738e392b16c4147e51e4108cc8249a2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 8d62b86da3e9ed728251fe8d1a7b032b6b64c726
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit c8d1fd7f79b770dbd0f2981b6c682a3049a898be
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit ef07de8d32cdb464545482390828fb0577514b82
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 49b8b086c25c8fd683d7089095cd2cfaa1b61cd5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 5cc155fdf6292e652cfa786406d7869e4a23e871
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit 060a000cc3fd918645c01531e1f82bf53410cf17
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit d0abbe887d202622d20cf9110fc863e0eea7aad6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5c21f7d7aea332232823f3e68547a7a1c40f6122
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit 5274886994683a0ae2971d587a7e5cf9ae8800b4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 66dfb010cc4eeea072f58adf75d0d7602ad52064
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c6aa5d0bc3d185694ca1ea20173b01530ad15118
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this within
    the simulator, which would be a concern for large scale simulations, we
    use the scheduler visit cache directly.

commit dc3ab75fb4da1e63ca98281b57abddbd40e3c3af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit ce78c550ea99d1ee6933c93188f8352faf772d53
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 3d3058d49d116fb753456572b457dd026cd278cb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and
    their parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit e32a3e63782d2aab5b894fa1c83122aeb199500d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection, and therefore
    executing the side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/166/ for more details.
Build is green
Patch application report for D4880 (id=17358)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 ++++++++++++
 mypy.ini | 6 ++
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 sql/updates/24.sql | 56 ++++++++++
 swh/scheduler/backend.py | 62 +++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 61 +++++++++++
 swh/scheduler/interface.py | 31 ++++++
 swh/scheduler/model.py | 32 ++++++
 swh/scheduler/simulator/__init__.py | 144 ++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 ++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 ++++++++++++
 swh/scheduler/simulator/origins.py | 128 +++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 ++++++++++++++
 swh/scheduler/sql/30-schema.sql | 24 ++++-
 swh/scheduler/sql/40-func.sql | 33 ++++++
 swh/scheduler/tests/test_api_client.py | 2 +
 swh/scheduler/tests/test_scheduler.py | 155 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py | 45 ++++++++
 22 files changed, 1110 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/24.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit c4a3a5444c7171b66f67212dd8a03581b39c1d57
Merge: 0a32a31 b0a3369
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:12:16 2021 +0000

    Merge branch 'diff-target' into HEAD

commit b0a3369157c83dbb5e3dab0e7ef9e2803edbdefe
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this within
    the simulator, which would be a concern for large scale simulations, we
    use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and
    their parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection, and therefore
    executing the side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/167/ for more details.
Build is green
Patch application report for D4880 (id=17360)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 +++++++++++
 mypy.ini | 6 +
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 sql/updates/24.sql | 64 +++++++++++
 swh/scheduler/backend.py | 62 +++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 61 ++++++++++
 swh/scheduler/interface.py | 31 ++++++
 swh/scheduler/model.py | 32 ++++++
 swh/scheduler/simulator/__init__.py | 144 ++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 +++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 ++++++++++++
 swh/scheduler/simulator/origins.py | 128 +++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 +++++++++++++
 swh/scheduler/sql/30-schema.sql | 24 +++-
 swh/scheduler/sql/40-func.sql | 40 +++++++
 swh/scheduler/tests/test_api_client.py | 2 +
 swh/scheduler/tests/test_scheduler.py | 166 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py | 45 ++++++++
 22 files changed, 1136 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/24.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit a3865fa4dad3ff3b8d4f0feff3757a0f46b0512f
Merge: 0a32a31 071671a
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:18:25 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 071671aae22e726bfce5a00ce2e1c6ddfe850d33
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this within
    the simulator, which would be a concern for large scale simulations, we
    use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and
    their parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection, and therefore
    executing the side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/169/ for more details.
Build is green
Patch application report for D4880 (id=17363)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 +++++++++++
 mypy.ini | 6 +
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 sql/updates/24.sql | 64 +++++++++++
 swh/scheduler/backend.py | 62 +++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 68 ++++++++++++
 swh/scheduler/interface.py | 31 ++++++
 swh/scheduler/model.py | 32 ++++++
 swh/scheduler/simulator/__init__.py | 147 ++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 +++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 ++++++++++++
 swh/scheduler/simulator/origins.py | 128 +++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 +++++++++++++
 swh/scheduler/sql/30-schema.sql | 24 +++-
 swh/scheduler/sql/40-func.sql | 40 +++++++
 swh/scheduler/tests/test_api_client.py | 2 +
 swh/scheduler/tests/test_scheduler.py | 166 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py | 53 +++++++++
 22 files changed, 1154 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/24.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 459a5b2a3bd9974e3b1f9d0eb8d79a5cc076c798
Merge: 0a32a31 2bf1109
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:44:49 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 2bf1109c21ac77b7f03ad993f4cf0650390b929b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit e12a4f13386cdb25d366f5e2ee81044cb8e30169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly

    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to track
    the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of type
    annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and 10
    years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output status
    of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the last
    snapshot seen for a given origin. Instead of storing this within the
    simulator, which would be a concern for large scale simulations, we use
    the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense to
    share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with --doctest-modules, which
    imports all modules during test collection, and therefore executing the
    side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/172/ for more details.
Looks OK to me. However, I'd like a description of the implemented metrics in the commit message (and in the documentation, but that can come later).
sql/updates/24.sql

| Line | Comment |
|---|---|
| 16 | I would not call this a cache, but meh |
| 22 | "never been successfully visited" |

swh/scheduler/sql/30-schema.sql

| Line | Comment |
|---|---|
| 200 | see comment on 24.sql |
| 206 | see comment on 24.sql |
sql/updates/24.sql

| Line | Comment |
|---|---|
| 16 | It's a snapshot of said metrics, which we could compute on the fly but don't because that takes a long time. I'm not sure what else to call it. |
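To make the "snapshot" wording concrete, here is a minimal in-memory sketch of the split between the two functions. Only the names `update_metrics` and `get_metrics` come from the diff; the `SnapshotStore` and `MetricsSnapshot` classes, the `compute` callback and the `compute_calls` counter are hypothetical and exist only for illustration. The actual change stores the snapshot database-side, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class MetricsSnapshot:
    """Hypothetical container: metric values plus the time they were computed."""

    values: Dict[str, int]
    last_update: datetime


class SnapshotStore:
    """Illustrative stand-in for the database-side storage (an assumption)."""

    def __init__(self, compute: Callable[[], Dict[str, int]]) -> None:
        self._compute = compute  # the expensive aggregation
        self._snapshot: Optional[MetricsSnapshot] = None
        self.compute_calls = 0  # only here to show when the work happens

    def update_metrics(self) -> MetricsSnapshot:
        # Run the expensive computation once and keep the result.
        self.compute_calls += 1
        self._snapshot = MetricsSnapshot(
            values=dict(self._compute()),
            last_update=datetime.now(timezone.utc),
        )
        return self._snapshot

    def get_metrics(self) -> Optional[MetricsSnapshot]:
        # Never recomputes: returns whatever the last update stored.
        return self._snapshot
```

Under these assumptions, repeated `get_metrics` calls are cheap and may return stale values until the next `update_metrics` run, which is the trade-off being discussed in this thread.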
Build is green
Patch application report for D4880 (id=17378)
Rebasing onto 98526539a8...
Current branch diff-target is up to date.
Changes applied before test
commit 114ed952e513c7ad3dbb038a640e80bf079d0780
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

    The metrics are aggregated for each lister instance and visit type
    (allowing complete reaggregation by visit type for cross-cutting
    statistics).

    The following metrics have been implemented:
     - number of known origins overall
     - number of enabled origins (origins seen in the last listing)
     - number of enabled origins that have never been successfully visited
     - number of enabled origins with known activity since our last successful
       visit
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/178/ for more details.
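The four metrics listed in the final commit message can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SQL shipped in the diff: the `Origin` record, its field names (`lister`, `visit_type`, `enabled`, `last_update`, `last_successful`) and the metric keys are hypothetical. In particular, the sketch counts an origin as having "known activity since our last successful visit" only when both timestamps are known and `last_update` is later; the real definition may differ.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Origin:
    """Hypothetical listed-origin record; field names are assumptions."""

    lister: str
    visit_type: str
    enabled: bool  # seen in the last listing
    last_update: Optional[datetime]  # last known upstream activity
    last_successful: Optional[datetime]  # last successful visit


Key = Tuple[str, str]  # (lister instance, visit type)


def aggregate(origins: Iterable[Origin]) -> Dict[Key, Counter]:
    """Compute the four counters per (lister, visit type) pair."""
    metrics: Dict[Key, Counter] = defaultdict(Counter)
    for origin in origins:
        row = metrics[(origin.lister, origin.visit_type)]
        row["known"] += 1
        if not origin.enabled:
            continue
        row["enabled"] += 1
        if origin.last_successful is None:
            row["never_visited"] += 1
        elif (
            origin.last_update is not None
            and origin.last_update > origin.last_successful
        ):
            row["pending_changes"] += 1
    return dict(metrics)


def by_visit_type(metrics: Dict[Key, Counter]) -> Dict[str, Counter]:
    """Re-aggregate the per-lister rows into one row per visit type."""
    totals: Dict[str, Counter] = defaultdict(Counter)
    for (_lister, visit_type), row in metrics.items():
        totals[visit_type].update(row)  # Counter.update adds counts
    return dict(totals)
```

Because every counter is a plain sum, summing the per-lister rows gives the same result as aggregating directly by visit type, which is what makes the cross-cutting reaggregation mentioned in the commit message possible.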