I knew we would need it eventually -- @olasd
Details
- Reviewers
  olasd
  ardumont
- Group Reviewers
  Reviewers
- Commits
  rDSCH737d12e5b9e6: Introduce a new lister_get endpoint
new tests added
Diff Detail
- Repository
  rDSCH Scheduling utilities
- Lint
  No Linters Available
- Unit
  No Unit Test Coverage
- Build Status
  Buildable 18500
  Build 28613: Phabricator diff pipeline on jenkins
  Build 28612: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4887 (id=17359)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
.pre-commit-config.yaml | 1 +
docs/index.rst | 1 +
docs/simulator.rst | 65 +++++++++++
mypy.ini | 6 +
requirements-simulator.txt | 2 +
setup.py | 34 +++---
sql/updates/24.sql | 56 +++++++++
swh/scheduler/backend.py | 87 ++++++++++++++
swh/scheduler/cli/__init__.py | 2 +-
swh/scheduler/cli/simulator.py | 61 ++++++++++
swh/scheduler/interface.py | 40 +++++++
swh/scheduler/model.py | 32 ++++++
swh/scheduler/simulator/__init__.py | 144 +++++++++++++++++++++++
swh/scheduler/simulator/common.py | 102 +++++++++++++++++
swh/scheduler/simulator/origin_scheduler.py | 68 +++++++++++
swh/scheduler/simulator/origins.py | 128 +++++++++++++++++++++
swh/scheduler/simulator/task_scheduler.py | 76 +++++++++++++
swh/scheduler/sql/30-schema.sql | 24 +++-
swh/scheduler/sql/40-func.sql | 33 ++++++
swh/scheduler/tests/test_api_client.py | 3 +
swh/scheduler/tests/test_scheduler.py | 170 +++++++++++++++++++++++++++-
swh/scheduler/tests/test_simulator.py | 45 ++++++++
22 files changed, 1160 insertions(+), 20 deletions(-)
create mode 100644 docs/simulator.rst
create mode 100644 requirements-simulator.txt
create mode 100644 sql/updates/24.sql
create mode 100644 swh/scheduler/cli/simulator.py
create mode 100644 swh/scheduler/simulator/__init__.py
create mode 100644 swh/scheduler/simulator/common.py
create mode 100644 swh/scheduler/simulator/origin_scheduler.py
create mode 100644 swh/scheduler/simulator/origins.py
create mode 100644 swh/scheduler/simulator/task_scheduler.py
create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 88990d1a1ef402a0583856be243b80302998fb9e
Merge: 0a32a31 867898a
Author: Jenkins user <jenkins@localhost>
Date: Tue Jan 19 17:14:41 2021 +0000
    Merge branch 'diff-target' into HEAD

commit 867898ae3cf21714acfacb317bae0d9ca963ca59
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 17:48:31 2021 +0100
    Introduce a new lister_get endpoint

commit b0a3369157c83dbb5e3dab0e7ef9e2803edbdefe
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 14:23:32 2021 +0100
    Implement some basic aggregated metrics on listed origins
    Metrics are computed and cached database-side by the `update_metrics` function. The `get_metrics` function only retrieves the cached data.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:32:27 2021 +0100
    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:17:24 2021 +0100
    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100
    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100
    simulator: implement a simulator for the "old" task-based scheduler
    We extend the Task object with an autogenerated uuid allowing us to track the task lifetime between its creation and the generation of visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100
    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100
    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100
    simulator: wrap tasks and task events in typechecked objects
    This allows us to extend these objects without redefining a bunch of type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100
    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100
    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100
    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100
    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100
    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100
    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100
    simulator: add support for a basic SimulationReport
    For now, this collects the runtime of tasks that have run, and gets printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100
    simulator: refine origin model to follow an exponential distribution
    This models origins using a consistent characteristic "time between commits" that follows an exponential distribution between 1 second and 10 years. From this characteristic time, and feedback from the OriginVisitStats, we can generate the expected run time and output status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100
    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100
    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100
    simulator: generate OriginVisitStatus objects in modeled visits
    To be able to generate uneventful visits, we would need to store the last snapshot seen for a given origin. Instead of storing this within the simulator, which would be a concern for large scale simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100
    simulator: Move scheduler into the simulation environment object
    The scheduler is used by a lot of the simulated actors, it makes sense to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100
    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100
    Introduce scaffolding for a scheduler simulator
    This simulator will allow us to compare the behavior of the old and new schedulers, as well as to test the impact of scheduler policies and their parameters on the performance of the Software Heritage archival infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 17:56:44 2021 +0100
    Import the journal subcommand in the main swh.scheduler cli
    This issue was masked by tox.ini using pytest with --doctest-modules, which imports all modules during test collection, and therefore executing the side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/168/ for more details.
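The origin model commit in the log above describes a per-origin "time between commits" that stays consistent for a given origin and is spread exponentially between 1 second and 10 years. One plausible reading, sketched below in Python, is log-uniform sampling: the base-10 exponent is drawn uniformly, so every order of magnitude in that range is equally likely. The function name, the SHA-1 seeding and the 365.25-day year are assumptions for illustration, not the simulator's actual code.

```python
import hashlib
import math
import random

# Assumed bounds: 1 second to 10 years (using a 365.25-day year).
MIN_SECONDS = 1.0
MAX_SECONDS = 10 * 365.25 * 24 * 3600


def seconds_between_commits(origin_url: str) -> float:
    """Hypothetical helper: characteristic time between commits of an origin.

    The result is log-uniform in [MIN_SECONDS, MAX_SECONDS] and is derived
    deterministically from the URL, so the same origin always gets the same
    value without the simulator having to store it.
    """
    # The built-in hash() is salted per process, so seed from a stable digest.
    digest = hashlib.sha1(origin_url.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Draw the exponent uniformly, then map it back to seconds.
    exponent = rng.uniform(math.log10(MIN_SECONDS), math.log10(MAX_SECONDS))
    return 10 ** exponent
```

Seeding from the URL keeps the value stable across calls and processes. A truncated exponential distribution over the same interval would be a different, equally defensible reading of the commit message.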
Build is green
Patch application report for D4887 (id=17364)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
.pre-commit-config.yaml | 1 +
docs/index.rst | 1 +
docs/simulator.rst | 65 ++++++++++
mypy.ini | 6 +
requirements-simulator.txt | 2 +
setup.py | 34 +++---
sql/updates/24.sql | 64 ++++++++++
swh/scheduler/backend.py | 87 +++++++++++++
swh/scheduler/cli/__init__.py | 2 +-
swh/scheduler/cli/simulator.py | 68 +++++++++++
swh/scheduler/interface.py | 40 ++++++
swh/scheduler/model.py | 32 +++++
swh/scheduler/simulator/__init__.py | 147 ++++++++++++++++++++++
swh/scheduler/simulator/common.py | 102 ++++++++++++++++
swh/scheduler/simulator/origin_scheduler.py | 68 +++++++++++
swh/scheduler/simulator/origins.py | 128 ++++++++++++++++++++
swh/scheduler/simulator/task_scheduler.py | 76 ++++++++++++
swh/scheduler/sql/30-schema.sql | 24 +++-
swh/scheduler/sql/40-func.sql | 40 ++++++
swh/scheduler/tests/test_api_client.py | 3 +
swh/scheduler/tests/test_scheduler.py | 181 +++++++++++++++++++++++++++-
swh/scheduler/tests/test_simulator.py | 53 ++++++++
22 files changed, 1204 insertions(+), 20 deletions(-)
create mode 100644 docs/simulator.rst
create mode 100644 requirements-simulator.txt
create mode 100644 sql/updates/24.sql
create mode 100644 swh/scheduler/cli/simulator.py
create mode 100644 swh/scheduler/simulator/__init__.py
create mode 100644 swh/scheduler/simulator/common.py
create mode 100644 swh/scheduler/simulator/origin_scheduler.py
create mode 100644 swh/scheduler/simulator/origins.py
create mode 100644 swh/scheduler/simulator/task_scheduler.py
create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit dbeccbfb878a81125372d4d85f8fbedeb3145c1c
Merge: 0a32a31 191ec9d
Author: Jenkins user <jenkins@localhost>
Date: Tue Jan 19 17:47:13 2021 +0000
    Merge branch 'diff-target' into HEAD

commit 191ec9d9874c335b7ce10958766b450d054a74a4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 17:48:31 2021 +0100
    Introduce a new lister_get endpoint

commit 2bf1109c21ac77b7f03ad993f4cf0650390b929b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 14:23:32 2021 +0100
    Implement some basic aggregated metrics on listed origins
    Metrics are computed and cached database-side by the `update_metrics` function. The `get_metrics` function only retrieves the cached data.

commit e12a4f13386cdb25d366f5e2ee81044cb8e30169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 18:36:53 2021 +0100
    simulator: stop using get_scheduler directly
    This reuses the scheduler instantiated by the cli instead of hardcoding our own using the PG* variables.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:32:27 2021 +0100
    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Jan 19 16:17:24 2021 +0100
    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100
    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100
    simulator: implement a simulator for the "old" task-based scheduler
    We extend the Task object with an autogenerated uuid allowing us to track the task lifetime between its creation and the generation of visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100
    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100
    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100
    simulator: wrap tasks and task events in typechecked objects
    This allows us to extend these objects without redefining a bunch of type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100
    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100
    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100
    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100
    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100
    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100
    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100
    simulator: add support for a basic SimulationReport
    For now, this collects the runtime of tasks that have run, and gets printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100
    simulator: refine origin model to follow an exponential distribution
    This models origins using a consistent characteristic "time between commits" that follows an exponential distribution between 1 second and 10 years. From this characteristic time, and feedback from the OriginVisitStats, we can generate the expected run time and output status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100
    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100
    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100
    simulator: generate OriginVisitStatus objects in modeled visits
    To be able to generate uneventful visits, we would need to store the last snapshot seen for a given origin. Instead of storing this within the simulator, which would be a concern for large scale simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100
    simulator: Move scheduler into the simulation environment object
    The scheduler is used by a lot of the simulated actors, it makes sense to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100
    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100
    Introduce scaffolding for a scheduler simulator
    This simulator will allow us to compare the behavior of the old and new schedulers, as well as to test the impact of scheduler policies and their parameters on the performance of the Software Heritage archival infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 17:56:44 2021 +0100
    Import the journal subcommand in the main swh.scheduler cli
    This issue was masked by tox.ini using pytest with --doctest-modules, which imports all modules during test collection, and therefore executing the side-effects of swh.scheduler.cli.journal.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/173/ for more details.
Build is green
Patch application report for D4887 (id=17379)
Could not rebase; Attempt merge onto 98526539a8...
Updating 9852653..737d12e
Fast-forward
sql/updates/25.sql | 64 ++++++++++++
swh/scheduler/backend.py | 87 ++++++++++++++++
swh/scheduler/interface.py | 40 ++++++++
swh/scheduler/model.py | 32 ++++++
swh/scheduler/sql/30-schema.sql | 24 ++++-
swh/scheduler/sql/40-func.sql | 40 ++++++++
swh/scheduler/tests/test_api_client.py | 3 +
swh/scheduler/tests/test_scheduler.py | 181 ++++++++++++++++++++++++++++++++-
8 files changed, 468 insertions(+), 3 deletions(-)
create mode 100644 sql/updates/25.sql
Changes applied before test
commit 737d12e5b9e694b22bef291c625090fb3aee2afc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 17:48:31 2021 +0100
    Introduce a new lister_get endpoint

commit 114ed952e513c7ad3dbb038a640e80bf079d0780
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 19 14:23:32 2021 +0100
    Implement some basic aggregated metrics on listed origins
    Metrics are computed and cached database-side by the `update_metrics` function. The `get_metrics` function only retrieves the cached data.
    The metrics are aggregated for each lister instance and visit type (allowing complete reaggregation by visit type for cross-cutting statistics).
    The following metrics have been implemented:
    - number of known origins overall
    - number of enabled origins (origins seen in the last listing)
    - number of enabled origins that have never been successfully visited
    - number of enabled origins with known activity since our last successful visit
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/179/ for more details.
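The metrics commit in the last report separates computing from reading: `update_metrics` aggregates and caches, `get_metrics` only returns the cache. The Python sketch below illustrates that split in memory under simplifying assumptions. `ListedOrigin`, `Metrics`, `MetricsCache`, `by_visit_type` and every field name are hypothetical stand-ins rather than the swh.scheduler API, and the real implementation does this work in SQL on the database side. "Known activity since our last successful visit" is assumed to mean `last_update > last_successful_visit`, so enabled origins that were never visited are counted only as never visited.

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ListedOrigin:
    """Hypothetical, simplified listed-origin record."""
    lister_id: str
    visit_type: str
    url: str
    enabled: bool  # seen in the last listing
    last_update: Optional[datetime] = None  # last known activity upstream
    last_successful_visit: Optional[datetime] = None


@dataclass
class Metrics:
    origins_known: int = 0
    origins_enabled: int = 0
    origins_never_visited: int = 0
    origins_with_pending_changes: int = 0


Key = Tuple[str, str]  # (lister_id, visit_type)


class MetricsCache:
    def __init__(self) -> None:
        self._cache: Dict[Key, Metrics] = {}

    def update_metrics(self, origins: List[ListedOrigin]) -> None:
        """Recompute every (lister, visit type) row and replace the cache."""
        fresh: Dict[Key, Metrics] = defaultdict(Metrics)
        for o in origins:
            m = fresh[(o.lister_id, o.visit_type)]
            m.origins_known += 1
            if not o.enabled:
                continue
            m.origins_enabled += 1
            if o.last_successful_visit is None:
                m.origins_never_visited += 1
            elif o.last_update is not None and o.last_update > o.last_successful_visit:
                m.origins_with_pending_changes += 1
        self._cache = dict(fresh)

    def get_metrics(self) -> Dict[Key, Metrics]:
        """Cheap read: no aggregation happens here."""
        return dict(self._cache)


def by_visit_type(per_lister: Dict[Key, Metrics]) -> Dict[str, Metrics]:
    """Sum the per-lister rows into cross-cutting totals per visit type."""
    totals: Dict[str, Metrics] = defaultdict(Metrics)
    for (_, visit_type), m in per_lister.items():
        t = totals[visit_type]
        for f in fields(Metrics):
            setattr(t, f.name, getattr(t, f.name) + getattr(m, f.name))
    return dict(totals)
```

Keying the cached rows by (lister, visit type) is what makes the per-visit-type totals recoverable by plain summation; totals stored only per visit type could not be split back out per lister.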