This simulator will allow us to compare the behavior of the old and new
schedulers, as well as to test the impact of scheduler policies and their
parameters on the performance of the Software Heritage archival
infrastructure as a whole.
Details
- Reviewers: olasd, douardda
- Group Reviewers: Reviewers
- Maniphest Tasks
  - T2973: Implement a scheduler simulator
- Commits
  - rDSCH9468bb9384f1: simulator: add basic tests for fill_test_data and run
  - rDSCH88e0b4280501: simulator: Add documentation.
  - rDSCH898820fac52c: simulator: collect and plot scheduler metrics over time
  - rDSCH9ce68f8d0e0e: simulator: stop using get_scheduler directly
  - rDSCH62c6d90867bc: simulator: Make min_batch_size a parameter defined in the setup.
  - rDSCHead7b347db9d: simulator: implement a simulator for the "old" task-based scheduler
  - rDSCHaecd27eee06a: Move the simulator cli to the main cli module
  - rDSCH05067e3ecc88: simulator: Replace attrs with dataclasses for consistency
  - rDSCH24922fe2d995: simulator: wrap tasks and task events in typechecked objects
  - rDSCH22ebb7a9a4bc: simulator: Split into smaller files in the same package
  - rDSCHd5318aea0a93: simulator: also fill data for the task-based scheduler
  - rDSCH29204199774b: simulator: add typing for Environment.scheduler
  - rDSCHad7bfbe731da: simulator: Make the run time a CLI argument
  - rDSCHdf34db0bfc61: simulator: tweak simulation environment constants
  - rDSCH21ce2c88dddc: simulator: generate more origins in fill_data
  - rDSCH6433266106dd: simulator: add support for a basic SimulationReport
  - rDSCHc474a825336a: simulator: refine origin model to follow an exponential distribution
  - rDSCHcb12449e8f57: simulator: simulate the scheduler journal client
  - rDSCH20b7f9c68f83: simulator: generate OriginVisitStatus objects in modeled visits
  - rDSCH2459badf0c05: simulator: Remove some debug statements and lower log level
  - rDSCH39ad47de2e75: simulator: Move scheduler into the simulation environment object
  - rDSCH31967fa850c3: simulator: Use datetimes instead of a floating point simulated time
  - rDSCHfc3f06bd1d77: Introduce scaffolding for a scheduler simulator
use the docs, Luke
Diff Detail
- Repository: rDSCH Scheduling utilities
- Branch: scheduling-policy
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 18445
  - Build 28517: Phabricator diff pipeline on jenkins
  - Build 28516: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4856 (id=17208)
Could not rebase; Attempt merge onto a62003397d...
Updating a620033..dfa0aee
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 55 +++++++++++++
 mypy.ini | 3 +
 requirements-simulator.txt | 1 +
 setup.py | 5 +-
 sql/updates/20.sql | 6 ++
 swh/scheduler/backend.py | 9 ++-
 swh/scheduler/cli/__init__.py | 7 +-
 swh/scheduler/cli/origin.py | 141 ++++++++++++++++++++++++++++++++
 swh/scheduler/interface.py | 8 +-
 swh/scheduler/model.py | 9 +++
 swh/scheduler/simulator/__init__.py | 144 +++++++++++++++++++++++++++++++++
 swh/scheduler/simulator/__main__.py | 31 +++++++
 swh/scheduler/sql/30-schema.sql | 2 +-
 swh/scheduler/sql/60-indexes.sql | 2 +-
 swh/scheduler/tests/common.py | 10 +--
 swh/scheduler/tests/conftest.py | 45 ++++++++---
 swh/scheduler/tests/test_cli_origin.py | 112 +++++++++++++++++++++++++
 swh/scheduler/tests/test_model.py | 31 ++++++-
 swh/scheduler/tests/test_scheduler.py | 27 ++++---
 21 files changed, 611 insertions(+), 39 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/20.sql
 create mode 100644 swh/scheduler/cli/origin.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/__main__.py
 create mode 100644 swh/scheduler/tests/test_cli_origin.py
Changes applied before test
commit dfa0aee33500715f47b2e228c5462153d101a5b5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 9f843eef37313b551a158dfa11aea97e5ef2fc81
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 15:31:55 2021 +0100

    Filter origins by visit type when scheduling the next visits

    We have separate task queues and workers for each visit type, so it makes
    sense to split this endpoint along these lines too, at least for now.

commit 23d1b3c1883c3c955b5dd5ba1cc2270c93e156d6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 15:25:56 2021 +0100

    Reorganize ListedOrigin fixtures to generate multiple visit_types

commit da347f7f4c401a43ec34de76365ad323d0ff7b77
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 17:10:39 2021 +0100

    Introduce a `swh scheduler origin schedule-next` cli

    This creates one-shot tasks in the classic scheduler for the next visits
    to run according to the visit scheduling policy.

commit 42957c9e96e6c7d8070e0b6c786c273e8c1602a0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 17:28:33 2021 +0100

    Rename test task types to names that match real tasks

    The success of tests using these task types would depend on the test run
    order, because these task types are (currently) being created by
    swh/scheduler/sql/50-data.sql, but the table is truncated after the first
    test completes.

commit d1393c54da99c45175dd0b6a69734d17fc887960
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 16:16:31 2021 +0100

    Introduce a `swh scheduler origin grab-next` cli

    This returns, as CSV, the next origins to be visited according to the
    passed scheduling policy.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/115/ for more details.
Build is green
Patch application report for D4856 (id=17217)
Could not rebase; Attempt merge onto a62003397d...
Updating a620033..0cde030
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 55 +++++++++++++
 mypy.ini | 3 +
 requirements-simulator.txt | 1 +
 setup.py | 5 +-
 sql/updates/20.sql | 6 ++
 swh/scheduler/backend.py | 9 ++-
 swh/scheduler/cli/__init__.py | 7 +-
 swh/scheduler/cli/origin.py | 142 ++++++++++++++++++++++++++++++++
 swh/scheduler/interface.py | 8 +-
 swh/scheduler/model.py | 9 +++
 swh/scheduler/simulator/__init__.py | 144 +++++++++++++++++++++++++++++++++
 swh/scheduler/simulator/__main__.py | 31 +++++++
 swh/scheduler/sql/30-schema.sql | 2 +-
 swh/scheduler/sql/60-indexes.sql | 2 +-
 swh/scheduler/tests/common.py | 10 +--
 swh/scheduler/tests/conftest.py | 45 ++++++++---
 swh/scheduler/tests/test_cli_origin.py | 112 +++++++++++++++++++++++++
 swh/scheduler/tests/test_model.py | 31 ++++++-
 swh/scheduler/tests/test_scheduler.py | 27 ++++---
 21 files changed, 612 insertions(+), 39 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/20.sql
 create mode 100644 swh/scheduler/cli/origin.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/__main__.py
 create mode 100644 swh/scheduler/tests/test_cli_origin.py
Changes applied before test
commit 0cde0300fbbd0832a8dcca52ea1e04597e75f423
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit ca45d40f2a62d4a0f200cabe760ad3a0cda00f89
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 15:31:55 2021 +0100

    Filter origins by visit type when scheduling the next visits

    We have separate task queues and workers for each visit type, so it makes
    sense to split this endpoint along these lines too, at least for now.

commit 59b4cb3f1c7a081e0d28b11d15888d38a9de151e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 15:25:56 2021 +0100

    Reorganize ListedOrigin fixtures to generate multiple visit_types

commit 4f5338f2aba360fed2e524cbcdd23b11bacfb79d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 17:10:39 2021 +0100

    Introduce a `swh scheduler origin schedule-next` cli

    This creates one-shot tasks in the classic scheduler for the next visits
    to run according to the visit scheduling policy.

commit 3dd1d5f28d329620a65ee00749d24401b6d8cf00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 17:28:33 2021 +0100

    Rename test task types to names that match real tasks

    The success of tests using these task types would depend on the test run
    order, because these task types are (currently) being created by
    swh/scheduler/sql/50-data.sql, but the table is truncated after the first
    test completes.

commit 5d7b002ac403565e348ac8fe4dd56d015cf29cae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Tue Jan 12 16:16:31 2021 +0100

    Introduce a `swh scheduler origin grab-next` cli

    This returns, as CSV, the next origins to be visited according to the
    passed scheduling policy.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/121/ for more details.
Lots of iterative improvements:
- Introduce scaffolding for a scheduler simulator
- simulator: Use datetimes instead of a floating point simulated time
- simulator: Move scheduler into the simulation environment object
- simulator: generate OriginVisitStatus objects in modeled visits
- simulator: simulate the scheduler journal client
- simulator: Remove some debug statements and lower log level
- simulator: refine origin model to follow an exponential distribution
- simulator: add support for a basic SimulationReport
- simulator: add typing for Environment.scheduler
- simulator: generate more origins in fill_data
- simulator: tweak simulation environment constants
- simulator: Make the run time a CLI argument
- simulator: Split into smaller files in the same package
- simulator: also fill data for the task-based scheduler
- simulator: wrap tasks and task events in typechecked objects
- simulator: Replace attrs with dataclasses for consistency
- Move the simulator cli to the main cli module
- simulator: implement a simulator for the "old" task-based scheduler
Build is green
Patch application report for D4856 (id=17281)
Could not rebase; Attempt merge onto a5fb291703...
Updating a5fb291..a4bbd6b
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 55 +++++++++++++
 mypy.ini | 6 ++
 requirements-simulator.txt | 2 +
 setup.py | 5 +-
 sql/updates/23.sql | 71 ++++++++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 57 +++++++++++++
 swh/scheduler/simulator/__init__.py | 123 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 +++++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 69 ++++++++++++++++
 swh/scheduler/simulator/origins.py | 119 +++++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 77 +++++++++++++++++
 swh/scheduler/sql/30-schema.sql | 2 +-
 swh/scheduler/sql/40-func.sql | 6 +-
 16 files changed, 692 insertions(+), 6 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/23.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
Changes applied before test
commit a4bbd6bd914d5854be0830a034f855d05970b009
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to track
    the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit c9cf37ac6290783ec1f043833a887c7e76a0eb9d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b7e09ab024ac33ca4730d83f2a289b669dc784d2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cc734124c942af9498c0cb11799613ac04d17047
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of type
    annotations.

commit c9e915ca69a7e614979172a694149afa361ec88c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit a5d0d0aa521819abf46f34ba265975ca1c806222
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 8c0c94afe407df82be55867a4550350772934aae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit f49f8c488ef420890e0c94940fd08a8ccf7b5fe4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 3e2f46120c7c220d347232d788f27cc7cfaaafd7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit be5375b59621189138847ada4d5c6ee71d82e554
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 24cd33ea564fd215c0eda55bfe479e3f1374feca
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 34ebf6af90537ec7864fd1b0d2bb5133a9db4f15
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output status
    of the next visit of that origin.

commit efdd500aca9b886fa5031533cc159a9c469edf75
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit bff6576d4d1effd0f81380ffce74d9973f5e054f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 0e894915ad70fcc294b91c15e3f678f2f54c3f8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the last
    snapshot seen for a given origin. Instead of storing this within the
    simulator, which would be a concern for large scale simulations, we use
    the scheduler visit cache directly.

commit 315fb880e35361652a2277fcb7d5544e5ae81067
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense to
    share it all the time.

commit 6efb445060018326e5164b8f3bc6d137c6800fe5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 54d42dd92f2c40cd7fdeda136ab33e2c1423682f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit d3afd144af1d3fa511cd2ae4cc76a25cc0856cc6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:10:44 2021 +0100

    Use the recorded task end time for the task scheduler feedback loop

    This allows us to run "time-warping" simulations without interference
    from the real wall clock time.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/141/ for more details.
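The origin model described in the "refine origin model to follow an exponential distribution" commit can be illustrated with a minimal sketch. This is not the code under review: the function name `seconds_between_commits`, the MD5-based bucketing, and the bucket count are assumptions, chosen so that each origin gets a stable characteristic time spaced exponentially between 1 second and 10 years.

```python
import hashlib

TEN_YEARS = 10 * 365 * 24 * 3600  # upper bound of the model, in seconds
NUM_BUCKETS = 2 ** 16  # hypothetical resolution, matches the 2 hash bytes used


def seconds_between_commits(origin_url: str) -> float:
    """Return a stable, pseudo-random mean time between commits for an origin."""
    # Hash the URL so the same origin always gets the same characteristic time.
    digest = hashlib.md5(origin_url.encode()).digest()
    bucket = int.from_bytes(digest[:2], "little")  # 0 .. NUM_BUCKETS - 1
    # bucket == 0 gives 1 second; the largest bucket approaches 10 years.
    return TEN_YEARS ** (bucket / NUM_BUCKETS)
```

Under these assumptions the value depends only on the origin URL, so calling the function repeatedly for one origin returns the same characteristic time rather than a fresh random draw.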
Build is green
Patch application report for D4856 (id=17306)
Rebasing onto d3afd144af...
Current branch diff-target is up to date.
Changes applied before test
commit 5a3c8d9bbea4f5ba62c61e98faa8d8d769f8a835
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit a4bbd6bd914d5854be0830a034f855d05970b009
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to track
    the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit c9cf37ac6290783ec1f043833a887c7e76a0eb9d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b7e09ab024ac33ca4730d83f2a289b669dc784d2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cc734124c942af9498c0cb11799613ac04d17047
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of type
    annotations.

commit c9e915ca69a7e614979172a694149afa361ec88c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit a5d0d0aa521819abf46f34ba265975ca1c806222
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 8c0c94afe407df82be55867a4550350772934aae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit f49f8c488ef420890e0c94940fd08a8ccf7b5fe4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 3e2f46120c7c220d347232d788f27cc7cfaaafd7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit be5375b59621189138847ada4d5c6ee71d82e554
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 24cd33ea564fd215c0eda55bfe479e3f1374feca
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 34ebf6af90537ec7864fd1b0d2bb5133a9db4f15
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output status
    of the next visit of that origin.

commit efdd500aca9b886fa5031533cc159a9c469edf75
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit bff6576d4d1effd0f81380ffce74d9973f5e054f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 0e894915ad70fcc294b91c15e3f678f2f54c3f8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the last
    snapshot seen for a given origin. Instead of storing this within the
    simulator, which would be a concern for large scale simulations, we use
    the scheduler visit cache directly.

commit 315fb880e35361652a2277fcb7d5544e5ae81067
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense to
    share it all the time.

commit 6efb445060018326e5164b8f3bc6d137c6800fe5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 54d42dd92f2c40cd7fdeda136ab33e2c1423682f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/142/ for more details.
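The "autogenerated uuid" idea from the task-based scheduler commit can be sketched with standard dataclasses. The class and field names below (`SimulatedTask`, `SimulatedTaskEvent`, `task_id`) are hypothetical stand-ins, not the names used in the diff; the sketch only shows how a per-task identifier lets a later event be matched back to the task that produced it.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SimulatedTask:
    """Hypothetical stand-in for the simulator's Task wrapper."""

    visit_type: str
    origin: str
    # Generated once per task, so later events can be matched back to it.
    task_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class SimulatedTaskEvent:
    """Hypothetical event emitted when a simulated visit completes."""

    task: SimulatedTask
    status: str


first = SimulatedTask(visit_type="git", origin="https://example.org/a.git")
second = SimulatedTask(visit_type="git", origin="https://example.org/b.git")
event = SimulatedTaskEvent(task=first, status="eventful")
```

Using `default_factory` rather than a plain default matters here: each task gets its own identifier instead of all tasks sharing one value computed at class definition time.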
Build is green
Patch application report for D4856 (id=17336)
Rebasing onto 5e609d5205...
Current branch diff-target is up to date.
Changes applied before test
commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to track
    the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of type
    annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output status
    of the next visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the last
    snapshot seen for a given origin. Instead of storing this within the
    simulator, which would be a concern for large scale simulations, we use
    the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense to
    share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/155/ for more details.
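The switch to "datetimes instead of a floating point simulated time" can be pictured with a small sketch. The `SimulatedClock` class is hypothetical and not taken from the diff; it assumes the simulation engine counts elapsed seconds as a float and that the rest of the code wants timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone


class SimulatedClock:
    """Hypothetical clock mapping elapsed simulated seconds to datetimes."""

    def __init__(self, start: datetime) -> None:
        self.start = start
        self.elapsed_seconds = 0.0

    def advance(self, seconds: float) -> None:
        # Simulated time moves only when the simulation says so,
        # independently of the real wall clock.
        self.elapsed_seconds += seconds

    @property
    def time(self) -> datetime:
        return self.start + timedelta(seconds=self.elapsed_seconds)


clock = SimulatedClock(datetime(2021, 1, 13, tzinfo=timezone.utc))
clock.advance(3600)  # one simulated hour
```

Exposing a datetime means simulated timestamps can be stored and compared like real visit dates, which is one plausible motivation for the change.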
docs/simulator.rst | ||
---|---|---|
17 | This list of items is not very clear; it could be rephrased for better clarity, especially the last item on the feedback loop. | |
swh/scheduler/cli/simulator.py | ||
---|---|---|
38 | It is unclear what this "scheduler" option actually refers to. Is 'task_scheduler' the current ("legacy") one, and 'origin_scheduler' the first simple implementation recently added? | |
45 | How does the "policy" option interact with the "scheduler" option above? |
swh/scheduler/simulator/__init__.py | ||
---|---|---|
63 | ok now I see... |
swh/scheduler/simulator/__init__.py | ||
---|---|---|
21 | it would probably be nice to add a docstring/comment that gives an overall description of how this simulator works | |
swh/scheduler/simulator/origins.py | ||
---|---|---|
37 | I'm not sure I get how this method is supposed to be called. Is it called once and only once, or is it called each time a "next commit date for this origin" event is triggered (if that makes sense)? The method name suggests it gives a definitive mean time between commits. Is that it? | |
42 | This is not so easy to read and understand (for someone like me at least). I'd really appreciate a more comprehensive, explanatory comment here. |
Overall this looks good to me, but it could benefit from more comments and explanations. It is not easy to get into as is.
swh/scheduler/simulator/task_scheduler.py | ||
---|---|---|
26 | why the 10 factor? |
swh/scheduler/simulator/task_scheduler.py | ||
---|---|---|
26 | it's completely arbitrary |
swh/scheduler/simulator/task_scheduler.py | ||
---|---|---|
26 | then add a comment about it |
Address @douardda's comments:
- simulator: Make min_batch_size a parameter defined in the setup.
- simulator: Add documentation.
swh/scheduler/simulator/task_scheduler.py | ||
---|---|---|
26 | better yet: I'll make it a constant somewhere else |
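The outcome of this thread (name the arbitrary value and make it a setup parameter) can be sketched as below. The source does not show what the literal 10 was applied to, so `DEFAULT_MIN_BATCH_SIZE`, `SimulationSetup`, and `should_schedule` are all hypothetical; the sketch only illustrates replacing a magic number with a documented default that a run can override.

```python
from dataclasses import dataclass

# Arbitrary default with no special meaning: wait for at least this many
# free slots before scheduling a new batch. Tune it per simulation.
DEFAULT_MIN_BATCH_SIZE = 10


@dataclass
class SimulationSetup:
    """Hypothetical container for parameters chosen when a run is set up."""

    min_batch_size: int = DEFAULT_MIN_BATCH_SIZE


def should_schedule(free_slots: int, setup: SimulationSetup) -> bool:
    """Return True once enough slots are free to justify a new batch."""
    return free_slots >= setup.min_batch_size
```

Keeping the value in the setup object also makes it one of the parameters whose impact the simulator can be used to study.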
Build is green
Patch application report for D4856 (id=17348)
Rebasing onto 5e609d5205...
Current branch diff-target is up to date.
Changes applied before test
commit 7af4bebc7a4503964f9bd61ac101c54fc42ca474
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1967379c3251f407e7e5128efbbceafe293e3704
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second
    and 10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes
    sense to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and
    new schedulers, as well as to test the impact of scheduler policies
    and their parameters on the performance of the Software Heritage
    archival infrastructure as a whole.
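For readers following the origin model commit, here is a minimal sketch of one way to read "a consistent characteristic time between commits, between 1 second and 10 years". It is not taken from the diff: the function names are hypothetical, and the choice to draw the exponent uniformly (so that the interval spans the whole range on a log scale) is an assumption about what the commit message means.

```python
import hashlib
import math
import random
from datetime import timedelta

ONE_SECOND = 1.0
TEN_YEARS = 10 * 365.25 * 24 * 3600  # in seconds


def characteristic_interval(origin_url: str) -> timedelta:
    """Return a per-origin "time between commits" (hypothetical helper).

    The value is derived from a hash of the URL, so the same origin always
    gets the same interval ("consistent"), and the exponent is drawn
    uniformly so the result lies between 1 second and 10 years.
    """
    digest = hashlib.sha256(origin_url.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    exponent = rng.uniform(math.log10(ONE_SECOND), math.log10(TEN_YEARS))
    return timedelta(seconds=10 ** exponent)


def expected_new_commits(interval: timedelta, since_last_visit: timedelta) -> float:
    """Estimate how many commits appeared since the previous visit."""
    return since_last_visit / interval
```

An estimate of zero new commits would correspond to an uneventful visit; a large estimate would correspond to a longer simulated run time.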
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/161/ for more details.
Build is green
Patch application report for D4856 (id=17350)
Rebasing onto 5e609d5205...
Current branch diff-target is up to date.
Changes applied before test
commit b594c847826699608f49afe4153c2f2b3ef99657
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1967379c3251f407e7e5128efbbceafe293e3704
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second
    and 10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes
    sense to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and
    new schedulers, as well as to test the impact of scheduler policies
    and their parameters on the performance of the Software Heritage
    archival infrastructure as a whole.
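The task-based simulator commit describes tracking a task from creation to its visit status through an autogenerated uuid. A rough sketch of that idea with dataclasses follows. The class and field names (`Task`, `TaskEvent`, `TaskTracker`, `backend_id`) are illustrative assumptions, not the definitions in `swh/scheduler/simulator/common.py`.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict


@dataclass
class Task:
    """Hypothetical stand-in for the simulator's wrapped task object."""

    visit_type: str
    origin: str
    # Autogenerated identifier, so callers never have to supply one.
    backend_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class TaskEvent:
    """Hypothetical event emitted when a simulated visit finishes."""

    task: uuid.UUID
    status: str
    date: datetime


class TaskTracker:
    """Remember when each task was created, keyed by its uuid."""

    def __init__(self) -> None:
        self._created: Dict[uuid.UUID, datetime] = {}

    def created(self, task: Task, now: datetime) -> None:
        self._created[task.backend_id] = now

    def lifetime(self, event: TaskEvent) -> timedelta:
        # The uuid carried by the event links it back to the original task.
        return event.date - self._created.pop(event.task)


start = datetime(2021, 1, 15, tzinfo=timezone.utc)
tracker = TaskTracker()
task = Task("git", "https://example.org/repo")
tracker.created(task, start)
event = TaskEvent(task.backend_id, "full", start + timedelta(minutes=5))
elapsed = tracker.lifetime(event)
```

Using `default_factory` rather than a plain default matters here: a plain `uuid.uuid4()` default would be evaluated once and shared by every task.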
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/162/ for more details.
Build is green
Patch application report for D4856 (id=17356)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 +++++++++++++
 mypy.ini | 6 ++
 requirements-simulator.txt | 2 +
 setup.py | 34 +++----
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 61 ++++++++++++
 swh/scheduler/simulator/__init__.py | 144 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 ++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 +++++++++++++
 swh/scheduler/simulator/origins.py | 128 +++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 +++++++++++++++
 swh/scheduler/tests/test_simulator.py | 45 +++++++++
 14 files changed, 718 insertions(+), 17 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 69877d3f987eaba00dcc97359b48e1fc8a677298
Merge: 0a32a31 ed04441
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:06:09 2021 +0000

    Merge branch 'diff-target' into HEAD

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second
    and 10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes
    sense to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and
    new schedulers, as well as to test the impact of scheduler policies
    and their parameters on the performance of the Software Heritage
    archival infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with
    --doctest-modules, which imports all modules during test collection,
    and therefore executing the side-effects of
    swh.scheduler.cli.journal.
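The last commit above concerns a subcommand that only exists once its module has been imported. The sketch below illustrates that general pattern with a plain dictionary registry instead of the real CLI framework. `COMMANDS`, `command` and `import_journal_module` are hypothetical names, and the function merely stands in for the effect of importing `swh.scheduler.cli.journal`.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the main CLI group.
COMMANDS: Dict[str, Callable[[], str]] = {}


def command(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Register a function as a subcommand when the decorator runs."""

    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        COMMANDS[name] = func
        return func

    return decorator


def import_journal_module() -> None:
    """Stand-in for importing the journal module.

    In a real package the decorated function sits at module level, so the
    registration below happens as a side effect of the import itself.
    """

    @command("journal")
    def journal() -> str:
        return "journal client started"


before = "journal" in COMMANDS  # nothing has imported the module yet
import_journal_module()
after = "journal" in COMMANDS  # the side effect registered the subcommand
```

A test runner that imports every module during collection triggers the registration on its own, which is how such a missing import can go unnoticed in tests while the installed CLI lacks the subcommand.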
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/165/ for more details.
Build is green
Patch application report for D4856 (id=17362)
Could not rebase; Attempt merge onto 0a32a31195...
Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 ++++++++++++
 mypy.ini | 6 ++
 requirements-simulator.txt | 2 +
 setup.py | 34 ++++---
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/simulator.py | 68 +++++++++++++
 swh/scheduler/simulator/__init__.py | 147 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 +++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 +++++++++++++
 swh/scheduler/simulator/origins.py | 128 ++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 ++++++++++++++
 swh/scheduler/tests/test_simulator.py | 53 ++++++++++
 14 files changed, 736 insertions(+), 17 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit c4641ac0f67e92b7d6ffe885f9fc7a410a547e63
Merge: 0a32a31 e12a4f1
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:42:37 2021 +0000

    Merge branch 'diff-target' into HEAD

commit e12a4f13386cdb25d366f5e2ee81044cb8e30169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly

    This reuses the scheduler instantiated by the cli instead of
    hardcoding our own using the PG* variables.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second
    and 10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes
    sense to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and
    new schedulers, as well as to test the impact of scheduler policies
    and their parameters on the performance of the Software Heritage
    archival infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli

    This issue was masked by tox.ini using pytest with
    --doctest-modules, which imports all modules during test collection,
    and therefore executing the side-effects of
    swh.scheduler.cli.journal.
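The SimulationReport commit says the report collects the runtime of tasks that have run and is printed at the end of the simulation. A minimal sketch of such a collector follows. The field and method names (`runtimes`, `record_visit`, `format`) are assumptions made for illustration and may not match the class in the diff.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class SimulationReport:
    """Hypothetical report accumulating the runtime of each finished task."""

    runtimes: List[timedelta] = field(default_factory=list)

    def record_visit(self, runtime: timedelta) -> None:
        self.runtimes.append(runtime)

    def format(self) -> str:
        """Render the summary that would be printed when the run ends."""
        if not self.runtimes:
            return "Total visits: 0"
        total = sum(self.runtimes, timedelta())
        mean = total / len(self.runtimes)
        return f"Total visits: {len(self.runtimes)}\nMean task runtime: {mean}"


report = SimulationReport()
report.record_visit(timedelta(seconds=60))
report.record_visit(timedelta(seconds=120))
summary = report.format()
```

Storing runtimes as `timedelta` values fits the commit that moved the simulated clock from floats to datetimes, since subtracting two datetimes yields a `timedelta` directly.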
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/171/ for more details.
Build is green
Patch application report for D4856 (id=17381)
Could not rebase; Attempt merge onto 98526539a8...
Updating 9852653..89b8839
Fast-forward
 .pre-commit-config.yaml | 1 +
 docs/index.rst | 1 +
 docs/simulator.rst | 65 ++++++++++
 mypy.ini | 6 +
 requirements-simulator.txt | 2 +
 setup.py | 34 +++---
 sql/updates/25.sql | 64 ++++++++++
 swh/scheduler/backend.py | 87 +++++++++++++
 swh/scheduler/cli/__init__.py | 2 +-
 swh/scheduler/cli/origin.py | 40 ++++++
 swh/scheduler/cli/simulator.py | 68 +++++++++++
 swh/scheduler/interface.py | 40 ++++++
 swh/scheduler/model.py | 32 +++++
 swh/scheduler/simulator/__init__.py | 147 ++++++++++++++++++++++
 swh/scheduler/simulator/common.py | 102 ++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py | 68 +++++++++++
 swh/scheduler/simulator/origins.py | 128 ++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py | 76 ++++++++++++
 swh/scheduler/sql/30-schema.sql | 24 +++-
 swh/scheduler/sql/40-func.sql | 40 ++++++
 swh/scheduler/tests/test_api_client.py | 3 +
 swh/scheduler/tests/test_cli_origin.py | 11 ++
 swh/scheduler/tests/test_scheduler.py | 181 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py | 53 ++++++++
 24 files changed, 1255 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/25.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 89b8839ce5d1e5db2bb9b69c96dbc943d1172ff0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly

    This reuses the scheduler instantiated by the cli instead of
    hardcoding our own using the PG* variables.

commit f9f28ece9b78957a7dac050c9d21fe0b0c64ad95
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1b335be22b7ad25eede2ac605f86d2fd80a61b4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 403e97c5599934aed746f9301845c9e6f0d7d933
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit cb9b2c1ddb6cf641b3b23fedf7a36269cc4ced6d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit 44407d7fd413c62070d85f6ee1de2268a87e2906
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit f951490ccfbdf30c4ef57d0b41651f6f43278873
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit b9b3defd03f87febb5c06c50ac2b7c9d37e918d5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b4b83f6e15476f93c51d68adbbfdbbb10d71d444
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 029c95f8887cac6d0eeabb4516812371375dbd28
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 131994c324502f455080603fb8ebda0e77feba22
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 4e54c277a3a3faa3399b615b096bcb7149a5ff78
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 631955aaccdb8a6f2cbdc2881ce70553c1d437e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit f8bdbec28238cdf9c487ae7ed1cc24cbfbdffdb3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit b97bb855576e1edf23be70993b6df54dc0f16a6f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit d50da6b64b1242d226dafbfc032184c8e5fb1c9f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second
    and 10 years. From this characteristic time, and feedback from the
    OriginVisitStats, we can generate the expected run time and output
    status of the next visit of that origin.

commit fd44eb75447aba9a03b43621b88f140d8dc15ec1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 393313a7b5530a3f123e9ca7e92fe9d61038d829
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 1f1e6c5d5157ee8f30b8c56a1cf130ac5ef4e953
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 62870d9d11e3f598130f2562181dc8a59b7e2e2d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes
    sense to share it all the time.

commit 89c76bd7e776f5dadf8b3ff13b9bd5d5cc42f208
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ccf03c4e1f9bd3b1e46a1de0bfc7c7e4b055284d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and
    new schedulers, as well as to test the impact of scheduler policies
    and their parameters on the performance of the Software Heritage
    archival infrastructure as a whole.

commit 53b034cb8d09efa0c9b448d29fb70d727bc6a066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:39:21 2021 +0100

    Add a cli for the scheduler metrics update endpoint

commit 737d12e5b9e694b22bef291c625090fb3aee2afc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:48:31 2021 +0100

    Introduce a new lister_get endpoint

commit 114ed952e513c7ad3dbb038a640e80bf079d0780
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins

    Metrics are computed and cached database-side by the
    `update_metrics` function. The `get_metrics` function only retrieves
    the cached data.

    The metrics are aggregated for each lister instance and visit type
    (allowing complete reaggregation by visit type for cross-cutting
    statistics).

    The following metrics have been implemented:
     - number of known origins overall
     - number of enabled origins (origins seen in the last listing)
     - number of enabled origins that have never been successfully
       visited
     - number of enabled origins with known activity since our last
       successful visit
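The four metrics listed in the last commit can be illustrated with an in-memory aggregation keyed by lister instance and visit type. This is only a sketch: according to the commit message the real computation happens database-side in `update_metrics`, and the `ListedOrigin` fields and the `enabled_origins` and `origins_with_pending_changes` names used here are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ListedOrigin:
    """Hypothetical, simplified view of a listed origin."""

    lister_id: str
    visit_type: str
    enabled: bool
    last_update: Optional[datetime] = None
    last_successful_visit: Optional[datetime] = None


@dataclass
class Metrics:
    known_origins: int = 0
    enabled_origins: int = 0
    origins_never_visited: int = 0
    origins_with_pending_changes: int = 0


def aggregate(origins: Iterable[ListedOrigin]) -> Dict[Tuple[str, str], Metrics]:
    """Aggregate the four counters per (lister instance, visit type)."""
    result: Dict[Tuple[str, str], Metrics] = defaultdict(Metrics)
    for origin in origins:
        metrics = result[(origin.lister_id, origin.visit_type)]
        metrics.known_origins += 1
        if not origin.enabled:
            continue  # the remaining counters only cover enabled origins
        metrics.enabled_origins += 1
        if origin.last_successful_visit is None:
            metrics.origins_never_visited += 1
        elif (
            origin.last_update is not None
            and origin.last_update > origin.last_successful_visit
        ):
            metrics.origins_with_pending_changes += 1
    return dict(result)
```

Keeping the visit type in the key is what allows the counters to be summed again per visit type across listers, as the commit message notes.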
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/181/ for more details.
Build is green
Patch application report for D4856 (id=17407)
Rebasing onto 7905a6bea4...
Current branch diff-target is up to date.
Changes applied before test
commit 898820fac52cf6fcfb5d2770aad49f131370a5a6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 20 12:11:05 2021 +0100

    simulator: collect and plot scheduler metrics over time

    For now, only plot the known_origins and origins_never_visited metrics.

commit 9ce68f8d0e0ea69bd6672a50687079b5b1ea460c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly

    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit 88e0b42805011bc3886f77ce5c91b3450351a16f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 62c6d90867bccb17ae076e1b5ee4db6fd350ad1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 9468bb9384f14e5fa0548b7d985f66fb3e36c85a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit ead7b347db9d8852b4c347729d7e6d32b72d9058
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler

    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of
    visit statuses, as the task-based scheduler does.

commit aecd27eee06aaa46d350e9d5b3f86ccc36a5446c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit 05067e3ecc888271507505112b48ebc9f755f5e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 24922fe2d995ca3ffa6c3c5a19c1f5f5531db4c8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects

    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit d5318aea0a93a94c80f8d743ce1de63592161f5a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 22ebb7a9a4bc6639e6f52d71c2b727537baf5019
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit ad7bfbe731da64cc6d1ddaa3f5ae1ef1e3350f60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit df34db0bfc61df418f00338345b4b46a86340f62
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 21ce2c88dddce081bfd525d08454ca09bbf521c6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit 29204199774b40bea4d3d23ffe9407a5d090f8fa
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6433266106dda007d1e5304a0dcb01706c8acb42
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport

    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit c474a825336a4e4132e83982e180451b02d8f54d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution

    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.

    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit 2459badf0c05bf2cb663e66b9deabf1150638bb1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit cb12449e8f57e59ec4c7953a3c4a52c9193d202e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 20b7f9c68f831839f4be1cae4b9ae2dce0fc2d96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits

    To be able to generate uneventful visits, we would need to store the
    last snapshot seen for a given origin. Instead of storing this within
    the simulator, which would be a concern for large scale simulations, we
    use the scheduler visit cache directly.

commit 39ad47de2e753033c4b7114a64b5c3144b6ea821
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object

    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 31967fa850c3afe29fc37e41cfcd53ff5408e7b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit fc3f06bd1d77c76bfba4c05efcd62abcb5c46eea
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator

    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and
    their parameters on the performance of the Software Heritage archival
    infrastructure as a whole.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/194/ for more details.
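The origin model described in commit c474a825336a assigns each origin a consistent characteristic "time between commits" between 1 second and 10 years. One possible reading, sketched below, is that the origin is hashed into a bucket and the bucket is mapped exponentially onto that range. This is an illustration under that assumption, not the code from the diff: the function names, the use of SHA-1 over the origin URL and the bucket count are all hypothetical.

```python
import hashlib

TEN_YEARS = 10 * 365 * 24 * 3600  # upper bound of the range, in seconds
N_BYTES = 2  # hypothetical: number of hash bytes used to pick a bucket
NUM_BUCKETS = 2 ** (8 * N_BYTES)


def characteristic_time(bucket: int) -> float:
    """Map a bucket in [0, NUM_BUCKETS) exponentially onto [1 s, 10 years)."""
    return TEN_YEARS ** (bucket / NUM_BUCKETS)


def seconds_between_commits(origin_url: str) -> float:
    """Derive a stable characteristic time from the origin URL (assumed scheme)."""
    digest = hashlib.sha1(origin_url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:N_BYTES], "little")
    return characteristic_time(bucket)
```

Deriving the value from a hash rather than from a random draw keeps it stable across simulation runs, which is one way to get the "consistent" characteristic time the commit message asks for without storing per-origin state in the simulator.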