
Introduce scaffolding for a scheduler simulator
Closed, Public

Authored by vlorentz on Wed, Jan 13, 4:21 PM.

Details

Reviewers
olasd
douardda
Group Reviewers
Reviewers
Maniphest Tasks
T2973: Implement a scheduler simulator
Commits
rDSCH9468bb9384f1: simulator: add basic tests for fill_test_data and run
rDSCH88e0b4280501: simulator: Add documentation.
rDSCH898820fac52c: simulator: collect and plot scheduler metrics over time
rDSCH9ce68f8d0e0e: simulator: stop using get_scheduler directly
rDSCH62c6d90867bc: simulator: Make min_batch_size a parameter defined in the setup.
rDSCHead7b347db9d: simulator: implement a simulator for the "old" task-based scheduler
rDSCHaecd27eee06a: Move the simulator cli to the main cli module
rDSCH05067e3ecc88: simulator: Replace attrs with dataclasses for consistency
rDSCH24922fe2d995: simulator: wrap tasks and task events in typechecked objects
rDSCH22ebb7a9a4bc: simulator: Split into smaller files in the same package
rDSCHd5318aea0a93: simulator: also fill data for the task-based scheduler
rDSCH29204199774b: simulator: add typing for Environment.scheduler
rDSCHad7bfbe731da: simulator: Make the run time a CLI argument
rDSCHdf34db0bfc61: simulator: tweak simulation environment constants
rDSCH21ce2c88dddc: simulator: generate more origins in fill_data
rDSCH6433266106dd: simulator: add support for a basic SimulationReport
rDSCHc474a825336a: simulator: refine origin model to follow an exponential distribution
rDSCHcb12449e8f57: simulator: simulate the scheduler journal client
rDSCH20b7f9c68f83: simulator: generate OriginVisitStatus objects in modeled visits
rDSCH2459badf0c05: simulator: Remove some debug statements and lower log level
rDSCH39ad47de2e75: simulator: Move scheduler into the simulation environment object
rDSCH31967fa850c3: simulator: Use datetimes instead of a floating point simulated time
rDSCHfc3f06bd1d77: Introduce scaffolding for a scheduler simulator
Summary

This simulator will allow us to compare the behavior of the old and new
schedulers, as well as to test the impact of scheduler policies and their
parameters on the performance of the Software Heritage archival
infrastructure as a whole.

Test Plan

use the docs, Luke

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline


Build is green

Patch application report for D4856 (id=17208)

Could not rebase; Attempt merge onto a62003397d...

Updating a620033..dfa0aee
Fast-forward
 .pre-commit-config.yaml                |   1 +
 docs/index.rst                         |   1 +
 docs/simulator.rst                     |  55 +++++++++++++
 mypy.ini                               |   3 +
 requirements-simulator.txt             |   1 +
 setup.py                               |   5 +-
 sql/updates/20.sql                     |   6 ++
 swh/scheduler/backend.py               |   9 ++-
 swh/scheduler/cli/__init__.py          |   7 +-
 swh/scheduler/cli/origin.py            | 141 ++++++++++++++++++++++++++++++++
 swh/scheduler/interface.py             |   8 +-
 swh/scheduler/model.py                 |   9 +++
 swh/scheduler/simulator/__init__.py    | 144 +++++++++++++++++++++++++++++++++
 swh/scheduler/simulator/__main__.py    |  31 +++++++
 swh/scheduler/sql/30-schema.sql        |   2 +-
 swh/scheduler/sql/60-indexes.sql       |   2 +-
 swh/scheduler/tests/common.py          |  10 +--
 swh/scheduler/tests/conftest.py        |  45 ++++++++---
 swh/scheduler/tests/test_cli_origin.py | 112 +++++++++++++++++++++++++
 swh/scheduler/tests/test_model.py      |  31 ++++++-
 swh/scheduler/tests/test_scheduler.py  |  27 ++++---
 21 files changed, 611 insertions(+), 39 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/20.sql
 create mode 100644 swh/scheduler/cli/origin.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/__main__.py
 create mode 100644 swh/scheduler/tests/test_cli_origin.py
Changes applied before test
commit dfa0aee33500715f47b2e228c5462153d101a5b5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 9f843eef37313b551a158dfa11aea97e5ef2fc81
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 15:31:55 2021 +0100

    Filter origins by visit type when scheduling the next visits
    
    We have separate task queues and workers for each visit type, so it
    makes sense to split this endpoint along these lines too, at least for
    now.

commit 23d1b3c1883c3c955b5dd5ba1cc2270c93e156d6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 15:25:56 2021 +0100

    Reorganize ListedOrigin fixtures to generate multiple visit_types

commit da347f7f4c401a43ec34de76365ad323d0ff7b77
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 17:10:39 2021 +0100

    Introduce a `swh scheduler origin schedule-next` cli
    
    This creates one-shot tasks in the classic scheduler for the next visits
    to run according to the visit scheduling policy.

commit 42957c9e96e6c7d8070e0b6c786c273e8c1602a0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 17:28:33 2021 +0100

    Rename test task types to names that match real tasks
    
    The success of tests using these task types would depend on the test run
    order, because these task types are (currently) being created by
    swh/scheduler/sql/50-data.sql, but the table is truncated after the
    first test completes.

commit d1393c54da99c45175dd0b6a69734d17fc887960
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 16:16:31 2021 +0100

    Introduce a `swh scheduler origin grab-next` cli
    
    This returns, as CSV, the next origins to be visited according to the
    passed scheduling policy.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/115/ for more details.

Build is green

Patch application report for D4856 (id=17217)

Could not rebase; Attempt merge onto a62003397d...

Updating a620033..0cde030
Fast-forward
 .pre-commit-config.yaml                |   1 +
 docs/index.rst                         |   1 +
 docs/simulator.rst                     |  55 +++++++++++++
 mypy.ini                               |   3 +
 requirements-simulator.txt             |   1 +
 setup.py                               |   5 +-
 sql/updates/20.sql                     |   6 ++
 swh/scheduler/backend.py               |   9 ++-
 swh/scheduler/cli/__init__.py          |   7 +-
 swh/scheduler/cli/origin.py            | 142 ++++++++++++++++++++++++++++++++
 swh/scheduler/interface.py             |   8 +-
 swh/scheduler/model.py                 |   9 +++
 swh/scheduler/simulator/__init__.py    | 144 +++++++++++++++++++++++++++++++++
 swh/scheduler/simulator/__main__.py    |  31 +++++++
 swh/scheduler/sql/30-schema.sql        |   2 +-
 swh/scheduler/sql/60-indexes.sql       |   2 +-
 swh/scheduler/tests/common.py          |  10 +--
 swh/scheduler/tests/conftest.py        |  45 ++++++++---
 swh/scheduler/tests/test_cli_origin.py | 112 +++++++++++++++++++++++++
 swh/scheduler/tests/test_model.py      |  31 ++++++-
 swh/scheduler/tests/test_scheduler.py  |  27 ++++---
 21 files changed, 612 insertions(+), 39 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/20.sql
 create mode 100644 swh/scheduler/cli/origin.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/__main__.py
 create mode 100644 swh/scheduler/tests/test_cli_origin.py
Changes applied before test
commit 0cde0300fbbd0832a8dcca52ea1e04597e75f423
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit ca45d40f2a62d4a0f200cabe760ad3a0cda00f89
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 15:31:55 2021 +0100

    Filter origins by visit type when scheduling the next visits
    
    We have separate task queues and workers for each visit type, so it
    makes sense to split this endpoint along these lines too, at least for
    now.

commit 59b4cb3f1c7a081e0d28b11d15888d38a9de151e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 15:25:56 2021 +0100

    Reorganize ListedOrigin fixtures to generate multiple visit_types

commit 4f5338f2aba360fed2e524cbcdd23b11bacfb79d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 17:10:39 2021 +0100

    Introduce a `swh scheduler origin schedule-next` cli
    
    This creates one-shot tasks in the classic scheduler for the next visits
    to run according to the visit scheduling policy.

commit 3dd1d5f28d329620a65ee00749d24401b6d8cf00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 17:28:33 2021 +0100

    Rename test task types to names that match real tasks
    
    The success of tests using these task types would depend on the test run
    order, because these task types are (currently) being created by
    swh/scheduler/sql/50-data.sql, but the table is truncated after the
    first test completes.

commit 5d7b002ac403565e348ac8fe4dd56d015cf29cae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 12 16:16:31 2021 +0100

    Introduce a `swh scheduler origin grab-next` cli
    
    This returns, as CSV, the next origins to be visited according to the
    passed scheduling policy.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/121/ for more details.

Lots of iterative improvements:

  • Introduce scaffolding for a scheduler simulator
  • simulator: Use datetimes instead of a floating point simulated time
  • simulator: Move scheduler into the simulation environment object
  • simulator: generate OriginVisitStatus objects in modeled visits
  • simulator: simulate the scheduler journal client
  • simulator: Remove some debug statements and lower log level
  • simulator: refine origin model to follow an exponential distribution
  • simulator: add support for a basic SimulationReport
  • simulator: add typing for Environment.scheduler
  • simulator: generate more origins in fill_data
  • simulator: tweak simulation environment constants
  • simulator: Make the run time a CLI argument
  • simulator: Split into smaller files in the same package
  • simulator: also fill data for the task-based scheduler
  • simulator: wrap tasks and task events in typechecked objects
  • simulator: Replace attrs with dataclasses for consistency
  • Move the simulator cli to the main cli module
  • simulator: implement a simulator for the "old" task-based scheduler
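
To illustrate the SimulationReport item in the list above: its commit message says that, for now, it collects the runtime of tasks that have run and gets printed at the end of the simulation. The following is a minimal sketch of that idea only. Apart from the class name, everything here (the `runtimes` field, `record_visit`, `format`) is a hypothetical name chosen for illustration, not the code in this diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulationReport:
    """Hypothetical report accumulating task runtimes, in seconds."""

    runtimes: List[float] = field(default_factory=list)

    def record_visit(self, duration: float) -> None:
        # Called once per finished task with its simulated run time.
        self.runtimes.append(duration)

    def format(self) -> str:
        # Summary meant to be printed once, when the simulation ends.
        if not self.runtimes:
            return "no task ran"
        total = sum(self.runtimes)
        mean = total / len(self.runtimes)
        return f"{len(self.runtimes)} tasks, total {total:.1f}s, mean {mean:.1f}s"


report = SimulationReport()
for duration in (2.0, 4.0):
    report.record_visit(duration)
print(report.format())
```

Keeping the report as a plain accumulator makes it easy to extend later with other metrics without touching the simulated actors.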

Build is green

Patch application report for D4856 (id=17281)

Could not rebase; Attempt merge onto a5fb291703...

Updating a5fb291..a4bbd6b
Fast-forward
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  55 +++++++++++++
 mypy.ini                                    |   6 ++
 requirements-simulator.txt                  |   2 +
 setup.py                                    |   5 +-
 sql/updates/23.sql                          |  71 ++++++++++++++++
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/simulator.py              |  57 +++++++++++++
 swh/scheduler/simulator/__init__.py         | 123 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 102 +++++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  69 ++++++++++++++++
 swh/scheduler/simulator/origins.py          | 119 +++++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  77 +++++++++++++++++
 swh/scheduler/sql/30-schema.sql             |   2 +-
 swh/scheduler/sql/40-func.sql               |   6 +-
 16 files changed, 692 insertions(+), 6 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/23.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
Changes applied before test
commit a4bbd6bd914d5854be0830a034f855d05970b009
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit c9cf37ac6290783ec1f043833a887c7e76a0eb9d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b7e09ab024ac33ca4730d83f2a289b669dc784d2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cc734124c942af9498c0cb11799613ac04d17047
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit c9e915ca69a7e614979172a694149afa361ec88c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit a5d0d0aa521819abf46f34ba265975ca1c806222
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 8c0c94afe407df82be55867a4550350772934aae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit f49f8c488ef420890e0c94940fd08a8ccf7b5fe4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 3e2f46120c7c220d347232d788f27cc7cfaaafd7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit be5375b59621189138847ada4d5c6ee71d82e554
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 24cd33ea564fd215c0eda55bfe479e3f1374feca
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 34ebf6af90537ec7864fd1b0d2bb5133a9db4f15
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit efdd500aca9b886fa5031533cc159a9c469edf75
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit bff6576d4d1effd0f81380ffce74d9973f5e054f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 0e894915ad70fcc294b91c15e3f678f2f54c3f8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 315fb880e35361652a2277fcb7d5544e5ae81067
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 6efb445060018326e5164b8f3bc6d137c6800fe5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 54d42dd92f2c40cd7fdeda136ab33e2c1423682f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit d3afd144af1d3fa511cd2ae4cc76a25cc0856cc6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:10:44 2021 +0100

    Use the recorded task end time for the task scheduler feedback loop
    
    This allows us to run "time-warping" simulations without interference
    from the real wall clock time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/141/ for more details.
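
The last commit in the report above switches the task scheduler feedback loop to the recorded task end time, so that "time-warping" simulations are not affected by the real wall clock. A minimal sketch of why this matters, assuming a hypothetical `next_run` helper and a made-up weekly interval; this is not the scheduler's actual code:

```python
from datetime import datetime, timedelta, timezone


def next_run(task_ended: datetime, interval: timedelta) -> datetime:
    """Schedule the next run relative to the recorded end time of the task.

    Calling datetime.now() here instead would tie the result to the real
    wall clock, which a simulation running on its own clock must avoid.
    """
    return task_ended + interval


# Simulated clock, deliberately far from the real date (hypothetical values).
simulated_end = datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc)
scheduled = next_run(simulated_end, timedelta(days=7))
print(scheduled.isoformat())
```

Because the result depends only on its arguments, the same simulated history always yields the same schedule, whatever the date on which the simulation is run.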

Add basic tests for the simulator

Build is green

Patch application report for D4856 (id=17306)

Rebasing onto d3afd144af...

Current branch diff-target is up to date.
Changes applied before test
commit 5a3c8d9bbea4f5ba62c61e98faa8d8d769f8a835
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit a4bbd6bd914d5854be0830a034f855d05970b009
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit c9cf37ac6290783ec1f043833a887c7e76a0eb9d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b7e09ab024ac33ca4730d83f2a289b669dc784d2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cc734124c942af9498c0cb11799613ac04d17047
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit c9e915ca69a7e614979172a694149afa361ec88c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit a5d0d0aa521819abf46f34ba265975ca1c806222
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 8c0c94afe407df82be55867a4550350772934aae
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit f49f8c488ef420890e0c94940fd08a8ccf7b5fe4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 3e2f46120c7c220d347232d788f27cc7cfaaafd7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit be5375b59621189138847ada4d5c6ee71d82e554
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 24cd33ea564fd215c0eda55bfe479e3f1374feca
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 34ebf6af90537ec7864fd1b0d2bb5133a9db4f15
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit efdd500aca9b886fa5031533cc159a9c469edf75
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit bff6576d4d1effd0f81380ffce74d9973f5e054f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 0e894915ad70fcc294b91c15e3f678f2f54c3f8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 315fb880e35361652a2277fcb7d5544e5ae81067
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 6efb445060018326e5164b8f3bc6d137c6800fe5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 54d42dd92f2c40cd7fdeda136ab33e2c1423682f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/142/ for more details.
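
The commit "simulator: implement a simulator for the "old" task-based scheduler" in the report above extends the Task object with an autogenerated uuid to track a task from its creation to the generation of visit statuses. A minimal sketch of that pattern with dataclasses follows. `SimulatedTask`, its fields, and the `in_flight` registry are hypothetical names used for illustration, not the classes from this diff.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SimulatedTask:
    """Hypothetical stand-in for the simulator's Task wrapper."""

    visit_type: str
    origin: str
    # Generated once per task, so later visit statuses can be matched to it.
    backend_id: uuid.UUID = field(default_factory=uuid.uuid4)


# Tasks in flight, indexed by their generated identifier.
in_flight: Dict[uuid.UUID, SimulatedTask] = {}

task = SimulatedTask(visit_type="git", origin="https://example.org/repo.git")
in_flight[task.backend_id] = task

# When the visit reports back with the same identifier, the task is found again.
finished = in_flight.pop(task.backend_id)
```

Using `default_factory` rather than a plain default ensures that each task gets its own identifier instead of sharing one value computed at class definition time.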

Build is green

Patch application report for D4856 (id=17336)

Rebasing onto 5e609d5205...

Current branch diff-target is up to date.
Changes applied before test
commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/155/ for more details.

douardda added inline comments.
docs/simulator.rst
17

this list of items is not very clear. It could be rephrased a bit for better clarity, especially the last one on the feedback loop

swh/scheduler/cli/simulator.py
38 ↗(On Diff #17336)

unclear what this "scheduler" option actually refers to. Is 'task_scheduler' the current ("legacy") one? And "origin_scheduler" the first simple implementation recently added?

45 ↗(On Diff #17336)

how does the "policy" option interact with the "scheduler" above?

swh/scheduler/simulator/__init__.py
63

ok now I see...

swh/scheduler/cli/simulator.py
38 ↗(On Diff #17336)

yes. @olasd even named it legacy_scheduler originally, but we aren't going to deprecate it, so I don't think it's a good name

swh/scheduler/simulator/__init__.py
21

it would probably be nice to add a docstring/comment that gives an overall description of how this simulator works

swh/scheduler/simulator/origins.py
37 ↗(On Diff #17336)

I'm not sure I get how this method is supposed to be called. Is it once and only once? Or is it called each time a "next commit date for this origin" event is triggered (if that makes sense)?

I mean the method name suggests it gives a definitive mean time between commits. Is this it?
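
For readers following this thread: the commit message for the origin model describes a consistent characteristic "time between commits" per origin, distributed between 1 second and 10 years. The sketch below shows one way such a value could stay stable no matter how many times the method is called. It is an assumption, not the code under review: the function name, the use of a hash of the origin URL, and the reading of the distribution as a uniformly spread exponent are all hypothetical.

```python
import hashlib
import math

ONE_SECOND = 1.0
TEN_YEARS = 10 * 365 * 24 * 3600.0  # in seconds, ignoring leap days


def mean_time_between_commits(origin_url: str) -> float:
    """Return a made-up but stable mean time between commits, in seconds."""
    digest = hashlib.sha256(origin_url.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a fraction between 0 and 1.
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    # Interpolate on the exponent, so values spread from 1 second to 10 years.
    return ONE_SECOND * math.exp(fraction * math.log(TEN_YEARS / ONE_SECOND))


url = "https://example.org/repo.git"
# The same origin always yields the same characteristic time.
assert mean_time_between_commits(url) == mean_time_between_commits(url)
```

Deriving the value from the URL avoids storing per-origin state in the simulator, so the method can be called on every event and still return the same answer.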

42 ↗(On Diff #17336)

this is not so easy to read and get (for someone like me at least)... I'd really appreciate a more comprehensive/explanatory comment here...

overall looks good to me, but it could benefit from more comments and explanations. It is not easy to get into as is.

swh/scheduler/simulator/task_scheduler.py
26 ↗(On Diff #17336)

why the factor of 10?

This revision is now accepted and ready to land. Tue, Jan 19, 3:31 PM
swh/scheduler/simulator/task_scheduler.py
26 ↗(On Diff #17336)

it's completely arbitrary

swh/scheduler/simulator/task_scheduler.py
26 ↗(On Diff #17336)

then add a comment about it

Address @douardda's comments:

  • simulator: Make min_batch_size a parameter defined in the setup.
  • simulator: Add documentation.
vlorentz added inline comments.
swh/scheduler/simulator/task_scheduler.py
26 ↗(On Diff #17336)

better yet: I'll make it a constant somewhere else
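
A minimal sketch of the outcome discussed in this thread: the arbitrary factor becomes a named, commented constant, and `min_batch_size` becomes a parameter of the setup. Only `min_batch_size` comes from the commit titles. `MIN_BATCH_SIZE_FACTOR`, `SimulationSetup`, `make_setup`, and the assumption that the factor multiplies a worker count are hypothetical, since the thread does not show what the factor applies to.

```python
from dataclasses import dataclass

# Arbitrary default, per the discussion above; the name is hypothetical.
MIN_BATCH_SIZE_FACTOR = 10


@dataclass(frozen=True)
class SimulationSetup:
    """Hypothetical setup object carrying tunable simulation parameters."""

    worker_count: int
    min_batch_size: int


def make_setup(worker_count: int, factor: int = MIN_BATCH_SIZE_FACTOR) -> SimulationSetup:
    # Assumption: the minimum batch size scales with the number of workers.
    return SimulationSetup(worker_count, worker_count * factor)


setup = make_setup(worker_count=4)
print(setup.min_batch_size)
```

Naming the constant documents that the value is a tunable choice, and passing it through the setup lets a simulation run override it without editing the scheduler code.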

Build is green

Patch application report for D4856 (id=17348)

Rebasing onto 5e609d5205...

Current branch diff-target is up to date.
Changes applied before test
commit 7af4bebc7a4503964f9bd61ac101c54fc42ca474
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1967379c3251f407e7e5128efbbceafe293e3704
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/161/ for more details.
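
The "refine origin model" commit message above describes a characteristic time between commits, spread between 1 second and 10 years. That wording can be read in more than one way. The sketch below takes one reading as an assumption: the exponent is spread evenly, so the time is 10 years raised to a fraction in [0, 1). Hashing the origin URL to pick that fraction is also an assumption, as are the function and constant names.

```python
import hashlib

TEN_YEARS = 10 * 365 * 24 * 3600  # in seconds, ignoring leap days


def seconds_between_commits(origin_url: str, n_bytes: int = 2) -> float:
    """Return a deterministic characteristic time for one origin (sketch).

    The URL is hashed to a bucket in [0, 2 ** (8 * n_bytes)). The bucket,
    scaled to [0, 1), is used as an exponent, so the result lies in
    [1 second, 10 years).
    """
    num_buckets = 2 ** (8 * n_bytes)
    digest = hashlib.sha256(origin_url.encode()).digest()
    bucket = int.from_bytes(digest[:n_bytes], "little")
    return TEN_YEARS ** (bucket / num_buckets)
```

In this sketch the value depends only on the URL, so calling the function once or many times for the same origin gives the same answer.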

vlorentz marked an inline comment as done.

even better doc

Build is green

Patch application report for D4856 (id=17350)

Rebasing onto 5e609d5205...

Current branch diff-target is up to date.
Changes applied before test
commit b594c847826699608f49afe4153c2f2b3ef99657
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1967379c3251f407e7e5128efbbceafe293e3704
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 77362633b7485bfe3944d8c278d509eb60f0d664
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit eb7676ea2e8dcc5fa92067ad7858e5069ccc8db1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 687e6f007cb4943ef19ff87b87953607c6f206b7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit c3f520abc55c7355dbef0d2fed1102cc30040176
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 58042267faeaaab656c1e459b14fcfa24f300795
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b58eb740e9d407b95a1df632eaef91bbe6c3ff8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 85df218106a0c29dd79900321572e87a7c90a5bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 2bc5187c76657b00d54b61f993aeeb2de25acf18
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit edf406dc961c8d0a77a34e73b0e19fcd511bd27d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit e7d60a996249b6827332e17e2977bec1b69eab83
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit b663e5414a09bc0b5a22c111894433d71c77f42c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit c3e8380e1aa140c8823ef76ba6d384474f160c9b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit b4b20ad406d8925cb4aa96828dbc5af14e0bda8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit cce6ce250ee0e73cc2b486c32cae8c05265a9974
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 7e5f99837487c3785dfa96ed28ce9fecdf25bad8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 63c3beea168e2f41ff0cbd71fe53af95e062748a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit dd06b1bd428c15cf8ebb89873f24ee372ff363eb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 524ec4a50a60eb45815faf49d8d675a86756955b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit 11263f58a02c9f1aa485df5ea4ac5131998f3d69
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/162/ for more details.
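
The task-based scheduler commit above mentions extending the Task object with an autogenerated uuid so a task can be followed from creation to its visit statuses. A minimal sketch of that idea with dataclasses follows; the class and field names are hypothetical and are not the ones in swh/scheduler/simulator/common.py.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SimulatedTask:
    """A task to run in the simulation (hypothetical shape)."""

    visit_type: str
    origin: str
    # Generated once per task, so later events can refer back to it.
    backend_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class SimulatedTaskEvent:
    """Something that happened to a task, linked back by the task's uuid."""

    task: uuid.UUID
    status: str
    date: datetime  # simulated time, supplied by the caller
    eventful: Optional[bool] = None


task = SimulatedTask(visit_type="git", origin="https://example.org/repo")
event = SimulatedTaskEvent(
    task=task.backend_id,
    status="full",
    date=datetime(2021, 1, 15, tzinfo=timezone.utc),
    eventful=True,
)
```

Using default_factory rather than a plain default matters here: each task gets its own uuid instead of all tasks sharing one.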

Build is green

Patch application report for D4856 (id=17356)

Could not rebase; Attempt merge onto 0a32a31195...

Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  65 +++++++++++++
 mypy.ini                                    |   6 ++
 requirements-simulator.txt                  |   2 +
 setup.py                                    |  34 +++----
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/simulator.py              |  61 ++++++++++++
 swh/scheduler/simulator/__init__.py         | 144 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 102 ++++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  68 +++++++++++++
 swh/scheduler/simulator/origins.py          | 128 +++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  76 +++++++++++++++
 swh/scheduler/tests/test_simulator.py       |  45 +++++++++
 14 files changed, 718 insertions(+), 17 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 69877d3f987eaba00dcc97359b48e1fc8a677298
Merge: 0a32a31 ed04441
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:06:09 2021 +0000

    Merge branch 'diff-target' into HEAD

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli
    
    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection, and therefore executing
    the side-effects of swh.scheduler.cli.journal.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/165/ for more details.
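
The last commit message above describes a subcommand that only exists once its module has been imported, which a test runner importing every module can hide. The sketch below reproduces the effect with a plain dictionary registry instead of the real CLI framework; COMMANDS, command and load_journal_module are hypothetical stand-ins.

```python
from typing import Callable, Dict

# Registry of known subcommands, filled in as a side effect of definitions.
COMMANDS: Dict[str, Callable[[], str]] = {}


def command(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that registers a function as a subcommand when it is defined."""

    def register(func: Callable[[], str]) -> Callable[[], str]:
        COMMANDS[name] = func
        return func

    return register


def load_journal_module() -> None:
    """Stand-in for importing the module that defines the journal subcommand."""

    @command("journal")
    def journal() -> str:
        return "journal client started"
```

Until load_journal_module() runs, "journal" is absent from COMMANDS. A test setup that loads every module first would never notice that the main entry point forgot to do so.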

Stop using get_scheduler directly

Build is green

Patch application report for D4856 (id=17362)

Could not rebase; Attempt merge onto 0a32a31195...

Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  65 ++++++++++++
 mypy.ini                                    |   6 ++
 requirements-simulator.txt                  |   2 +
 setup.py                                    |  34 ++++---
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/simulator.py              |  68 +++++++++++++
 swh/scheduler/simulator/__init__.py         | 147 ++++++++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 102 +++++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  68 +++++++++++++
 swh/scheduler/simulator/origins.py          | 128 ++++++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  76 ++++++++++++++
 swh/scheduler/tests/test_simulator.py       |  53 ++++++++++
 14 files changed, 736 insertions(+), 17 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit c4641ac0f67e92b7d6ffe885f9fc7a410a547e63
Merge: 0a32a31 e12a4f1
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:42:37 2021 +0000

    Merge branch 'diff-target' into HEAD

commit e12a4f13386cdb25d366f5e2ee81044cb8e30169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly
    
    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli
    
    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection, and therefore executing
    the side-effects of swh.scheduler.cli.journal.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/171/ for more details.
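
The "stop using get_scheduler directly" commit above reuses the scheduler built by the CLI rather than constructing one inside the simulator. A minimal sketch of that pattern, passing the scheduler in as an argument, is given below. SchedulerLike, InMemoryScheduler, record_origin and fill_origins are all hypothetical names, not the swh.scheduler interface.

```python
from typing import List, Protocol


class SchedulerLike(Protocol):
    """The only capability this sketch needs from a scheduler."""

    def record_origin(self, url: str) -> None:
        ...


class InMemoryScheduler:
    """Toy backend, standing in for whatever the caller has configured."""

    def __init__(self) -> None:
        self.origins: List[str] = []

    def record_origin(self, url: str) -> None:
        self.origins.append(url)


def fill_origins(scheduler: SchedulerLike, num_origins: int) -> None:
    """Fill the given scheduler; no backend is created or configured here."""
    for i in range(num_origins):
        scheduler.record_origin(f"https://example.org/origin-{i}")
```

Because the function receives its scheduler, a test can hand it an in-memory object and the CLI can hand it the instance it already configured.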

Build is green

Patch application report for D4856 (id=17381)

Could not rebase; Attempt merge onto 98526539a8...

Updating 9852653..89b8839
Fast-forward
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  65 ++++++++++
 mypy.ini                                    |   6 +
 requirements-simulator.txt                  |   2 +
 setup.py                                    |  34 +++---
 sql/updates/25.sql                          |  64 ++++++++++
 swh/scheduler/backend.py                    |  87 +++++++++++++
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/origin.py                 |  40 ++++++
 swh/scheduler/cli/simulator.py              |  68 +++++++++++
 swh/scheduler/interface.py                  |  40 ++++++
 swh/scheduler/model.py                      |  32 +++++
 swh/scheduler/simulator/__init__.py         | 147 ++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 102 ++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  68 +++++++++++
 swh/scheduler/simulator/origins.py          | 128 ++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  76 ++++++++++++
 swh/scheduler/sql/30-schema.sql             |  24 +++-
 swh/scheduler/sql/40-func.sql               |  40 ++++++
 swh/scheduler/tests/test_api_client.py      |   3 +
 swh/scheduler/tests/test_cli_origin.py      |  11 ++
 swh/scheduler/tests/test_scheduler.py       | 181 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py       |  53 ++++++++
 24 files changed, 1255 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/25.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 89b8839ce5d1e5db2bb9b69c96dbc943d1172ff0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly
    
    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit f9f28ece9b78957a7dac050c9d21fe0b0c64ad95
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 1b335be22b7ad25eede2ac605f86d2fd80a61b4d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 403e97c5599934aed746f9301845c9e6f0d7d933
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit cb9b2c1ddb6cf641b3b23fedf7a36269cc4ced6d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 44407d7fd413c62070d85f6ee1de2268a87e2906
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit f951490ccfbdf30c4ef57d0b41651f6f43278873
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit b9b3defd03f87febb5c06c50ac2b7c9d37e918d5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit b4b83f6e15476f93c51d68adbbfdbbb10d71d444
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 029c95f8887cac6d0eeabb4516812371375dbd28
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit 131994c324502f455080603fb8ebda0e77feba22
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 4e54c277a3a3faa3399b615b096bcb7149a5ff78
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 631955aaccdb8a6f2cbdc2881ce70553c1d437e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit f8bdbec28238cdf9c487ae7ed1cc24cbfbdffdb3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit b97bb855576e1edf23be70993b6df54dc0f16a6f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit d50da6b64b1242d226dafbfc032184c8e5fb1c9f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit fd44eb75447aba9a03b43621b88f140d8dc15ec1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit 393313a7b5530a3f123e9ca7e92fe9d61038d829
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 1f1e6c5d5157ee8f30b8c56a1cf130ac5ef4e953
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 62870d9d11e3f598130f2562181dc8a59b7e2e2d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 89c76bd7e776f5dadf8b3ff13b9bd5d5cc42f208
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ccf03c4e1f9bd3b1e46a1de0bfc7c7e4b055284d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 53b034cb8d09efa0c9b448d29fb70d727bc6a066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:39:21 2021 +0100

    Add a cli for the scheduler metrics update endpoint

commit 737d12e5b9e694b22bef291c625090fb3aee2afc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:48:31 2021 +0100

    Introduce a new lister_get endpoint

commit 114ed952e513c7ad3dbb038a640e80bf079d0780
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins
    
    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.
    
    The metrics are aggregated for each lister instance and visit type
    (allowing complete reaggregation by visit type for cross-cutting statistics).
    
    The following metrics have been implemented:
     - number of known origins overall
     - number of enabled origins (origins seen in the last listing)
     - number of enabled origins that have never been successfully visited
     - number of enabled origins with known activity since our last successful visit
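
As an illustration of the aggregation described in this commit message, here is a minimal in-memory sketch. It is not the database-side `update_metrics` function: the `Origin` record, its field names, and the metric keys are hypothetical stand-ins chosen for the example. It counts the four metrics per (lister instance, visit type) pair, then reaggregates them by visit type.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Origin:
    """Hypothetical, simplified stand-in for a listed origin row."""

    lister_id: str
    visit_type: str
    enabled: bool  # seen in the last listing
    last_update: Optional[datetime]  # last known upstream activity
    last_successful: Optional[datetime]  # our last successful visit


def compute_metrics(origins: Iterable[Origin]) -> Dict[Tuple[str, str], Counter]:
    """Aggregate the four metrics per (lister instance, visit type)."""
    metrics: Dict[Tuple[str, str], Counter] = {}
    for origin in origins:
        m = metrics.setdefault((origin.lister_id, origin.visit_type), Counter())
        m["origins_known"] += 1
        if not origin.enabled:
            continue
        m["origins_enabled"] += 1
        if origin.last_successful is None:
            m["origins_never_visited"] += 1
        elif (
            origin.last_update is not None
            and origin.last_update > origin.last_successful
        ):
            m["origins_with_pending_changes"] += 1
    return metrics


def by_visit_type(metrics: Dict[Tuple[str, str], Counter]) -> Dict[str, Counter]:
    """Reaggregate per-lister metrics into per-visit-type totals."""
    totals: Dict[str, Counter] = {}
    for (_lister_id, visit_type), m in metrics.items():
        totals.setdefault(visit_type, Counter()).update(m)
    return totals
```

Keeping the lister instance in the aggregation key is what makes the second step possible: per-visit-type totals are plain sums of the per-lister counters.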

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/181/ for more details.

One last rebase for the road

Build is green

Patch application report for D4856 (id=17407)

Rebasing onto 7905a6bea4...

Current branch diff-target is up to date.
Changes applied before test
commit 898820fac52cf6fcfb5d2770aad49f131370a5a6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 20 12:11:05 2021 +0100

    simulator: collect and plot scheduler metrics over time
    
    For now, only plot the known_origins and origins_never_visited metrics.

commit 9ce68f8d0e0ea69bd6672a50687079b5b1ea460c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly
    
    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit 88e0b42805011bc3886f77ce5c91b3450351a16f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 62c6d90867bccb17ae076e1b5ee4db6fd350ad1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 9468bb9384f14e5fa0548b7d985f66fb3e36c85a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit ead7b347db9d8852b4c347729d7e6d32b72d9058
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.
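
A minimal sketch of the idea in this commit message, assuming a dataclass-based `Task` wrapper. The class layout, the field names (`backend_id` in particular) and the `in_flight` mapping are illustrative assumptions, not the simulator's actual definitions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Task:
    """Hypothetical simulator-side task; field names are illustrative."""

    visit_type: str
    origin: str
    # Autogenerated identifier: each Task instance gets its own uuid.
    backend_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class TaskEvent:
    """Hypothetical event emitted when a task finishes."""

    task_id: uuid.UUID
    status: str  # for example "full", "partial" or "failed"


# Tasks handed to workers, indexed by their generated identifier.
in_flight: Dict[uuid.UUID, Task] = {}


def submit(task: Task) -> None:
    in_flight[task.backend_id] = task


def complete(event: TaskEvent) -> Task:
    """Match a finished event back to the task that produced it."""
    return in_flight.pop(event.task_id)
```

The uuid is what links a visit status produced later in the simulation back to the task created earlier, without relying on the origin URL being unique among in-flight tasks.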

commit aecd27eee06aaa46d350e9d5b3f86ccc36a5446c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit 05067e3ecc888271507505112b48ebc9f755f5e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit 24922fe2d995ca3ffa6c3c5a19c1f5f5531db4c8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit d5318aea0a93a94c80f8d743ce1de63592161f5a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit 22ebb7a9a4bc6639e6f52d71c2b727537baf5019
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit ad7bfbe731da64cc6d1ddaa3f5ae1ef1e3350f60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit df34db0bfc61df418f00338345b4b46a86340f62
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 21ce2c88dddce081bfd525d08454ca09bbf521c6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit 29204199774b40bea4d3d23ffe9407a5d090f8fa
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6433266106dda007d1e5304a0dcb01706c8acb42
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.
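
For illustration only, a report object of this kind could look like the sketch below. The attribute and method names are assumptions, and the real `SimulationReport` may collect and format its data differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulationReport:
    """Hypothetical sketch: collects task run times during a simulation."""

    visit_runtimes: List[float] = field(default_factory=list)

    def record_visit(self, duration: float) -> None:
        """Record the run time (in simulated seconds) of one finished task."""
        self.visit_runtimes.append(duration)

    def format(self) -> str:
        """Summarize the collected run times for printing at the end of a run."""
        if not self.visit_runtimes:
            return "No visits recorded"
        total = len(self.visit_runtimes)
        mean = sum(self.visit_runtimes) / total
        longest = max(self.visit_runtimes)
        return f"Total visits: {total}, mean run time: {mean:.2f}s, max: {longest:.2f}s"
```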

commit c474a825336a4e4132e83982e180451b02d8f54d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.
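
One possible reading of this model, sketched below under explicit assumptions: the characteristic time is derived deterministically from a hash of the origin URL, and the exponent (rather than the value) is spread uniformly, so the times are spread evenly on a log scale between 1 second and 10 years. The function names and the run time constants are hypothetical, chosen only to make the example concrete.

```python
import hashlib
from typing import Tuple

TEN_YEARS = 10 * 365 * 24 * 3600  # upper bound, in seconds


def seconds_between_commits(origin_url: str) -> float:
    """Consistent characteristic time for an origin, in [1 s, 10 years].

    The same URL always maps to the same value, so repeated visits of an
    origin see a stable commit rate.
    """
    digest = hashlib.sha1(origin_url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big")  # 0..65535
    return TEN_YEARS ** (bucket / 65535)


def model_visit(
    origin_url: str,
    seconds_since_last_visit: float,
    min_run_time: float = 0.5,  # assumed fixed cost per visit, in seconds
    per_commit_run_time: float = 0.01,  # assumed cost per new commit
) -> Tuple[str, float]:
    """Return (status, run time) for the next visit of an origin."""
    new_commits = int(seconds_since_last_visit / seconds_between_commits(origin_url))
    run_time = min_run_time + per_commit_run_time * new_commits
    status = "eventful" if new_commits > 0 else "uneventful"
    return status, run_time
```

In this sketch the elapsed time since the last visit plays the role of the feedback from the visit stats: the longer an active origin goes unvisited, the more commits the next visit has to load and the longer it runs.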

commit 2459badf0c05bf2cb663e66b9deabf1150638bb1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit cb12449e8f57e59ec4c7953a3c4a52c9193d202e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit 20b7f9c68f831839f4be1cae4b9ae2dce0fc2d96
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.
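
A minimal sketch of that design choice, with a hypothetical `VisitStatsCache` class standing in for the scheduler's visit cache (the real scheduler API is not shown here). The simulated visit keeps no snapshot state of its own: it asks the cache for the last snapshot and compares.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VisitStatsCache:
    """Hypothetical stand-in for the scheduler's per-origin visit cache."""

    last_snapshot: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def get(self, url: str, visit_type: str) -> Optional[bytes]:
        return self.last_snapshot.get((url, visit_type))

    def record(self, url: str, visit_type: str, snapshot: bytes) -> None:
        self.last_snapshot[(url, visit_type)] = snapshot


def visit_is_eventful(
    cache: VisitStatsCache, url: str, visit_type: str, new_snapshot: bytes
) -> bool:
    """A visit is uneventful when it yields the snapshot already in the cache."""
    eventful = cache.get(url, visit_type) != new_snapshot
    cache.record(url, visit_type, new_snapshot)
    return eventful
```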

commit 39ad47de2e753033c4b7114a64b5c3144b6ea821
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 31967fa850c3afe29fc37e41cfcd53ff5408e7b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit fc3f06bd1d77c76bfba4c05efcd62abcb5c46eea
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/194/ for more details.

This revision was automatically updated to reflect the committed changes.