
Add a cli for the scheduler metrics update endpoint
Closed · Public

Authored by vlorentz on Jan 19 2021, 6:43 PM.

Details

Summary

Depends on D4887

Test Plan

new test added

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18542
Build 28685: Phabricator diff pipeline on Jenkins
Build 28684: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4889 (id=17365)

Could not rebase; Attempt merge onto 0a32a31195...

Auto-merging setup.py
Merge made by the 'recursive' strategy.
 .pre-commit-config.yaml                     |   1 +
 docs/index.rst                              |   1 +
 docs/simulator.rst                          |  65 ++++++++++
 mypy.ini                                    |   6 +
 requirements-simulator.txt                  |   2 +
 setup.py                                    |  34 +++---
 sql/updates/24.sql                          |  64 ++++++++++
 swh/scheduler/backend.py                    |  87 +++++++++++++
 swh/scheduler/cli/__init__.py               |   2 +-
 swh/scheduler/cli/origin.py                 |  40 ++++++
 swh/scheduler/cli/simulator.py              |  68 +++++++++++
 swh/scheduler/interface.py                  |  40 ++++++
 swh/scheduler/model.py                      |  32 +++++
 swh/scheduler/simulator/__init__.py         | 147 ++++++++++++++++++++++
 swh/scheduler/simulator/common.py           | 102 ++++++++++++++++
 swh/scheduler/simulator/origin_scheduler.py |  68 +++++++++++
 swh/scheduler/simulator/origins.py          | 128 ++++++++++++++++++++
 swh/scheduler/simulator/task_scheduler.py   |  76 ++++++++++++
 swh/scheduler/sql/30-schema.sql             |  24 +++-
 swh/scheduler/sql/40-func.sql               |  40 ++++++
 swh/scheduler/tests/test_api_client.py      |   3 +
 swh/scheduler/tests/test_cli_origin.py      |  11 ++
 swh/scheduler/tests/test_scheduler.py       | 181 +++++++++++++++++++++++++++-
 swh/scheduler/tests/test_simulator.py       |  53 ++++++++
 24 files changed, 1255 insertions(+), 20 deletions(-)
 create mode 100644 docs/simulator.rst
 create mode 100644 requirements-simulator.txt
 create mode 100644 sql/updates/24.sql
 create mode 100644 swh/scheduler/cli/simulator.py
 create mode 100644 swh/scheduler/simulator/__init__.py
 create mode 100644 swh/scheduler/simulator/common.py
 create mode 100644 swh/scheduler/simulator/origin_scheduler.py
 create mode 100644 swh/scheduler/simulator/origins.py
 create mode 100644 swh/scheduler/simulator/task_scheduler.py
 create mode 100644 swh/scheduler/tests/test_simulator.py
Changes applied before test
commit 5c78e45bcb58cc1483988fb0de43de46557134d2
Merge: 0a32a31 9efb4e8
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 19 17:49:39 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 9efb4e8d578ce047bc4dcc8424fb8b06e27fae19
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:39:21 2021 +0100

    Add a cli for the scheduler metrics update endpoint

commit 191ec9d9874c335b7ce10958766b450d054a74a4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:48:31 2021 +0100

    Introduce a new lister_get endpoint

commit 2bf1109c21ac77b7f03ad993f4cf0650390b929b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins
    
    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.

commit e12a4f13386cdb25d366f5e2ee81044cb8e30169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:36:53 2021 +0100

    simulator: stop using get_scheduler directly
    
    This reuses the scheduler instantiated by the cli instead of hardcoding
    our own using the PG* variables.

commit ed044415e625080cb4bc67b2656743d92ed4c884
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:32:27 2021 +0100

    simulator: Add documentation.

commit 186aebeb12905dc98cc370b360a8b3f5c4db3186
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 16:17:24 2021 +0100

    simulator: Make min_batch_size a parameter defined in the setup.

commit 6150c764616d3c25ee13eb08bea6b4c9d1c2bc0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 18 13:51:35 2021 +0100

    simulator: add basic tests for fill_test_data and run

commit 5bee207dca74bb2c70611b3308c93bc522d48247
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:33:43 2021 +0100

    simulator: implement a simulator for the "old" task-based scheduler
    
    We extend the Task object with an autogenerated uuid allowing us to
    track the task lifetime between its creation and the generation of visit
    statuses, as the task-based scheduler does.

commit 6ec79c18b7b0e8be7b086aa79e87de81a8dbd06a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 16:31:42 2021 +0100

    Move the simulator cli to the main cli module

commit b25874a7066c95460f7d24c132f32f4dabf055a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:37:59 2021 +0100

    simulator: Replace attrs with dataclasses for consistency

commit cb0bc27be55cf384c68b834ae3c89dd93434fbba
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 15:31:41 2021 +0100

    simulator: wrap tasks and task events in typechecked objects
    
    This allows us to extend these objects without redefining a bunch of
    type annotations.

commit 947aecb14cdb4c6dd2da178f53599b4a41c8245b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 14:47:33 2021 +0100

    simulator: also fill data for the task-based scheduler

commit ff6cd0669e0d75afbd2c63424db66bf8d1e91bee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 15 14:41:05 2021 +0100

    simulator: Split into smaller files in the same package

commit b232135cb982f4fc8e5fb6242a88012d732e252d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:50:00 2021 +0100

    simulator: Make the run time a CLI argument

commit 24e93d8aa72107bf953f884df4c9b15ea9cbeb2c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:40:16 2021 +0100

    simulator: tweak simulation environment constants

commit 9885d12cd708a26878cd9aa70ab590223589e8d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:37:00 2021 +0100

    simulator: generate more origins in fill_data

commit a1d80fec0f5760d136857fb893232b1baec35b64
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:35:01 2021 +0100

    simulator: add typing for Environment.scheduler

commit 6a9ec5f38133fe232da1ca98ff30ef44b12a4c12
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 12:00:21 2021 +0100

    simulator: add support for a basic SimulationReport
    
    For now, this collects the runtime of tasks that have run, and gets
    printed at the end of the simulation.

commit 5d0e2aee4182df9476934349ad20da5dafc8b61f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:45:23 2021 +0100

    simulator: refine origin model to follow an exponential distribution
    
    This models origins using a consistent characteristic "time between
    commits" that follows an exponential distribution between 1 second and
    10 years.
    
    From this characteristic time, and feedback from the OriginVisitStats,
    we can generate the expected run time and output status of the next
    visit of that origin.

commit 7934c2f90191615db69b50dc27744ec73704f896
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Jan 15 11:43:20 2021 +0100

    simulator: Remove some debug statements and lower log level

commit d0ed751eca9f2ff0464b795edb9e9bb2a0305649
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:17:11 2021 +0100

    simulator: simulate the scheduler journal client

commit c9b0728955e683748f8b03a22f91d501b64aad67
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:12:38 2021 +0100

    simulator: generate OriginVisitStatus objects in modeled visits
    
    To be able to generate uneventful visits, we would need to store
    the last snapshot seen for a given origin. Instead of storing this
    within the simulator, which would be a concern for large scale
    simulations, we use the scheduler visit cache directly.

commit 56c8d1dd66d8a993c8bc7c7bcc4e3fb3704f6864
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:09:58 2021 +0100

    simulator: Move scheduler into the simulation environment object
    
    The scheduler is used by a lot of the simulated actors, it makes sense
    to share it all the time.

commit 1659aa17fe0510030fb24d3b7867d2c4a366b5dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 14 15:07:56 2021 +0100

    simulator: Use datetimes instead of a floating point simulated time

commit ef241dd84c400f9be0d92396867587d47216e385
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Jan 13 16:13:01 2021 +0100

    Introduce scaffolding for a scheduler simulator
    
    This simulator will allow us to compare the behavior of the old and new
    schedulers, as well as to test the impact of scheduler policies and their
    parameters on the performance of the Software Heritage archival
    infrastructure as a whole.

commit 49a14792b0329049b51cbc6ed9c48006e9ff1a73
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:56:44 2021 +0100

    Import the journal subcommand in the main swh.scheduler cli
    
    This issue was masked by tox.ini using pytest with --doctest-modules,
    which imports all modules during test collection and therefore executes
    the side effects of swh.scheduler.cli.journal.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/174/ for more details.

Build is green

Patch application report for D4889 (id=17380)

Could not rebase; Attempt merge onto 98526539a8...

Updating 9852653..53b034c
Fast-forward
 sql/updates/25.sql                     |  64 ++++++++++++
 swh/scheduler/backend.py               |  87 ++++++++++++++++
 swh/scheduler/cli/origin.py            |  40 ++++++++
 swh/scheduler/interface.py             |  40 ++++++++
 swh/scheduler/model.py                 |  32 ++++++
 swh/scheduler/sql/30-schema.sql        |  24 ++++-
 swh/scheduler/sql/40-func.sql          |  40 ++++++++
 swh/scheduler/tests/test_api_client.py |   3 +
 swh/scheduler/tests/test_cli_origin.py |  11 ++
 swh/scheduler/tests/test_scheduler.py  | 181 ++++++++++++++++++++++++++++++++-
 10 files changed, 519 insertions(+), 3 deletions(-)
 create mode 100644 sql/updates/25.sql
Changes applied before test
commit 53b034cb8d09efa0c9b448d29fb70d727bc6a066
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:39:21 2021 +0100

    Add a cli for the scheduler metrics update endpoint

commit 737d12e5b9e694b22bef291c625090fb3aee2afc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 17:48:31 2021 +0100

    Introduce a new lister_get endpoint

commit 114ed952e513c7ad3dbb038a640e80bf079d0780
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jan 19 14:23:32 2021 +0100

    Implement some basic aggregated metrics on listed origins
    
    Metrics are computed and cached database-side by the `update_metrics`
    function. The `get_metrics` function only retrieves the cached data.
    
    The metrics are aggregated for each lister instance and visit type
    (allowing complete reaggregation by visit type for cross-cutting statistics).
    
    The following metrics have been implemented:
     - number of known origins overall
     - number of enabled origins (origins seen in the last listing)
     - number of enabled origins that have never been successfully visited
     - number of enabled origins with known activity since our last successful visit

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/180/ for more details.
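
The metrics commit in the report above describes a compute-then-cache split: `update_metrics` aggregates per lister instance and visit type, and `get_metrics` only reads what was cached. According to the commit message this happens database-side; the following is only a minimal in-memory Python sketch of the same idea. The `Origin`, `Metrics` and `MetricsCache` classes and all of their field names are hypothetical and are not taken from swh.scheduler. How never-visited origins are treated for the "known activity" counter is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Origin:
    """Hypothetical listed origin; field names are illustrative only."""

    lister_id: str
    visit_type: str
    enabled: bool  # seen in the last listing
    last_update: Optional[datetime] = None  # last known upstream activity
    last_successful_visit: Optional[datetime] = None


@dataclass
class Metrics:
    """The four counters named in the commit message."""

    known: int = 0
    enabled: int = 0
    never_visited: int = 0
    with_pending_changes: int = 0


Key = Tuple[str, str]  # (lister_id, visit_type)


class MetricsCache:
    def __init__(self) -> None:
        self._cache: Dict[Key, Metrics] = {}

    def update_metrics(self, origins: List[Origin]) -> None:
        """Recompute every aggregate and replace the cached copy."""
        fresh: Dict[Key, Metrics] = defaultdict(Metrics)
        for origin in origins:
            metrics = fresh[(origin.lister_id, origin.visit_type)]
            metrics.known += 1
            if not origin.enabled:
                continue
            metrics.enabled += 1
            if origin.last_successful_visit is None:
                metrics.never_visited += 1
            elif (
                origin.last_update is not None
                and origin.last_update > origin.last_successful_visit
            ):
                # Assumption: only origins with a successful visit count here.
                metrics.with_pending_changes += 1
        self._cache = dict(fresh)

    def get_metrics(self) -> Dict[Key, Metrics]:
        """Return the cached aggregates without recomputing anything."""
        return self._cache
```

Because the key keeps the lister and the visit type separate, a caller can sum the cached rows over all listers to obtain per-visit-type totals, which is the reaggregation the commit message mentions.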

This revision is now accepted and ready to land. Jan 20 2021, 12:45 PM

Build is green

Patch application report for D4889 (id=17405)

Rebasing onto c386fdf3b9...

Current branch diff-target is up to date.
Changes applied before test
commit 7905a6bea4ba46662d93fcda05c943c8b01408c3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jan 19 18:39:21 2021 +0100

    Add a cli for the scheduler metrics update endpoint

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/193/ for more details.
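
The change under review exposes the metrics update endpoint through the command line. The diffstat shows the command was added to `swh/scheduler/cli/origin.py`, but its exact name and options do not appear on this page. The sketch below is therefore only a stand-in built on Python's standard `argparse`: the `update-metrics` subcommand name, the `scheduler-origin` program name and the `FakeScheduler` class are all hypothetical, and serve only to illustrate how a thin CLI wrapper over such an endpoint might look.

```python
import argparse
from typing import List, Optional


class FakeScheduler:
    """Hypothetical stand-in for a scheduler backend."""

    def update_metrics(self) -> List[dict]:
        # A real backend would recompute and cache the aggregates here.
        return [{"lister_id": "example", "visit_type": "git", "known": 3}]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scheduler-origin")
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser(
        "update-metrics", help="recompute the cached origin metrics"
    )
    return parser


def main(
    argv: Optional[List[str]] = None,
    scheduler: Optional[FakeScheduler] = None,
) -> int:
    args = build_parser().parse_args(argv)
    scheduler = scheduler or FakeScheduler()
    if args.command == "update-metrics":
        rows = scheduler.update_metrics()
        print(f"Metrics updated for {len(rows)} (lister, visit type) pairs.")
    return 0
```

Passing the scheduler in as an argument keeps the command testable without a database, in the same spirit as the earlier commit that made the simulator reuse the scheduler instantiated by the cli.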