We already do that in the scheduler backend function
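A minimal sketch of the idea behind this change, not the actual swh-scheduler code: if the backend function already rejects unknown policy names, the CLI can pass the string through without a duplicate check. The names `KNOWN_POLICIES`, `backend_grab_next_visits`, `cli_run` and both policy strings are hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical policy names, for illustration only.
KNOWN_POLICIES = frozenset({"never_visited_first", "most_lag_first"})


def backend_grab_next_visits(policy: str, count: int) -> List[str]:
    """Stand-in for the backend function: the single place that validates `policy`."""
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"Unknown scheduling policy: {policy!r}")
    # A real backend would query the database here; this sketch returns no origins.
    return []


def cli_run(policy: str) -> List[str]:
    """Stand-in for the CLI entry point: no duplicate check of its own.

    An invalid policy still fails, with the error raised by the backend.
    """
    return backend_grab_next_visits(policy, count=10)
```

With this layout the list of valid policies lives in one place, so adding a policy to the backend does not require touching the CLI.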
Details
- Reviewers
  olasd, ardumont
- Group Reviewers
  Reviewers
- Commits
  rDSCHcf0583b07959: simulator: stop validating the scheduling policy in the CLI
Diff Detail
- Repository
  rDSCH Scheduling utilities
- Lint
  No Linters Available
- Unit
  No Unit Test Coverage
- Build Status
  Buildable 18660
  Build 28875: Phabricator diff pipeline on jenkins
  Build 28874: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4917 (id=17496)
Could not rebase; Attempt merge onto 03460207a1...
Updating 0346020..5baba8a
Fast-forward
 swh/scheduler/backend.py | 107 ++++++++++++++----
 swh/scheduler/cli/simulator.py | 1 -
 swh/scheduler/interface.py | 45 ++++++--
 swh/scheduler/model.py | 33 +-----
 swh/scheduler/simulator/__init__.py | 25 +++--
 swh/scheduler/simulator/common.py | 41 +++++--
 swh/scheduler/simulator/origin_scheduler.py | 2 +-
 swh/scheduler/simulator/origins.py | 162 +++++++++++++++++++++-------
 swh/scheduler/tests/test_scheduler.py | 93 +++++++++++++---
 swh/scheduler/tests/test_simulator.py | 15 ++-
 10 files changed, 387 insertions(+), 137 deletions(-)
Changes applied before test
commit 5baba8a19b98a5d2559fe7a1aa6a66f231c14b65
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI

    We already do that in the scheduler backend function

commit f878c6036ba7400dc08fc33dc8d3858cc234b4c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit bdbc3a86f84772ec166764ca5169ec597cf89e14
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics

    This allows us to check the behavior of the archive over time in terms of number of visits.

commit 7afb0a498432d1e2641abf3a9de859354699c5c4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data

    This was a significant bottleneck of the simulator. To work around this, we:
    - Generate snapshot ids consistently in the OriginModel
    - Cache the origin data locally in the simulator, to compute the eventfulness of visits
    - Cache the last visit time for all origins to compute the estimated run time of visit tasks.

commit 8e7377d8af45ef8e8234b57dc6a16be75dd74ac5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 17:38:41 2021 +0100

    simulator: add a trivial heartbeat process to show progress

    For now, this process only writes a log every simulated day.

commit ba303f946ecd3e15e58de0072ce71b50aa423d59
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast

    The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.

commit 808ae6851faee9b633e773f9150d360cdb927146
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits

    This makes the simulator behavior more consistent with reality.

commit 9943195d31c51a44325cba09d07fb6e904d45a00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit 72070b7bf628788b6872e90a3f8ac8f0c01b70d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 1f1aad459c4b0740ecbe96e9809e4b31f66bf999
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel

    This generates consistent last_update values according to the model and simulated time.

commit b93aa5be2c2d5dc2130e1027698f3e1255052d8d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 13:01:53 2021 +0100

    Make PaginatedListedOriginList a concretization of PagedResult

    1. consistent with swh-storage and swh-indexer-storage
    2. we can use swh.core.api.classes.stream_results on scheduler.get_listed_origins.

commit 2f47936731cf438a5195978a2af3250597b693b5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:29:16 2021 +0100

    Add scheduling policy for already visited origins with known last update

    This policy schedules origins by decreasing order of "visit lag" (that is, origins with the most lag are scheduled first).

commit acad712ad3f71f88f99e45e9b4f571ad751945dc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jan 20 17:25:46 2021 +0100

    Add scheduling policy for never visited origins

    This policy orders never visited origins by increasing date of last update (scheduling the "oldest" never visited origins first).
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/239/ for more details.
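The two scheduling policies described in the commit log above can be illustrated with a small in-memory sketch. This is not the SQL used by the scheduler backend. The `Origin` class, its field names, both function names, and the definition of "visit lag" as `last_update - last_visit` are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for a listed origin with its visit state."""

    url: str
    last_update: datetime           # when the upstream origin last changed
    last_visit: Optional[datetime]  # None if the origin was never visited


def never_visited_oldest_first(origins: List[Origin]) -> List[Origin]:
    """Never visited origins, by increasing date of last update."""
    never_visited = [o for o in origins if o.last_visit is None]
    return sorted(never_visited, key=lambda o: o.last_update)


def already_visited_by_lag(origins: List[Origin]) -> List[Origin]:
    """Already visited origins, by decreasing visit lag.

    Lag is assumed here to be last_update - last_visit.
    """
    visited = [o for o in origins if o.last_visit is not None]
    return sorted(
        visited, key=lambda o: o.last_update - o.last_visit, reverse=True
    )
```

Each function filters first and sorts second, so an origin is only ever a candidate for one of the two policies.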
Build is green
Patch application report for D4917 (id=17528)
Could not rebase; Attempt merge onto b93aa5be2c...
Updating b93aa5b..d7f50be
Fast-forward
 swh/scheduler/backend.py | 61 +++++++++--
 swh/scheduler/cli/simulator.py | 1 -
 swh/scheduler/interface.py | 15 ++-
 swh/scheduler/simulator/__init__.py | 18 ++--
 swh/scheduler/simulator/common.py | 41 +++++--
 swh/scheduler/simulator/origin_scheduler.py | 2 +-
 swh/scheduler/simulator/origins.py | 162 +++++++++++++++++++++-------
 swh/scheduler/tests/test_scheduler.py | 21 +++-
 swh/scheduler/tests/test_simulator.py | 9 +-
 9 files changed, 257 insertions(+), 73 deletions(-)
Changes applied before test
commit d7f50bea93b746defa6ae66fa339fcc47a9b5a9a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI

    We already do that in the scheduler backend function

commit 1562717043553c47b38e0c3f8252aea4b02c8ed8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit b3731d5d64ebce878eb9c6d5a29b2c6951aceb04
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics

    This allows us to check the behavior of the archive over time in terms of number of visits.

commit 946c5c2594aa4ae6bfd9406e7ac0cb6fa6b7e199
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data

    This was a significant bottleneck of the simulator. To work around this, we:
    - Generate snapshot ids consistently in the OriginModel
    - Cache the origin data locally in the simulator, to compute the eventfulness of visits
    - Cache the last visit time for all origins to compute the estimated run time of visit tasks.

commit f5af170b5f8d6bc36b56aa007db613bcc5754804
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast

    The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.

commit 808ae6851faee9b633e773f9150d360cdb927146
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits

    This makes the simulator behavior more consistent with reality.

commit 9943195d31c51a44325cba09d07fb6e904d45a00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit 72070b7bf628788b6872e90a3f8ac8f0c01b70d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 1f1aad459c4b0740ecbe96e9809e4b31f66bf999
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel

    This generates consistent last_update values according to the model and simulated time.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/264/ for more details.
Build is green
Patch application report for D4917 (id=17574)
Could not rebase; Attempt merge onto 2906b4e8a0...
Updating 2906b4e..32c8ec9
Fast-forward
 swh/scheduler/backend.py | 61 ++++++++--
 swh/scheduler/cli/simulator.py | 1 -
 swh/scheduler/interface.py | 15 ++-
 swh/scheduler/simulator/__init__.py | 18 ++-
 swh/scheduler/simulator/common.py | 41 +++++--
 swh/scheduler/simulator/origin_scheduler.py | 2 +-
 swh/scheduler/simulator/origins.py | 171 +++++++++++++++++++++-------
 swh/scheduler/tests/test_scheduler.py | 21 +++-
 swh/scheduler/tests/test_simulator.py | 9 +-
 9 files changed, 265 insertions(+), 74 deletions(-)
Changes applied before test
commit 32c8ec91bc6dedd528ae7c8e828a419fddd9e6e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI

    We already do that in the scheduler backend function

commit 6d588a2df1b70c46dbd7828f9d8f478fed122915
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit 1e7f9d7f79b2b135f182291b94bbe64ccb6e0595
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics

    This allows us to check the behavior of the archive over time in terms of number of visits.

commit 31e37e80927995902c6a3550166f7b2e3336b71c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data

    This was a significant bottleneck of the simulator. To work around this, we:
    - Generate snapshot ids consistently in the OriginModel
    - Cache the origin data locally in the simulator, to compute the eventfulness of visits
    - Cache the last visit time for all origins to compute the estimated run time of visit tasks.

commit 6784a19cdc38b8f97aa4f9c1da9859ece24865f1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast

    The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.

commit 09a8768c30dc335afccde4df046b371a274cb2f9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits

    This makes the simulator behavior more consistent with reality.

commit a2dc72474056c2f20e255acf13ec3e662e1aad7a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit e5709214b4917a5fe3634d040da7a061f5978f66
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 7af98e2bc048c6946679e7d95cf8620e4a0ee4bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel

    This generates consistent last_update values according to the model and simulated time.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/278/ for more details.
Build is green
Patch application report for D4917 (id=17614)
Could not rebase; Attempt merge onto 2906b4e8a0...
Updating 2906b4e..cf0583b
Fast-forward
 docs/simulator.rst | 15 +++
 swh/scheduler/backend.py | 61 ++++++++--
 swh/scheduler/cli/simulator.py | 1 -
 swh/scheduler/interface.py | 15 ++-
 swh/scheduler/simulator/__init__.py | 18 ++-
 swh/scheduler/simulator/common.py | 41 +++++--
 swh/scheduler/simulator/origin_scheduler.py | 2 +-
 swh/scheduler/simulator/origins.py | 173 ++++++++++++++++++++++------
 swh/scheduler/tests/test_scheduler.py | 21 +++-
 swh/scheduler/tests/test_simulator.py | 9 +-
 10 files changed, 282 insertions(+), 74 deletions(-)
Changes applied before test
commit cf0583b079594c85e5e4fb512aceaf9fd4151473
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI

    We already do that in the scheduler backend function

commit ebb5847ea2eec79fa9b89cd684f1b6a92059324d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit 1f77521d486cfa110983b85fe0a724a347291840
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics

    This allows us to check the behavior of the archive over time in terms of number of visits.

commit 889839446eb8645a5520237513a54c892d3a3104
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data

    This was a significant bottleneck of the simulator. To work around this, we:
    - Generate snapshot ids consistently in the OriginModel
    - Cache the origin data locally in the simulator, to compute the eventfulness of visits
    - Cache the last visit time for all origins to compute the estimated run time of visit tasks.

commit c92ead5875ecfd96a164eec1803398adec6eb8a8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast

    The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.

commit 2b39cbcabf9960c1f660442e15f6c17654aec9e2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits

    This makes the simulator behavior more consistent with reality.

commit 7ffbdd1b3eb579f43e8913ea11cfd916b2f3c457
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit ea068b46a89e07c60ad1233afd36afc6bb29031e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 7af98e2bc048c6946679e7d95cf8620e4a0ee4bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel

    This generates consistent last_update values according to the model and simulated time.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/287/ for more details.