[wip] add materialized view origins_to_schedule and use it in grab_next_visits.
Changes Planned, Public

Authored by vlorentz on Jan 22 2021, 4:22 PM.

Details

Reviewers: None
Group Reviewers: Reviewers

Event Timeline

Build is green

Patch application report for D4928 (id=17538)

Could not rebase; Attempt merge onto 86b255544c...

Updating 86b2555..8988481
Fast-forward
 swh/scheduler/backend.py                    | 133 +++++++++++++++++------
 swh/scheduler/cli/simulator.py              |   1 -
 swh/scheduler/interface.py                  |  15 ++-
 swh/scheduler/simulator/__init__.py         |  18 ++--
 swh/scheduler/simulator/common.py           |  41 +++++--
 swh/scheduler/simulator/origin_scheduler.py |   2 +-
 swh/scheduler/simulator/origins.py          | 162 +++++++++++++++++++++-------
 swh/scheduler/sql/30-schema.sql             |  28 +++++
 swh/scheduler/sql/60-indexes.sql            |   4 +
 swh/scheduler/tests/test_scheduler.py       |  21 +++-
 swh/scheduler/tests/test_simulator.py       |   9 +-
 11 files changed, 339 insertions(+), 95 deletions(-)
Changes applied before test
commit 8988481d2a95f9697afd91ab45e6a755476c6989
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 16:10:46 2021 +0100

    [wip] add indexes to origins_to_schedule.

commit bbd42e7b430badd008371908460c73d12dc42efa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 15:54:15 2021 +0100

    [wip] add materialized view origins_to_schedule and use it in grab_next_visits.

commit b71fd526a9012545b8a92412bc500c08b0dc8372
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI
    
    We already do that in the scheduler backend function

commit 174b8ebba99dd696c4643a6f23cf208303bb0ff7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit 04cbecd89e941610b433c1378f24eb86cb1f04a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics
    
    This allows us to check the behavior of the archive over time in terms
    of number of visits.

commit 417e2874f930a898d79659d876ed978ab6fdd57f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data
    
    This was a significant bottleneck of the simulator. To work around this,
    we:
    
     - Generate snapshot ids consistently in the OriginModel
     - Cache the origin data locally in the simulator, to compute the
       eventfulness of visits
     - Cache the last visit time for all origins to compute the estimated
       run time of visit tasks.

commit 79b37ac6bec2e1c276b8e48b6f78821b515113c4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast
    
    The earlier implementation would just schedule new visits for origins
    forever, regardless of whether they were already scheduled or not.

commit c4d02d51c1be808a50b10e9e77d3e28d82b7bb48
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits
    
    This makes the simulator behavior more consistent with reality.

commit b1247caaeadabb13bc1502ab2e50247a62b2404e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit 0cb88aff9d42d297e2c272cfaecfc4a7c8460b75
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit bf0daa6a45764e6634b6b7b10a3eec2d937640cc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel
    
    This generates consistent last_update values according to the model and
    simulated time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/269/ for more details.

Build is green

Patch application report for D4928 (id=17575)

Could not rebase; Attempt merge onto 2906b4e8a0...

Updating 2906b4e..e9a00dc
Fast-forward
 swh/scheduler/backend.py                    | 133 ++++++++++++++++------
 swh/scheduler/cli/simulator.py              |   1 -
 swh/scheduler/interface.py                  |  15 ++-
 swh/scheduler/simulator/__init__.py         |  18 ++-
 swh/scheduler/simulator/common.py           |  41 +++++--
 swh/scheduler/simulator/origin_scheduler.py |   2 +-
 swh/scheduler/simulator/origins.py          | 171 +++++++++++++++++++++-------
 swh/scheduler/sql/30-schema.sql             |  28 +++++
 swh/scheduler/sql/60-indexes.sql            |   4 +
 swh/scheduler/tests/test_scheduler.py       |  21 +++-
 swh/scheduler/tests/test_simulator.py       |   9 +-
 11 files changed, 347 insertions(+), 96 deletions(-)
Changes applied before test
commit e9a00dc6dc661a0e991a0b3c7f08bfd915190d99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 16:10:46 2021 +0100

    [wip] add indexes to origins_to_schedule.

commit 203cd2d23bc78b0af4f18310d989993c9a973966
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 15:54:15 2021 +0100

    [wip] add materialized view origins_to_schedule and use it in grab_next_visits.

commit 32c8ec91bc6dedd528ae7c8e828a419fddd9e6e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI
    
    We already do that in the scheduler backend function

commit 6d588a2df1b70c46dbd7828f9d8f478fed122915
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit 1e7f9d7f79b2b135f182291b94bbe64ccb6e0595
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics
    
    This allows us to check the behavior of the archive over time in terms
    of number of visits.

commit 31e37e80927995902c6a3550166f7b2e3336b71c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data
    
    This was a significant bottleneck of the simulator. To work around this,
    we:
    
     - Generate snapshot ids consistently in the OriginModel
     - Cache the origin data locally in the simulator, to compute the
       eventfulness of visits
     - Cache the last visit time for all origins to compute the estimated
       run time of visit tasks.

commit 6784a19cdc38b8f97aa4f9c1da9859ece24865f1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast
    
    The earlier implementation would just schedule new visits for origins
    forever, regardless of whether they were already scheduled or not.

commit 09a8768c30dc335afccde4df046b371a274cb2f9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits
    
    This makes the simulator behavior more consistent with reality.

commit a2dc72474056c2f20e255acf13ec3e662e1aad7a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit e5709214b4917a5fe3634d040da7a061f5978f66
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 7af98e2bc048c6946679e7d95cf8620e4a0ee4bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel
    
    This generates consistent last_update values according to the model and
    simulated time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/279/ for more details.

Build is green

Patch application report for D4928 (id=17615)

Could not rebase; Attempt merge onto 2906b4e8a0...

Updating 2906b4e..73f6ccb
Fast-forward
 docs/simulator.rst                          |  15 +++
 swh/scheduler/backend.py                    | 133 +++++++++++++++------
 swh/scheduler/cli/simulator.py              |   1 -
 swh/scheduler/interface.py                  |  15 ++-
 swh/scheduler/simulator/__init__.py         |  18 ++-
 swh/scheduler/simulator/common.py           |  41 +++++--
 swh/scheduler/simulator/origin_scheduler.py |   2 +-
 swh/scheduler/simulator/origins.py          | 173 ++++++++++++++++++++++------
 swh/scheduler/sql/30-schema.sql             |  28 +++++
 swh/scheduler/sql/60-indexes.sql            |   4 +
 swh/scheduler/tests/test_scheduler.py       |  21 +++-
 swh/scheduler/tests/test_simulator.py       |   9 +-
 12 files changed, 364 insertions(+), 96 deletions(-)
Changes applied before test
commit 73f6ccb5a6bdb061df6c7f832ce27b954eb61828
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 16:10:46 2021 +0100

    [wip] add indexes to origins_to_schedule.

commit ce0e5d58538c854b747d89bf64d119706713ec5d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 22 15:54:15 2021 +0100

    [wip] add materialized view origins_to_schedule and use it in grab_next_visits.

commit cf0583b079594c85e5e4fb512aceaf9fd4151473
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:43 2021 +0100

    simulator: stop validating the scheduling policy in the CLI
    
    We already do that in the scheduler backend function

commit ebb5847ea2eec79fa9b89cd684f1b6a92059324d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit 1f77521d486cfa110983b85fe0a724a347291840
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics
    
    This allows us to check the behavior of the archive over time in terms
    of number of visits.

commit 889839446eb8645a5520237513a54c892d3a3104
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data
    
    This was a significant bottleneck of the simulator. To work around this,
    we:
    
     - Generate snapshot ids consistently in the OriginModel
     - Cache the origin data locally in the simulator, to compute the
       eventfulness of visits
     - Cache the last visit time for all origins to compute the estimated
       run time of visit tasks.

commit c92ead5875ecfd96a164eec1803398adec6eb8a8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast
    
    The earlier implementation would just schedule new visits for origins
    forever, regardless of whether they were already scheduled or not.

commit 2b39cbcabf9960c1f660442e15f6c17654aec9e2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits
    
    This makes the simulator behavior more consistent with reality.

commit 7ffbdd1b3eb579f43e8913ea11cfd916b2f3c457
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit ea068b46a89e07c60ad1233afd36afc6bb29031e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 7af98e2bc048c6946679e7d95cf8620e4a0ee4bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel
    
    This generates consistent last_update values according to the model and
    simulated time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/288/ for more details.
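
Two of the commits listed in the reports above ("grab_next_visits: don't re-schedule visits too fast" and "Allow overriding the timestamp of grab_next_visits") describe behavior that may be easier to follow with a small in-memory sketch. The code below is not the swh.scheduler implementation, which runs SQL against the scheduler database. The `OriginState` class, the `grab_next_visits_sketch` function, the `cooldown` parameter and its 7-day default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class OriginState:
    """Hypothetical per-origin scheduling state (not the real swh.scheduler model)."""

    url: str
    last_scheduled: Optional[datetime] = None


def grab_next_visits_sketch(
    origins: Dict[str, OriginState],
    count: int,
    cooldown: timedelta = timedelta(days=7),  # assumed value, for illustration only
    timestamp: Optional[datetime] = None,
) -> List[str]:
    """Pick up to `count` origins that were not scheduled within `cooldown`.

    `timestamp` overrides the current time, so that a simulator can drive
    the function with simulated time instead of wall-clock time.
    """
    now = timestamp if timestamp is not None else datetime.now(timezone.utc)

    # Skip origins scheduled too recently; without this filter the same
    # origins would be handed out again on every call.
    candidates = [
        origin
        for origin in origins.values()
        if origin.last_scheduled is None or now - origin.last_scheduled >= cooldown
    ]

    # Never-scheduled origins first, then the least recently scheduled ones.
    candidates.sort(
        key=lambda o: (o.last_scheduled is not None, o.last_scheduled or now)
    )

    picked = candidates[:count]
    for origin in picked:
        # Record the scheduling time so the next call honors the cooldown.
        origin.last_scheduled = now
    return [origin.url for origin in picked]
```

Calling the function twice with timestamps one hour apart returns only the origins that the first call did not pick. Once the simulated time moves past the cooldown, all origins become eligible again.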