
Simulation: allow to export results in a csv file
Closed, Public

Authored by douardda on Jan 22 2021, 10:51 AM.

Event Timeline

Build is green

Patch application report for D4923 (id=17508)

Could not rebase; Attempt merge onto 03460207a1...

Updating 0346020..708b1f7
Fast-forward
 swh/scheduler/cli/simulator.py      | 25 ++++++++++---
 swh/scheduler/simulator/__init__.py | 61 ++++++++++++++++++-------------
 swh/scheduler/simulator/common.py   | 71 +++++++++++++++++++++++++++----------
 3 files changed, 110 insertions(+), 47 deletions(-)
Changes applied before test
commit 708b1f7a0d098fe0b78a6479998c025930264e01
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 10:49:13 2021 +0100

    Simulation: allow to export results in a csv file

commit 6ef3d2177ca00e5f42885eaf6a37ecd0c94df7ac
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 10:47:18 2021 +0100

    Simulation: log at infol level recorded metrics
    
    this allows to follows what the simulation is doing.

commit 6698e51903bfdfcc00cfc55be3056c25c8cb270f
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 09:07:22 2021 +0100

    Make plotting histograms optional in simulator cli command

commit 1c069ca34add6b26d060588abb7958a089cb0735
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_upadte in generated ListedOrigins in fill_test_data
    
    also insert objects by batches of 10k to make it nicer with ram usage.

commit 8a9aaf3942d5585e6af038ebced3cde6faf27c7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:30:21 2021 +0100

    Add a --num-origins option to the fill-test-data cli command

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/248/ for more details.

olasd added inline comments.
swh/scheduler/simulator/common.py
79

known

Build is green

Patch application report for D4923 (id=17519)

Could not rebase; Attempt merge onto b93aa5be2c...

Merge made by the 'recursive' strategy.
 swh/scheduler/cli/simulator.py      | 12 +++++--
 swh/scheduler/simulator/__init__.py |  2 +-
 swh/scheduler/simulator/common.py   | 71 +++++++++++++++++++++++++++----------
 3 files changed, 64 insertions(+), 21 deletions(-)
Changes applied before test
commit 38727d02937d62007c1c811817ecbb10ec88d582
Merge: b93aa5b d5d5c1d
Author: Jenkins user <jenkins@localhost>
Date:   Fri Jan 22 10:40:56 2021 +0000

    Merge branch 'diff-target' into HEAD

commit d5d5c1dca97032eda637c6946627e5ee8b80c6f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 10:49:13 2021 +0100

    Simulation: allow to export results in a csv file

commit 011cc3ddff7e5683f1f00a8c578d21776a923ae7
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 09:07:22 2021 +0100

    Make plotting histograms optional in simulator cli command

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/257/ for more details.

swh/scheduler/simulator/common.py
100

Instead of zipping twice, we can probably just do the sums in a loop for each timestamp.
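A minimal sketch of what this suggestion could look like. The code under review is not shown on this page, so the data shape (one timestamp paired with a list of counts) and the names `Sample`, `totals_double_zip` and `totals_single_loop` are assumptions made for illustration, not the actual `common.py` code.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Tuple

# Hypothetical shape: one (timestamp, list of counts) pair per recorded sample.
Sample = Tuple[datetime, List[int]]


def totals_double_zip(samples: Iterable[Sample]) -> Iterator[Tuple[datetime, int]]:
    """Transpose with zip, sum each list of counts, then zip back together."""
    samples = list(samples)
    if not samples:
        return iter(())
    timestamps, counts = zip(*samples)  # first zip: split into two columns
    return zip(timestamps, (sum(c) for c in counts))  # second zip: re-pair


def totals_single_loop(samples: Iterable[Sample]) -> Iterator[Tuple[datetime, int]]:
    """Compute the sum for each timestamp directly, one sample at a time."""
    for timestamp, counts in samples:
        yield timestamp, sum(counts)


start = datetime(2021, 1, 22)
samples = [(start + timedelta(hours=i), [i, 2 * i, 3]) for i in range(3)]
```

Both functions yield the same `(timestamp, total)` pairs for this sample data; the loop version avoids the intermediate transposition.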

Build is green

Patch application report for D4923 (id=17522)

Could not rebase; Attempt merge onto b93aa5be2c...

Updating b93aa5b..bd6def0
Fast-forward
 swh/scheduler/backend.py                    |  61 +++++++++--
 swh/scheduler/cli/simulator.py              |  17 ++-
 swh/scheduler/interface.py                  |  15 ++-
 swh/scheduler/simulator/__init__.py         |  27 ++---
 swh/scheduler/simulator/common.py           | 109 ++++++++++++++-----
 swh/scheduler/simulator/origin_scheduler.py |   2 +-
 swh/scheduler/simulator/origins.py          | 162 +++++++++++++++++++++-------
 swh/scheduler/tests/test_simulator.py       |   9 +-
 8 files changed, 311 insertions(+), 91 deletions(-)
Changes applied before test
commit bd6def07947375e8cc01e87e038f948a7b3ba425
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:17:00 2021 +0100

    Simulator: allow to export results in a csv file

commit bd0941c722dae7ac14385b45048eee7f9565f735
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:15:47 2021 +0100

    Make plottings optional in simulator cli output

commit f878c6036ba7400dc08fc33dc8d3858cc234b4c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit bdbc3a86f84772ec166764ca5169ec597cf89e14
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics
    
    This allows us to check the behavior of the archive over time in terms
    of number of visits.

commit 7afb0a498432d1e2641abf3a9de859354699c5c4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data
    
    This was a significant bottleneck of the simulator. To work around this,
    we:
    
     - Generate snapshot ids consistently in the OriginModel
     - Cache the origin data locally in the simulator, to compute the
       eventfulness of visits
     - Cache the last visit time for all origins to compute the estimated
       run time of visit tasks.

commit 8e7377d8af45ef8e8234b57dc6a16be75dd74ac5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 17:38:41 2021 +0100

    simulator: add a trivial heartbeat process to show progress
    
    For now, this process only writes a log every simulated day.

commit ba303f946ecd3e15e58de0072ce71b50aa423d59
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast
    
    The earlier implementation would just schedule new visits for origins
    forever, regardless of whether they were already scheduled or not.

commit 808ae6851faee9b633e773f9150d360cdb927146
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits
    
    This makes the simulator behavior more consistent with reality.

commit 9943195d31c51a44325cba09d07fb6e904d45a00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit 72070b7bf628788b6872e90a3f8ac8f0c01b70d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 1f1aad459c4b0740ecbe96e9809e4b31f66bf999
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel
    
    This generates consistent last_update values according to the model and
    simulated time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/259/ for more details.

Build is green

Patch application report for D4923 (id=17536)

Could not rebase; Attempt merge onto 86b255544c...

Auto-merging swh/scheduler/simulator/__init__.py
Auto-merging swh/scheduler/cli/simulator.py
Merge made by the 'recursive' strategy.
 swh/scheduler/backend.py                    |  61 +++++++++--
 swh/scheduler/cli/simulator.py              |  17 ++-
 swh/scheduler/interface.py                  |  15 ++-
 swh/scheduler/simulator/__init__.py         |  27 ++---
 swh/scheduler/simulator/common.py           | 109 ++++++++++++++-----
 swh/scheduler/simulator/origin_scheduler.py |   2 +-
 swh/scheduler/simulator/origins.py          | 162 +++++++++++++++++++++-------
 swh/scheduler/tests/test_simulator.py       |   9 +-
 8 files changed, 311 insertions(+), 91 deletions(-)
Changes applied before test
commit 522a327ced00337a239057d1d464323c1531dc92
Merge: 86b2555 df8f308
Author: Jenkins user <jenkins@localhost>
Date:   Fri Jan 22 15:22:40 2021 +0000

    Merge branch 'diff-target' into HEAD

commit df8f3086db8ae4289b6a3a2c675308b25fa82165
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:17:00 2021 +0100

    Simulator: allow to export results in a csv file

commit 6a1b2e037498f5d1ad28effc5fb0a79f520ef46a
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:15:47 2021 +0100

    Make plottings optional in simulator cli output

commit f878c6036ba7400dc08fc33dc8d3858cc234b4c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:55:16 2021 +0100

    Run simulator tests on all known scheduling policies

commit bdbc3a86f84772ec166764ca5169ec597cf89e14
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:48:38 2021 +0100

    simulator: record visit metrics alongside scheduler metrics
    
    This allows us to check the behavior of the archive over time in terms
    of number of visits.

commit 7afb0a498432d1e2641abf3a9de859354699c5c4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:45:23 2021 +0100

    simulator: stop using the database as a cache for origin data
    
    This was a significant bottleneck of the simulator. To work around this,
    we:
    
     - Generate snapshot ids consistently in the OriginModel
     - Cache the origin data locally in the simulator, to compute the
       eventfulness of visits
     - Cache the last visit time for all origins to compute the estimated
       run time of visit tasks.

commit 8e7377d8af45ef8e8234b57dc6a16be75dd74ac5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 17:38:41 2021 +0100

    simulator: add a trivial heartbeat process to show progress
    
    For now, this process only writes a log every simulated day.

commit ba303f946ecd3e15e58de0072ce71b50aa423d59
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:31:43 2021 +0100

    grab_next_visits: don't re-schedule visits too fast
    
    The earlier implementation would just schedule new visits for origins
    forever, regardless of whether they were already scheduled or not.

commit 808ae6851faee9b633e773f9150d360cdb927146
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:29:45 2021 +0100

    Allow overriding the timestamp of grab_next_visits
    
    This makes the simulator behavior more consistent with reality.

commit 9943195d31c51a44325cba09d07fb6e904d45a00
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jan 21 17:27:40 2021 +0100

    Construct grab_next_visits query arguments incrementally

commit 72070b7bf628788b6872e90a3f8ac8f0c01b70d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:57:42 2021 +0100

    simulator: add simple lister simulation

commit 1f1aad459c4b0740ecbe96e9809e4b31f66bf999
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 21 14:54:53 2021 +0100

    Factor out ListedOrigin generation to use the OriginModel
    
    This generates consistent last_update values according to the model and
    simulated time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/268/ for more details.

(requesting changes to get it out of my review queue)

This revision now requires changes to proceed. Jan 29 2021, 5:11 PM

Build is green

Patch application report for D4923 (id=17759)

Could not rebase; Attempt merge onto cf0583b079...

Updating cf0583b..0af7420
Fast-forward
 swh/scheduler/cli/simulator.py      | 17 +++++++--
 swh/scheduler/simulator/__init__.py |  2 +-
 swh/scheduler/simulator/common.py   | 70 +++++++++++++++++++++++++++----------
 3 files changed, 68 insertions(+), 21 deletions(-)
Changes applied before test
commit 0af7420dbc89dc6b5b903e5fcc6565aa8f497a44
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:17:00 2021 +0100

    Simulator: allow to export results in a csv file

commit aaf7dd6f1d820012b588e780178aaefdc64e2685
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:15:47 2021 +0100

    Make plottings optional in simulator cli output

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/290/ for more details.

> (requesting changes to get it out of my review queue)

That's not a valid reason! A valid reason is "I agree with olasd's comments, fix them (plz)"...

swh/scheduler/simulator/common.py
100

Well, everything in this pipeline is a generator, so I see no harm in "double zipping" there.
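To illustrate the laziness argument, here is a small self-contained sketch. It does not reproduce the code under review; the `numbers` generator, the `pulled` log and the `t0`..`t2` labels are hypothetical. It only shows that chaining `zip` calls over generators consumes items one at a time.

```python
from typing import Iterator, List

pulled: List[str] = []  # records the order in which items are produced


def numbers(label: str, count: int) -> Iterator[int]:
    """Yield 0..count-1, logging each item as it is actually produced."""
    for i in range(count):
        pulled.append(f"{label}{i}")
        yield i


pairs = zip(numbers("a", 3), numbers("b", 3))  # first zip: nothing consumed yet
sums = (x + y for x, y in pairs)               # generator expression: still lazy
labelled = zip(("t0", "t1", "t2"), sums)       # second zip: still lazy

before_first = list(pulled)   # snapshot before any item is requested
first = next(labelled)        # pulls exactly one item from each generator
after_first = list(pulled)    # snapshot after the first item
rest = list(labelled)         # drains the remaining items
```

Note that this holds for `zip(a, b)` over separate iterators; a transposition written as `zip(*iterable)` has to read the whole iterable before it can start.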

Build is green

Patch application report for D4923 (id=17760)

Could not rebase; Attempt merge onto cf0583b079...

Updating cf0583b..baf5dce
Fast-forward
 swh/scheduler/cli/simulator.py      | 17 +++++++--
 swh/scheduler/simulator/__init__.py |  2 +-
 swh/scheduler/simulator/common.py   | 70 +++++++++++++++++++++++++++----------
 3 files changed, 68 insertions(+), 21 deletions(-)
Changes applied before test
commit baf5dce08ef24360ff89e92ff0cc6e5712cc20cd
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:17:00 2021 +0100

    Simulator: allow to export results in a csv file

commit aaf7dd6f1d820012b588e780178aaefdc64e2685
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:15:47 2021 +0100

    Make plottings optional in simulator cli output

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/291/ for more details.

>> (requesting changes to get it out of my review queue)

> That's not a valid reason! A valid reason is "I agree with olasd's comments, fix them (plz)"...

Oh I also missed your 'no tests?' comment before, sorry.

add minimal test

now depends on D4984

Build is green

Patch application report for D4923 (id=17780)

Could not rebase; Attempt merge onto aaf7dd6f1d...

Updating aaf7dd6..aaffff2
Fast-forward
 swh/scheduler/cli/simulator.py        |  7 ++++++-
 swh/scheduler/simulator/common.py     | 36 +++++++++++++++++++++++++++++++++--
 swh/scheduler/tests/test_simulator.py | 11 ++++++++++-
 3 files changed, 50 insertions(+), 4 deletions(-)
Changes applied before test
commit aaffff2631a771b30c22b7a1fa69414bf3ed9dcd
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 22 12:17:00 2021 +0100

    Simulator: allow to export results in a csv file

commit 9fce3f6f2c73fe64663e9b3e41043161c5620f45
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Feb 1 15:36:16 2021 +0100

    Add minimal tests for the SimulationReport.format() method

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/294/ for more details.

This revision is now accepted and ready to land. Feb 1 2021, 3:50 PM