Differential D4920

Randomize last_update in generated ListedOrigins in fill_test_data
Abandoned · Public

Authored by douardda on Jan 22 2021, 10:50 AM.

Details

Reviewers

vlorentz

Group Reviewers

Reviewers

Summary

also insert objects by batches of 10k to make it nicer with ram usage.

Depends on D4919
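
The batching mentioned in the summary could look roughly like the sketch below. It is not the code from this diff: `batched`, `insert_in_batches`, and the `record` callable are hypothetical names standing in for whatever the scheduler backend actually exposes. Only the batch size of 10k comes from the summary.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(
    origins: Iterable[T],
    record: Callable[[List[T]], object],  # hypothetical storage call
    batch_size: int = 10_000,
) -> int:
    """Pass `origins` to `record` one batch at a time; return the item count."""
    total = 0
    for batch in batched(origins, batch_size):
        record(batch)
        total += len(batch)
    return total
```

Because `origins` can be a generator, only one batch is held in memory at a time, which is the RAM benefit the summary refers to.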

Diff Detail

Repository

rDSCH Scheduling utilities

Branch

csv-stats

Lint

Lint Skipped

Unit

Unit Tests Skipped

Build Status

Buildable 18637
Build 28833: Phabricator diff pipeline on jenkins
Build 28832: arc lint + arc unit

Event Timeline

douardda created this revision. Jan 22 2021, 10:50 AM

Herald added a reviewer: Reviewers. Jan 22 2021, 10:50 AM

douardda added a child revision: D4921: Make plotting optional in simulator cli command. Jan 22 2021, 10:51 AM

Build has FAILED

Patch application report for D4920 (id=17505)

Could not rebase; Attempt merge onto 03460207a1...

Updating 0346020..1c069ca
Fast-forward
 swh/scheduler/cli/simulator.py      | 13 +++++++---
 swh/scheduler/simulator/__init__.py | 51 ++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 24 deletions(-)

Changes applied before test

commit 1c069ca34add6b26d060588abb7958a089cb0735
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_upadte in generated ListedOrigins in fill_test_data
    
    also insert objects by batches of 10k to make it nicer with ram usage.

commit 8a9aaf3942d5585e6af038ebced3cde6faf27c7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:30:21 2021 +0100

    Add a --num-origins option to the fill-test-data cli command

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/245/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/245/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jan 22 2021, 10:55 AM

Harbormaster failed remote builds in B18637: Diff 17505!

douardda removed a child revision: D4921: Make plotting optional in simulator cli command. Jan 22 2021, 11:26 AM

douardda mentioned this in D4921: Make plotting optional in simulator cli command. Jan 22 2021, 11:27 AM

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

typo and kill unneeded Diff dependency

Build is green

Patch application report for D4920 (id=17505)

Could not rebase; Attempt merge onto b93aa5be2c...

Merge made by the 'recursive' strategy.
 swh/scheduler/cli/simulator.py      | 13 +++++++---
 swh/scheduler/simulator/__init__.py | 51 ++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 24 deletions(-)

Changes applied before test

commit dbca7a7b8a4b041a7092447ee4d91851dc22f711
Merge: b93aa5b 1c069ca
Author: Jenkins user <jenkins@localhost>
Date:   Fri Jan 22 10:36:24 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 1c069ca34add6b26d060588abb7958a089cb0735
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_upadte in generated ListedOrigins in fill_test_data
    
    also insert objects by batches of 10k to make it nicer with ram usage.

commit 8a9aaf3942d5585e6af038ebced3cde6faf27c7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:30:21 2021 +0100

    Add a --num-origins option to the fill-test-data cli command

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/255/ for more details.

Harbormaster completed remote builds in B18637: Diff 17505. Jan 22 2021, 11:38 AM

Build is green

Patch application report for D4920 (id=17515)

Rebasing onto b93aa5be2c...

First, rewinding head to replay your work on top of it...
Applying: Randomize last_update in generated ListedOrigins in fill_test_data

Changes applied before test

commit b48917915cff235c880061afb29a6257f50b4baf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_update in generated ListedOrigins in fill_test_data
    
    also insert objects by batches of 10k to make it nicer with ram usage.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/256/ for more details.

Harbormaster completed remote builds in B18647: Diff 17515. Jan 22 2021, 11:40 AM

douardda requested review of this revision. Jan 22 2021, 11:40 AM

vlorentz requested changes to this revision. Jan 22 2021, 11:41 AM

This revision now requires changes to proceed. Jan 22 2021, 11:41 AM

In D4920#123465, @vlorentz wrote:

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

why not (cli option), but why (keep it deterministic)?

In D4920#123533, @douardda wrote:

In D4920#123465, @vlorentz wrote:

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

why not (cli option), but why (keep it deterministic)?

Also, a given seed will not be enough here: there is also the maxts = int(utcnow().timestamp()) that will kill the deterministic property...

In D4920#123548, @douardda wrote:

In D4920#123533, @douardda wrote:

In D4920#123465, @vlorentz wrote:

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

why not (cli option), but why (keep it deterministic)?

Also, a given seed will not be enough here: there is also the maxts = int(utcnow().timestamp()) that will kill the deterministic property...

So to get deterministic behavior, the option should allow hard-setting this last_update time.

In D4920#123533, @douardda wrote:

why not (cli option), but why (keep it deterministic)?

reproducibility, so we can run the simulator twice with different code, and be sure that differences in behavior are not caused by randomness
if a particular run gives an unexpected result, you can run it again with more logging / instrumentation to see what went wrong

So to get deterministic behavior, the option should allow hard-setting this last_update time.

or generate it from the seed as well

In D4920#123571, @vlorentz wrote:

In D4920#123533, @douardda wrote:

why not (cli option), but why (keep it deterministic)?

reproducibility, so we can run the simulator twice with different code, and be sure that differences in behavior are not caused by randomness

This is not what I call reproducibility... Especially for simulation involving randomized stuff... Using hashing of some values as PRNG looks wrong to me (and not because of good or bad probabilistic properties of such generators) but because it makes the intent and method of randomization unclear and obfuscated, thus it makes the code hard to understand and maintain.

I'd much prefer we do proper PRNG, with proper seeds management if we really need this level of reproducibility.

if a particular run gives an unexpected result, you can run it again with more logging / instrumentation to see what went wrong

To me, the idea of such a simulation stack is not to identify (and fix) singular buggy behaviors.

So to get deterministic behavior, the option should allow hard-setting this last_update time.

or generate it from the seed as well

or make it possible to be explicitly given, if need be.
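
To make the two proposals concrete, here is a minimal sketch of seeded generation with an optional explicit upper bound. It is an illustration only, not the code under review: `random_last_updates`, its parameters, and the one-year default span are assumptions. The point it shows is that a dedicated `random.Random(seed)` instance gives repeatable values only if the `maxts` bound is also pinned rather than taken from the current time.

```python
import random
from datetime import datetime, timezone
from typing import List, Optional

ONE_YEAR = 365 * 24 * 3600  # assumed span; the diff's actual range is not shown here


def random_last_updates(
    count: int,
    seed: int,
    max_ts: Optional[int] = None,
    span_seconds: int = ONE_YEAR,
) -> List[datetime]:
    """Return `count` pseudo-random UTC datetimes in [max_ts - span, max_ts]."""
    # A private generator keeps the sequence independent of other code
    # that may use the global `random` module.
    rng = random.Random(seed)
    if max_ts is None:
        # Falling back to "now" makes the output differ between runs,
        # even with the same seed.
        max_ts = int(datetime.now(tz=timezone.utc).timestamp())
    min_ts = max_ts - span_seconds
    return [
        datetime.fromtimestamp(rng.randint(min_ts, max_ts), tz=timezone.utc)
        for _ in range(count)
    ]
```

Calling the function twice with the same `seed` and the same explicit `max_ts` returns identical lists. Omitting `max_ts` reintroduces the dependency on the wall clock described earlier in the thread.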

In D4920#123594, @douardda wrote:

I'd much prefer we do proper PRNG, with proper seeds management if we really need this level of reproducibility.

sure

To me, the idea of such a simulation stack is not to identify (and fix) singular buggy behaviors.

Of course. But if you do encounter one, it's helpful to have a way to reproduce it.

ardumont retitled this revision from Randomize last_upadte in generated ListedOrigins in fill_test_data to Randomize last_update in generated ListedOrigins in fill_test_data. Jan 25 2021, 6:26 PM

douardda abandoned this revision. Sep 3 2021, 10:49 AM

Revision Contents

Path: swh/scheduler/simulator/__init__.py
Size: 51 lines
Coverage: 93%

Diff 17505
