Also insert objects in batches of 10k to keep RAM usage down.
Depends on D4919
Differential D4920
Randomize last_update in generated ListedOrigins in fill_test_data
Authored by douardda on Jan 22 2021, 10:50 AM.
Event Timeline

Build has FAILED

Patch application report for D4920 (id=17505)

Could not rebase; Attempt merge onto 03460207a1...

Updating 0346020..1c069ca
Fast-forward
 swh/scheduler/cli/simulator.py      | 13 +++++++---
 swh/scheduler/simulator/__init__.py | 51 ++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 24 deletions(-)

Changes applied before test

commit 1c069ca34add6b26d060588abb7958a089cb0735
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_upadte in generated ListedOrigins in fill_test_data

    also insert objects by batches of 10k to make it nicer with ram usage.

commit 8a9aaf3942d5585e6af038ebced3cde6faf27c7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:30:21 2021 +0100

    Add a --num-origins option to the fill-test-data cli command

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/245/

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

Build is green

Patch application report for D4920 (id=17505)

Could not rebase; Attempt merge onto b93aa5be2c...

Merge made by the 'recursive' strategy.
 swh/scheduler/cli/simulator.py      | 13 +++++++---
 swh/scheduler/simulator/__init__.py | 51 ++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 24 deletions(-)

Changes applied before test

commit dbca7a7b8a4b041a7092447ee4d91851dc22f711
Merge: b93aa5b 1c069ca
Author: Jenkins user <jenkins@localhost>
Date:   Fri Jan 22 10:36:24 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 1c069ca34add6b26d060588abb7958a089cb0735
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_upadte in generated ListedOrigins in fill_test_data

    also insert objects by batches of 10k to make it nicer with ram usage.

commit 8a9aaf3942d5585e6af038ebced3cde6faf27c7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:30:21 2021 +0100

    Add a --num-origins option to the fill-test-data cli command

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/255/ for more details.

Build is green

Patch application report for D4920 (id=17515)

Rebasing onto b93aa5be2c...

First, rewinding head to replay your work on top of it...
Applying: Randomize last_update in generated ListedOrigins in fill_test_data

Changes applied before test

commit b48917915cff235c880061afb29a6257f50b4baf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 21 11:33:19 2021 +0100

    Randomize last_update in generated ListedOrigins in fill_test_data

    also insert objects by batches of 10k to make it nicer with ram usage.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/256/ for more details.

Also, a seed alone will not be enough here: there is also the maxts = int(utcnow().timestamp()) that will kill the deterministic property...

So to get deterministic behavior, the option should allow hard-setting this last_update time.
or generate it from the seed as well

This is not what I call reproducibility... Especially for a simulation involving randomized stuff... Using a hash of some values as a PRNG looks wrong to me, not because of the good or bad probabilistic properties of such generators, but because it obscures the intent and method of randomization, which makes the code hard to understand and maintain. I'd much prefer that we use a proper PRNG, with proper seed management, if we really need this level of reproducibility.
To me, the idea of such a simulation stack is not to identify (and fix) singular buggy behaviors.
or make it possible to pass it explicitly, if need be.

sure
Of course. But if you do encounter one, it's helpful to have a way to reproduce it.
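
For reference, the approach discussed in this thread (a proper seeded PRNG, a reference timestamp that can be given explicitly, and insertion in batches) could look roughly like the sketch below. This is not the code from the diff: the function names `random_last_updates` and `batched`, the `max_age_days` parameter and the one-year default are hypothetical, chosen only for illustration. The one thing taken from the thread is that the upper bound currently comes from the current time, which is why it needs to be overridable.

```python
import random
from datetime import datetime, timedelta, timezone
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def random_last_updates(
    num_origins: int,
    seed: Optional[int] = None,
    max_ts: Optional[datetime] = None,
    max_age_days: int = 365,  # hypothetical spread, not taken from the diff
) -> Iterator[datetime]:
    """Yield num_origins random last_update values no later than max_ts.

    The output is reproducible only when both seed and max_ts are given:
    a dedicated random.Random instance avoids touching the global PRNG
    state, and a fixed max_ts removes the dependency on the current time.
    """
    rng = random.Random(seed)
    if max_ts is None:
        # Non-deterministic fallback: use the current time as upper bound.
        max_ts = datetime.now(tz=timezone.utc)
    max_age_seconds = max_age_days * 24 * 3600
    for _ in range(num_origins):
        yield max_ts - timedelta(seconds=rng.randint(0, max_age_seconds))


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of at most size elements.

    Only one batch is materialized at a time, which keeps memory usage
    bounded when the input is a generator.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    reference = datetime(2021, 1, 22, tzinfo=timezone.utc)
    updates = random_last_updates(25_000, seed=42, max_ts=reference)
    for batch in batched(updates, 10_000):
        # A real implementation would insert the batch into the scheduler here.
        print(len(batch), min(batch), max(batch))
```

With both `seed` and `max_ts` fixed, two runs produce the same sequence, so a singular behavior seen in one simulation can be replayed; leaving either unset falls back to a non-reproducible run.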