benchmark.py: run random algorithm only once
Details
- Reviewers: zack
- Group Reviewers: Reviewers
- Commits:
  - rDTSCNe46e713d2145: run random algorithm only once
  - rDTSCN3004b66787b2: use os.listdir() instead of os.walk() to avoid symlinks
Diff Detail
- Repository: rDTSCN Code scanner
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Patch application report for D5011 (id=17882)
Could not rebase; Attempt merge onto 33a9cd4eb9...
Auto-merging swh/scanner/cli.py
Merge made by the 'recursive' strategy.
 benchmark.py                   | 136 ++++++++++++++
 run_backend.sh                 |  15 ++
 run_benchmark.sh               |  37 ++++
 swh/scanner/backend.py         |  16 +-
 swh/scanner/benchmark_algos.py | 395 +++++++++++++++++++++++++++++++++++++++++
 swh/scanner/cli.py             |  73 ++++++++
 swh/scanner/model.py           |  57 +++++-
 7 files changed, 718 insertions(+), 11 deletions(-)
 create mode 100755 benchmark.py
 create mode 100755 run_backend.sh
 create mode 100755 run_benchmark.sh
 create mode 100644 swh/scanner/benchmark_algos.py
Changes applied before test
commit 4d3001147e4469ca62353bcd681d9a696d596517
Merge: 33a9cd4 ba54311
Author: Jenkins user <jenkins@localhost>
Date: Thu Feb 4 13:29:34 2021 +0000
Merge branch 'diff-target' into HEAD
commit ba54311a7c2a7eb16491044a04507f9701b3c57b
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 14:28:31 2021 +0100
run random algorithm only once
commit aaf3266f05c569bd0f7f30013d455c37df2aaf27
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 14:17:59 2021 +0100
use os.listdir() instead of os.walk() to avoid symlinks
commit 3d3665a4f5bb77c981a27ee9206a2c92717e82b0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 15:30:54 2021 +0100
algo_min: delete the upstream directories if a (sub)directory is unknown
commit c42e643aa512cbd8c039be2350159e46d34daa0d
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 13:24:12 2021 +0100
model: wrong iteration in 'iterate_bfs' function
commit 0d3b5cb86144b87accab7f9a45d6457f457d47d0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 11:13:13 2021 +0100
make 'set_children_status' works with different kind of nodes
commit b601f382db643ddb0af40c85d1d8fc5065bd7224
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Jan 28 16:45:45 2021 +0100
file_priority: remove children only when the unset directory is known
If the directory is unknown the algorithm should check the downstream
directories since they could be unknown too.
commit 5e01c09af4c61a309d71adb0d4f61d1766b8a021
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 26 10:10:00 2021 +0100
retry request in case of backend failure
commit ebad16c02da6bffbc96a623e082a4b5f706d7b1f
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 25 13:48:14 2021 +0100
algo_min: remove the current node as well
commit 5cd9f762467ece41d7d8e1ae1841e1d24aad45e4
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 18 10:26:06 2021 +0100
fix: the temporary directory is removed by tempfile
commit 7a289332f73025f94f7f85ab5bd6755b876ebe68
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:12:18 2021 +0100
print results as a csv
commit 9e4df16d9486a891498124dd4cfb7558c57dfa0c
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:10:39 2021 +0100
extract repositories in temporary directories
commit 7bd1939949dcbcf0c52b8647f2b1750f2c9d2300
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Dec 10 23:59:31 2020 +0100
scanner experiments
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/95/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/95/console
| swh/scanner/benchmark_algos.py | ||
|---|---|---|
| 305–308 | If you want to avoid symlinks, these checks don't work, because the documentation (for both) says: "This follows symbolic links, so both islink() and isfile() can be true for the same path." You want to add a test before either of these, like: "if os.path.islink(...): ... continue ..." | |
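The check order suggested in this comment can be sketched as follows. This is only an illustration, assuming the directory is listed with os.listdir(); the function name list_entries_skipping_symlinks is hypothetical and is not taken from swh/scanner/benchmark_algos.py.

```python
import os


def list_entries_skipping_symlinks(root_path):
    """Return (dirs, files) found directly under root_path, ignoring symlinks.

    Hypothetical helper for illustration only; not part of swh-scanner.
    """
    dirs, files = [], []
    for name in sorted(os.listdir(root_path)):
        path = os.path.join(root_path, name)
        # Test islink() first: isdir() and isfile() follow symbolic links,
        # so on their own they cannot tell a link from its target.
        if os.path.islink(path):
            continue
        if os.path.isdir(path):
            dirs.append(name)
        elif os.path.isfile(path):
            files.append(name)
    return dirs, files
```

Calling it on a directory that holds a real subdirectory, a regular file, and links to both should return only the real subdirectory and the regular file.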
| swh/scanner/benchmark_algos.py | ||
|---|---|---|
| 305–308 | Actually, you probably do not want to ignore symlinks completely (I think? It depends on how your tree is then used). If you want to keep them, you should probably just avoid listing root_path if *it* is a symlink, by calling islink() on it before invoking listdir() on it. | |
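A possible reading of this second suggestion, as a rough sketch: symlinks stay in the result as entries, but a path that is itself a symlink is never passed to os.listdir(). The function name walk_without_following_symlinks and the choice to yield full paths are assumptions made for the example, not the scanner's actual code.

```python
import os


def walk_without_following_symlinks(root_path):
    """Yield every path under root_path without descending into symlinks.

    Hypothetical helper for illustration only; not part of swh-scanner.
    """
    # Check the root itself before listing it, as the comment suggests.
    if os.path.islink(root_path):
        return
    for name in sorted(os.listdir(root_path)):
        path = os.path.join(root_path, name)
        # Symlinks are still reported as entries of their parent.
        yield path
        # isdir() follows links, but the recursive call returns at once
        # when path is a symlink, so link targets are never listed.
        if os.path.isdir(path):
            yield from walk_without_following_symlinks(path)
```

Because a symlinked directory is reported but never listed, a link that points back to one of its ancestors cannot cause infinite recursion in this sketch.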
Build has FAILED
Patch application report for D5011 (id=17892)
Could not rebase; Attempt merge onto 33a9cd4eb9...
Auto-merging swh/scanner/cli.py
Merge made by the 'recursive' strategy.
 benchmark.py                   | 136 ++++++++++++++
 run_backend.sh                 |  15 ++
 run_benchmark.sh               |  37 ++++
 swh/scanner/backend.py         |  16 +-
 swh/scanner/benchmark_algos.py | 396 +++++++++++++++++++++++++++++++++++++++++
 swh/scanner/cli.py             |  73 ++++++++
 swh/scanner/model.py           |  57 +++++-
 7 files changed, 719 insertions(+), 11 deletions(-)
 create mode 100755 benchmark.py
 create mode 100755 run_backend.sh
 create mode 100755 run_benchmark.sh
 create mode 100644 swh/scanner/benchmark_algos.py
Changes applied before test
commit 34d1383d95e3a26cd5d2e26aad84dbe624698a80
Merge: 33a9cd4 0806485
Author: Jenkins user <jenkins@localhost>
Date: Thu Feb 4 15:31:35 2021 +0000
Merge branch 'diff-target' into HEAD
commit 080648583efcdf14c31af2f42ccc1c86f2745b63
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:28:21 2021 +0100
run random algorithm only once
commit 3004b66787b28cffa1047427876750397f02e06a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:27:59 2021 +0100
use os.listdir() instead of os.walk() to avoid symlinks
commit 3d3665a4f5bb77c981a27ee9206a2c92717e82b0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 15:30:54 2021 +0100
algo_min: delete the upstream directories if a (sub)directory is unknown
commit c42e643aa512cbd8c039be2350159e46d34daa0d
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 13:24:12 2021 +0100
model: wrong iteration in 'iterate_bfs' function
commit 0d3b5cb86144b87accab7f9a45d6457f457d47d0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 11:13:13 2021 +0100
make 'set_children_status' works with different kind of nodes
commit b601f382db643ddb0af40c85d1d8fc5065bd7224
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Jan 28 16:45:45 2021 +0100
file_priority: remove children only when the unset directory is known
If the directory is unknown the algorithm should check the downstream
directories since they could be unknown too.
commit 5e01c09af4c61a309d71adb0d4f61d1766b8a021
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 26 10:10:00 2021 +0100
retry request in case of backend failure
commit ebad16c02da6bffbc96a623e082a4b5f706d7b1f
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 25 13:48:14 2021 +0100
algo_min: remove the current node as well
commit 5cd9f762467ece41d7d8e1ae1841e1d24aad45e4
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 18 10:26:06 2021 +0100
fix: the temporary directory is removed by tempfile
commit 7a289332f73025f94f7f85ab5bd6755b876ebe68
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:12:18 2021 +0100
print results as a csv
commit 9e4df16d9486a891498124dd4cfb7558c57dfa0c
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:10:39 2021 +0100
extract repositories in temporary directories
commit 7bd1939949dcbcf0c52b8647f2b1750f2c9d2300
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Dec 10 23:59:31 2020 +0100
scanner experiments
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/96/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/96/console
Build has FAILED
Patch application report for D5011 (id=17893)
Rebasing onto 33a9cd4eb9...
First, rewinding head to replay your work on top of it...
Applying: scanner experiments
Applying: extract repositories in temporary directories
Applying: print results as a csv
Applying: fix: the temporary directory is removed by tempfile
Applying: algo_min: remove the current node as well
Applying: retry request in case of backend failure
Applying: file_priority: remove children only when the unset directory is known
Applying: make 'set_children_status' works with different kind of nodes
Applying: model: wrong iteration in 'iterate_bfs' function
Applying: algo_min: delete the upstream directories if a (sub)directory is unknown
Applying: check if path is a symlink
Applying: run random algorithm only once
Changes applied before test
commit 6c534b8af6b62468cf8467aa2791f63f1a471958
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:47:24 2021 +0100
run random algorithm only once
commit 3446bb600e3aeca5ddc22b5b9a17eda224996450
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:27:59 2021 +0100
check if path is a symlink
exclude the path if it is a symlink.
- os.listdir() instead of os.walk() to list subdirectories
commit b46c265a776490a6797454e64e5cbc607fba1e94
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 15:30:54 2021 +0100
algo_min: delete the upstream directories if a (sub)directory is unknown
commit 4cec0aa255ba71479acb7cd58048f697c3ad0aa5
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 13:24:12 2021 +0100
model: wrong iteration in 'iterate_bfs' function
commit 15cb48637cf708bf15fcab7a6958b2b97bdafe7b
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 11:13:13 2021 +0100
make 'set_children_status' works with different kind of nodes
commit 3ebcebddc15ac53203c53ac771a501339ff681a8
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Jan 28 16:45:45 2021 +0100
file_priority: remove children only when the unset directory is known
If the directory is unknown the algorithm should check the downstream
directories since they could be unknown too.
commit d64b0d8d402872de7351b0674bde391efcff8fcf
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 26 10:10:00 2021 +0100
retry request in case of backend failure
commit ba29deefccf09642d1c006b1e0887f369d87d321
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 25 13:48:14 2021 +0100
algo_min: remove the current node as well
commit fa7460a9f9a1a291ea43f7af60486c4a362d04d2
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 18 10:26:06 2021 +0100
fix: the temporary directory is removed by tempfile
commit f7464b81a5169755a5dbcca853a694ccb29ec9e7
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:12:18 2021 +0100
print results as a csv
commit f0f34283cc77dd0795484f5904918a7bba67e329
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:10:39 2021 +0100
extract repositories in temporary directories
commit 2d4bf40939653e71d0715a4d3fdba6ce5765991c
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Dec 10 23:59:31 2020 +0100
scanner experiments
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/97/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/97/console
Build has FAILED
Patch application report for D5011 (id=17894)
Rebasing onto 33a9cd4eb9...
First, rewinding head to replay your work on top of it...
Applying: scanner experiments
Applying: extract repositories in temporary directories
Applying: print results as a csv
Applying: fix: the temporary directory is removed by tempfile
Applying: algo_min: remove the current node as well
Applying: retry request in case of backend failure
Applying: file_priority: remove children only when the unset directory is known
Applying: make 'set_children_status' works with different kind of nodes
Applying: model: wrong iteration in 'iterate_bfs' function
Applying: algo_min: delete the upstream directories if a (sub)directory is unknown
Applying: check if path is a symlink
Applying: run random algorithm only once
Changes applied before test
commit 2eca880da64bf5537ac4603a09cd2804c3151d40
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:47:24 2021 +0100
run random algorithm only once
commit 3a6203415be0be7825edd74cd505bb6d14ffb635
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:27:59 2021 +0100
check if path is a symlink
exclude the path if it is a symlink.
- os.listdir() instead of os.walk() to list subdirectories
commit e3e1a96f5913905a42762c672720d1480184f858
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 15:30:54 2021 +0100
algo_min: delete the upstream directories if a (sub)directory is unknown
commit 5f27ca465bc33d8babc70f8bfb258165934153e0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 13:24:12 2021 +0100
model: wrong iteration in 'iterate_bfs' function
commit 590fc3252c7aabbdf30f5fce001d45d487a880d7
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 11:13:13 2021 +0100
make 'set_children_status' works with different kind of nodes
commit d829830b407e06b3bc2624a8552adcddb90278ce
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Jan 28 16:45:45 2021 +0100
file_priority: remove children only when the unset directory is known
If the directory is unknown the algorithm should check the downstream
directories since they could be unknown too.
commit 4bceda44454777762d5bf677818478a72ad2f624
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 26 10:10:00 2021 +0100
retry request in case of backend failure
commit 00a2d73a2193406d6fba0a46c91e3098d800d986
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 25 13:48:14 2021 +0100
algo_min: remove the current node as well
commit 243faa41794f2c5f4182d627bbf3a9dc2e14b75a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 18 10:26:06 2021 +0100
fix: the temporary directory is removed by tempfile
commit 942d63226f3e589ce0315ec89317118198048a8a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:12:18 2021 +0100
print results as a csv
commit 88a9d3232e3a04f8e3d96e95ae05de7dc406c87a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:10:39 2021 +0100
extract repositories in temporary directories
commit 7a55f8962e424771aaf5410d7c11103f8fcdbb7c
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Dec 10 23:59:31 2020 +0100
scanner experiments
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/98/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/98/console
Build has FAILED
Patch application report for D5011 (id=17895)
Could not rebase; Attempt merge onto 33a9cd4eb9...
Auto-merging swh/scanner/cli.py
Merge made by the 'recursive' strategy.
 benchmark.py                   | 136 ++++++++++++++
 run_backend.sh                 |  15 ++
 run_benchmark.sh               |  37 ++++
 swh/scanner/backend.py         |  16 +-
 swh/scanner/benchmark_algos.py | 396 +++++++++++++++++++++++++++++++++++++++++
 swh/scanner/cli.py             |  73 ++++++++
 swh/scanner/model.py           |  57 +++++-
 7 files changed, 719 insertions(+), 11 deletions(-)
 create mode 100755 benchmark.py
 create mode 100755 run_backend.sh
 create mode 100755 run_benchmark.sh
 create mode 100644 swh/scanner/benchmark_algos.py
Changes applied before test
commit 28ceb8e275f88e4fee71fbc725f9afb4360b5d0e
Merge: 33a9cd4 e46e713
Author: Jenkins user <jenkins@localhost>
Date: Thu Feb 4 16:39:04 2021 +0000
Merge branch 'diff-target' into HEAD
commit e46e713d2145f69be19e16f5d22a565648e7c0ff
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:28:21 2021 +0100
run random algorithm only once
commit 3004b66787b28cffa1047427876750397f02e06a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Feb 4 16:27:59 2021 +0100
use os.listdir() instead of os.walk() to avoid symlinks
commit 3d3665a4f5bb77c981a27ee9206a2c92717e82b0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 15:30:54 2021 +0100
algo_min: delete the upstream directories if a (sub)directory is unknown
commit c42e643aa512cbd8c039be2350159e46d34daa0d
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 13:24:12 2021 +0100
model: wrong iteration in 'iterate_bfs' function
commit 0d3b5cb86144b87accab7f9a45d6457f457d47d0
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Feb 2 11:13:13 2021 +0100
make 'set_children_status' works with different kind of nodes
commit b601f382db643ddb0af40c85d1d8fc5065bd7224
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Jan 28 16:45:45 2021 +0100
file_priority: remove children only when the unset directory is known
If the directory is unknown the algorithm should check the downstream
directories since they could be unknown too.
commit 5e01c09af4c61a309d71adb0d4f61d1766b8a021
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 26 10:10:00 2021 +0100
retry request in case of backend failure
commit ebad16c02da6bffbc96a623e082a4b5f706d7b1f
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 25 13:48:14 2021 +0100
algo_min: remove the current node as well
commit 5cd9f762467ece41d7d8e1ae1841e1d24aad45e4
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Mon Jan 18 10:26:06 2021 +0100
fix: the temporary directory is removed by tempfile
commit 7a289332f73025f94f7f85ab5bd6755b876ebe68
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:12:18 2021 +0100
print results as a csv
commit 9e4df16d9486a891498124dd4cfb7558c57dfa0c
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Tue Jan 12 23:10:39 2021 +0100
extract repositories in temporary directories
commit 7bd1939949dcbcf0c52b8647f2b1750f2c9d2300
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Thu Dec 10 23:59:31 2020 +0100
scanner experiments
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/99/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/99/console