
scanner-benchmark: some code improvements
Closed, Public

Authored by DanSeraf on Feb 2 2021, 1:29 PM.

Details

Summary

file_priority: remove children only when the unset directory is known
If the directory is unknown, the algorithm should still check the downstream
directories, since they could be unknown too.
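
The pruning rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this revision: `DirNode`, `mark_subtree_known`, `resolve_directories` and the `is_known` callback are hypothetical names, and the callback stands in for whatever query the scanner sends to the backend.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DirNode:
    """Hypothetical stand-in for a scanned directory (not the swh.scanner model)."""

    name: str
    children: List["DirNode"] = field(default_factory=list)
    known: Optional[bool] = None  # None means "status not set yet"


def mark_subtree_known(node: DirNode) -> None:
    """A known directory implies that everything below it is known too."""
    node.known = True
    for child in node.children:
        mark_subtree_known(child)


def resolve_directories(
    root: DirNode, is_known: Callable[[DirNode], bool]
) -> List[str]:
    """Resolve unset directories and return the names that had to be queried."""
    queried: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.known is not None:
            continue  # status already set, e.g. by an earlier pass
        queried.append(node.name)
        if is_known(node):
            # Known directory: its children need no further checks.
            mark_subtree_known(node)
        else:
            # Unknown directory: its children must still be checked one by one.
            node.known = False
            queue.extend(node.children)
    return queried


# Toy example: pretend only "vendored" is present in the archive.
vendored = DirNode("vendored", [DirNode("vendored/lib")])
local = DirNode("local", [DirNode("local/tmp")])
root = DirNode("root", [vendored, local])
archive = {"vendored"}
print(resolve_directories(root, lambda n: n.name in archive))
# ['root', 'vendored', 'local', 'local/tmp']
```

In this toy run `vendored/lib` is never queried because its parent is known, while `local/tmp` is queried because its parent is unknown.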

make set_children_status work with different kinds of nodes

model: fix wrong iteration in iterate_bfs function
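
For readers unfamiliar with the traversal named in the last item, a breadth-first iteration over a directory tree might look like the sketch below. The `Tree` class here is a hypothetical minimal model, and the choice to skip the root node is an assumption; the actual `iterate_bfs` in swh/scanner/model.py may differ in both respects.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class Tree:
    """Hypothetical minimal tree; the real swh.scanner model may differ."""

    name: str
    children: Dict[str, "Tree"] = field(default_factory=dict)

    def add(self, child: "Tree") -> "Tree":
        """Attach a child node and return it, to allow chained construction."""
        self.children[child.name] = child
        return child

    def iterate_bfs(self) -> Iterator["Tree"]:
        """Yield every descendant level by level (the root itself is skipped)."""
        queue = deque(self.children.values())
        while queue:
            node = queue.popleft()
            yield node
            # Children go to the back of the queue, so a whole level is
            # visited before any node of the next level.
            queue.extend(node.children.values())


root = Tree("root")
a = root.add(Tree("a"))
b = root.add(Tree("b"))
a.add(Tree("a1"))
b.add(Tree("b1"))
print([node.name for node in root.iterate_bfs()])
# ['a', 'b', 'a1', 'b1']
```

Using a queue (first in, first out) is what gives the level-by-level order; swapping it for a stack would turn the same loop into a depth-first traversal.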

Diff Detail

Repository
rDTSCN Code scanner
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D4993 (id=17803)

Could not rebase; Attempt merge onto 33a9cd4eb9...

Auto-merging swh/scanner/cli.py
Merge made by the 'recursive' strategy.
 benchmark.py                   | 136 +++++++++++++++
 run_backend.sh                 |  15 ++
 run_benchmark.sh               |  37 ++++
 swh/scanner/backend.py         |  16 +-
 swh/scanner/benchmark_algos.py | 387 +++++++++++++++++++++++++++++++++++++++++
 swh/scanner/cli.py             |  73 ++++++++
 swh/scanner/model.py           |  57 +++++-
 7 files changed, 710 insertions(+), 11 deletions(-)
 create mode 100755 benchmark.py
 create mode 100755 run_backend.sh
 create mode 100755 run_benchmark.sh
 create mode 100644 swh/scanner/benchmark_algos.py
Changes applied before test
commit d5816b7cee059138676fdcd7e12f0b9a269a6b81
Merge: 33a9cd4 c42e643
Author: Jenkins user <jenkins@localhost>
Date:   Tue Feb 2 12:29:08 2021 +0000

    Merge branch 'diff-target' into HEAD

commit c42e643aa512cbd8c039be2350159e46d34daa0d
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Feb 2 13:24:12 2021 +0100

    model: wrong iteration in 'iterate_bfs' function

commit 0d3b5cb86144b87accab7f9a45d6457f457d47d0
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Feb 2 11:13:13 2021 +0100

    make 'set_children_status' works with different kind of nodes

commit b601f382db643ddb0af40c85d1d8fc5065bd7224
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jan 28 16:45:45 2021 +0100

    file_priority: remove children only when the unset directory is known
    
    If the directory is unknown the algorithm should check the downstream
    directories since they could be unknown too.

commit 5e01c09af4c61a309d71adb0d4f61d1766b8a021
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jan 26 10:10:00 2021 +0100

    retry request in case of backend failure

commit ebad16c02da6bffbc96a623e082a4b5f706d7b1f
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Jan 25 13:48:14 2021 +0100

    algo_min: remove the current node as well

commit 5cd9f762467ece41d7d8e1ae1841e1d24aad45e4
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Jan 18 10:26:06 2021 +0100

    fix: the temporary directory is removed by tempfile

commit 7a289332f73025f94f7f85ab5bd6755b876ebe68
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jan 12 23:12:18 2021 +0100

    print results as a csv

commit 9e4df16d9486a891498124dd4cfb7558c57dfa0c
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jan 12 23:10:39 2021 +0100

    extract repositories in temporary directories

commit 7bd1939949dcbcf0c52b8647f2b1750f2c9d2300
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Dec 10 23:59:31 2020 +0100

    scanner experiments

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/93/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/93/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Feb 2 2021, 1:30 PM)
Harbormaster failed remote builds in B18944: Diff 17803!
This revision is now accepted and ready to land. (Feb 2 2021, 2:18 PM)