Implementation of a sunburst chart for the scanner
The sunburst chart shows, for each directory, how many files are present and the percentage of those files that were discovered.
result example: raw_result
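As an illustration of the per-directory figures the chart is meant to display, here is a minimal sketch. It is not the code from this diff: the function name `directory_stats`, the input shape (a mapping from relative file path to a "known" flag) and the choice to count files recursively for every ancestor directory are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_stats(files):
    """Map each directory to (file count, percentage of known files).

    `files` is a hypothetical input: relative file path -> True when the
    file was discovered. Counting is recursive (an assumption): a file
    counts for every ancestor directory, "." being the scanned root.
    """
    totals = defaultdict(int)
    known = defaultdict(int)
    for path, is_known in files.items():
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += 1
            known[str(parent)] += int(is_known)
    return {d: (totals[d], 100 * known[d] / totals[d]) for d in totals}


stats = directory_stats({
    "foo/a.py": True,
    "foo/bar/b.py": False,
    "foo/bar/c.py": True,
    "README": True,
})
```

With this sample input, `stats["foo/bar"]` holds two files of which half are known, and the root `"."` holds all four files.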
Differential D2863
scanner: show result with a sunburst chart
Authored by DanSeraf on Mar 20 2020, 7:02 PM.
Event Timeline
Build is green
I'm quite confused by the data flow in plot.py (as I am often with code dealing with dataframes). Could you add a (large) docstring at the beginning of the module explaining the overall idea, and more comments in the code?
Build is green
This time I reviewed plot.py. Most of my comments are to make the code more readable, and don't require in-depth changes.
Build is green
Sorry I didn't do it earlier, but I finally got around to actually running this code, and I found a bug: if two directories have the same name (or if there's one named "total"), then the swh-scanner process exits early, and nothing is shown in the browser. Also, for consistency with other formats, I would expect --format=sunburst to output a file (like other formats) instead of opening a browser. You could however add a new option --show or --open to immediately open a browser.
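One plausible reading of the bug, assuming the chart identifies each node by its bare directory name, is that /foo and /foo/foo (or a directory literally named "total") end up sharing an identifier. The sketch below is not the code under review; `sunburst_nodes` is a hypothetical helper showing how full relative paths can serve as unique ids while the displayed label stays short.

```python
from pathlib import PurePosixPath


def sunburst_nodes(directories):
    """Return (ids, labels, parents) lists for a sunburst-style chart.

    The id of a node is its full relative path, so "foo" and "foo/foo"
    stay distinct even though both carry the label "foo". Top-level
    directories get an empty string as parent.
    """
    ids, labels, parents = [], [], []
    for directory in sorted(directories):
        path = PurePosixPath(directory)
        ids.append(str(path))
        labels.append(path.name)
        parent = str(path.parent)
        parents.append("" if parent == "." else parent)
    return ids, labels, parents


ids, labels, parents = sunburst_nodes(["foo", "foo/foo", "foo/total"])
```

If the chart is drawn with Plotly (an assumption here), these three lists map onto the `ids`, `labels` and `parents` arguments of `plotly.graph_objects.Sunburst`.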
Thanks! I will look for a solution to that. However, I didn't have any problem testing directories that have the same name. Could you provide an example, so I can replicate your test?
Yeah, I can easily write the result to a static file.
Sure (sorry, I should have given you one earlier):
  rm /tmp/scan_test -rf  # cleanup
  mkdir /tmp/scan_test/foo -p
  swh scanner scan /tmp/scan_test --format sunburst  # works
  mkdir /tmp/scan_test/foo/foo -p
  swh scanner scan /tmp/scan_test --format sunburst  # works
  touch /tmp/scan_test/foo/__init__.py
  swh scanner scan /tmp/scan_test --format sunburst  # crash
  rm -rf /tmp/scan_test/foo/foo
  swh scanner scan /tmp/scan_test --format sunburst  # works again
  mkdir /tmp/scan_test/foo/total
  swh scanner scan /tmp/scan_test --format sunburst  # crashes again
Build is green
Build is green
Patch application report for D2863 (id=10451)
Rebasing onto 4e96d30076...
First, rewinding head to replay your work on top of it...
Applying: model: get information about directories
Applying: changed structure of temp_folder fixture
Applying: new visualization format: sunburst chart
Changes applied before test
commit 980f13f1701072e7180d7da1175d01f1e6128781
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:58:06 2020 +0100
new visualization format: sunburst chart
generate a sunburst chart showing for each directory how many
files are discovered
plot test: creation of dataframe from directories; creation of
hierarchical dataframe
commit c032c09f3b1927f376368b82de566515cd28fbd5
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:48:46 2020 +0100
changed structure of temp_folder fixture
commit 34f814276d59c2a9c039563af1d44aa8d3d24262
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:45:55 2020 +0100
model: get information about directories
for each directory store information about how many contents are
presents and discovered
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/3/ for more details.
If I want to write a test, I have to raise an exception when the labels used to generate the sunburst (hierarchical_df['id'] in this case) contain equal values, so I should check every value in the labels.
With the current code, it can't, indeed. But this kind of test is called a regression test, as it catches future mistakes in the code.
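The check discussed above (raising when the labels contain equal values) could look roughly like the following. This is only a sketch: `check_unique_labels` is a hypothetical name, and it takes a plain list, so with a dataframe one would pass something like `hierarchical_df['id'].tolist()`.

```python
from collections import Counter


def check_unique_labels(labels):
    """Raise ValueError if any label appears more than once."""
    duplicates = sorted(
        label for label, count in Counter(labels).items() if count > 1
    )
    if duplicates:
        raise ValueError(f"duplicate sunburst labels: {duplicates}")


# Unique labels pass silently.
check_unique_labels(["/", "/foo", "/foo/foo"])

# A repeated label is reported by name.
try:
    check_unique_labels(["total", "foo", "foo"])
except ValueError as exc:
    error_message = str(exc)
```

A regression test can then assert that building the chart data for a tree such as foo/foo or foo/total does not raise.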
Indeed, there isn't much point in that.
Build is green
Patch application report for D2863 (id=10484)
Rebasing onto 4e96d30076...
Current branch diff-target is up to date.
Changes applied before test
commit a618bf17c0492f427174abc43e5758f4501cbc9a
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:58:06 2020 +0100
new visualization format: sunburst chart
generate a sunburst chart showing for each directory how many
files are present and the percentage of files known
plot test: creation of direcories dataframe; creation of
hierarchical dataframe
commit e4bb51b7d8949c24992a07e2477445b5e578fb40
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:48:46 2020 +0100
changed structure of temp_folder fixture
commit a49c4e5ec053f77e974f65ca25484b1fbe762d00
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:45:55 2020 +0100
model: get information about directories
for each directory store information about how many contents are
presents and discovered
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/4/ for more details.