Implementation of a sunburst chart for the scanner
The sunburst chart shows, for each directory, how many files are present and the percentage of them that were discovered.
result example: raw_result
Differential D2863
scanner: show result with a sunburst chart
Authored by DanSeraf on Mar 20 2020, 7:02 PM.
Event Timeline
Build is green
I'm quite confused by the data flow in plot.py (as I often am with code dealing with dataframes). Could you add a (large) docstring at the beginning of the module explaining the overall idea, and more comments in the code?
Build is green
This time I reviewed plot.py. Most of my comments are to make the code more readable, and don't require in-depth changes.
Build is green
Sorry I didn't do it earlier, but I finally got around to actually running this code, and I found a bug: if two directories have the same name (or if there's one named "total"), then the swh-scanner process exits early, and nothing is shown in the browser. Also, for consistency with other formats, I would expect --format=sunburst to output a file instead of opening a browser. You could however add a new option --show or --open to immediately open a browser.
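The suggestion above (write the chart to a file by default, and open a browser only on request) could look roughly like the sketch below. It is not the code under review: `write_report` and its parameters are hypothetical names, and the `show` flag only stands in for the proposed --show/--open option. It uses the standard library only.

```python
import webbrowser
from pathlib import Path


def write_report(html: str, output: Path, show: bool = False,
                 opener=webbrowser.open) -> Path:
    """Write the rendered chart to `output` (hypothetical helper).

    The browser is opened only when `show` is True, mirroring the
    proposed --show/--open option. `opener` is injectable so the
    behaviour can be tested without launching a real browser.
    """
    output.write_text(html, encoding="utf-8")
    if show:
        # as_uri() needs an absolute path, hence resolve()
        opener(output.resolve().as_uri())
    return output
```

With this split, the file is always produced, as for the other formats, and opening it is an opt-in side effect.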
Thanks! I will look for a solution to that. However, I didn't have any problem testing directories that have the same name; could you provide me an example, so I can replicate your test?
Yeah, I can easily write the result to a static file.
Sure (sorry, I should have given you one earlier):
    rm /tmp/scan_test -rf # cleanup
    mkdir /tmp/scan_test/foo -p
    swh scanner scan /tmp/scan_test --format sunburst # works
    mkdir /tmp/scan_test/foo/foo -p
    swh scanner scan /tmp/scan_test --format sunburst # works
    touch /tmp/scan_test/foo/__init__.py
    swh scanner scan /tmp/scan_test --format sunburst # crash
    rm -rf /tmp/scan_test/foo/foo
    swh scanner scan /tmp/scan_test --format sunburst # works again
    mkdir /tmp/scan_test/foo/total
    swh scanner scan /tmp/scan_test --format sunburst # crashes again
Build is green
Build is green
Patch application report for D2863 (id=10451)
Rebasing onto 4e96d30076...
First, rewinding head to replay your work on top of it...
Applying: model: get information about directories
Applying: changed structure of temp_folder fixture
Applying: new visualization format: sunburst chart
Changes applied before test
commit 980f13f1701072e7180d7da1175d01f1e6128781
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:58:06 2020 +0100
    new visualization format: sunburst chart
    generate a sunburst chart showing for each directory how many files are discovered
    plot test: creation of dataframe from directories; creation of hierarchical dataframe
commit c032c09f3b1927f376368b82de566515cd28fbd5
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:48:46 2020 +0100
    changed structure of temp_folder fixture
commit 34f814276d59c2a9c039563af1d44aa8d3d24262
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:45:55 2020 +0100
    model: get information about directories
    for each directory store information about how many contents are presents and discovered
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/3/ for more details.
If I want to write a test, I have to raise an exception when the labels used to generate the sunburst (hierarchical_df['id'] in this case) contain equal values, so I should check every value in the labels.
With the current code, it can't, indeed. But this kind of test is called a regression test, as it catches future mistakes in the code.
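The duplicate-label check discussed here could be sketched as follows. This is only an illustration, assuming (as the thread suggests) that the crash comes from repeated values in the id column; `find_duplicate_ids`, `check_unique_ids` and `path_ids` are hypothetical names, not functions from plot.py, and plain lists stand in for the dataframe column.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_duplicate_ids(ids):
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [value for value, n in counts.items() if n > 1]


def check_unique_ids(ids):
    """Raise ValueError if any id is repeated (hypothetical guard)."""
    duplicates = find_duplicate_ids(ids)
    if duplicates:
        raise ValueError(f"duplicate sunburst ids: {duplicates}")


def path_ids(root, dir_paths):
    """Build ids from each directory's path relative to `root`.

    Assumption: using the relative path rather than the bare directory
    name keeps nested directories with the same name distinct.
    """
    root = PurePosixPath(root)
    return [str(PurePosixPath(p).relative_to(root)) for p in dir_paths]
```

A regression test in this spirit would feed the guard the layout from the reproduction steps (foo and foo/foo) and assert that ids built from bare names are rejected while ids built from relative paths are accepted.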
Indeed, there isn't much point in that.
Build is green
Patch application report for D2863 (id=10484)
Rebasing onto 4e96d30076...
Current branch diff-target is up to date.
Changes applied before test
commit a618bf17c0492f427174abc43e5758f4501cbc9a
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:58:06 2020 +0100
    new visualization format: sunburst chart
    generate a sunburst chart showing for each directory how many files are present and the percentage of files known
    plot test: creation of direcories dataframe; creation of hierarchical dataframe
commit e4bb51b7d8949c24992a07e2477445b5e578fb40
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:48:46 2020 +0100
    changed structure of temp_folder fixture
commit a49c4e5ec053f77e974f65ca25484b1fbe762d00
Author: Daniele Serafini <danseraf@softwareheritage.org>
Date: Fri Mar 20 18:45:55 2020 +0100
    model: get information about directories
    for each directory store information about how many contents are presents and discovered
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/4/ for more details.