Page MenuHomeSoftware Heritage

swh/scanner : Strip root path from json output
ClosedPublic

Authored by KShivendu on Mar 7 2021, 2:34 PM.

Diff Detail

Repository
rDTSCN Code scanner
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

KShivendu retitled this revision from Strip root path from json output to swh/scanner : Strip root path from json output.Mar 7 2021, 2:34 PM

Build has FAILED

Patch application report for D5213 (id=18670)

Rebasing onto 433f556301...

Current branch diff-target is up to date.
Changes applied before test
commit c5b36a731047234588fcd2336ff8058129d51948
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    Strip root path from json output

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/104/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/104/console

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 7 2021, 2:36 PM
Harbormaster failed remote builds in B19763: Diff 18670!

Updating D5213: swh/scanner : Strip root path from json output

Build has FAILED

Patch application report for D5213 (id=18671)

Rebasing onto 433f556301...

Current branch diff-target is up to date.
Changes applied before test
commit 2ce1b5d929d631fd5f64cd3d54f759802b6eb97c
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    Strip root path from json output

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/105/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/105/console

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 7 2021, 2:50 PM
Harbormaster failed remote builds in B19764: Diff 18671!

All the tests are passing when D5211 is applied. It is expected that the build errors will get fixed once D5211 is merged.

Tests are still failing on this, so some changes are needed for sure.

In addition, I'm suggesting above to change the interface of the model so that Path is still used, instead of dumbing down path to strings there.

swh/scanner/model.py
155–160

I understand this is not strictly speaking what was asked in the bug you're trying to fix with this diff, but looking at this I find that it's not correct that to_dict returns paths as strings, rather than keeping them as Path. (This is in turn due to what .attributes() does.)

Unless I'm missing something (cc: @DanSeraf for feedback), the model uses Path objects everywhere, and we should keep doing that for all model users. If someone needs to translate those paths to string, they should do that at the last minute.

Additionally, if we keep using path, what you are doing here could be done much more safely. Instead of using .replace you could (and should) use here something like PurePath.relative_to, which is much safer than string replacement.

Build has FAILED

Patch application report for D5213 (id=18671)

Rebasing onto 6c191b6eac...

First, rewinding head to replay your work on top of it...
Applying: Strip root path from json output
Changes applied before test
commit 451baabd2f21aedfb1fbbd45e93e092ca49f63a0
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    Strip root path from json output

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/106/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/106/console

Tests are still failing, but that's caused by an aiohttp released published last saturday with an invalid version number

swh/scanner/model.py
155–160

I agree with the fact that the scanner.model should use Path objects everywhere. This should be changed when we will move all the visualization related functions to another sub-module (T2692). The new sub-module should change the Path objects to string before to output the information.

@KShivendu Regarding this diff, as Stefano stated, i think it's for now safe to extract only the relative path with Path.relative_to.

Updating D5213: Use Path.relative_to instead of str.replace

Build is green

Patch application report for D5213 (id=18711)

Rebasing onto 41c5774031...

Current branch diff-target is up to date.
Changes applied before test
commit ff4d4acfb5c54825ed8d506eaed36be503a7a344
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    Strip root path from json output

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/109/ for more details.

zack requested changes to this revision.Mar 10 2021, 12:27 PM

Thanks, this fix looks good.

However, it introduces an inconsistency between the json (which now strips the root path) and the text output (which does not strip it).

Can you make the textual output behave the same? (and retitle this diff accordingly)

This revision now requires changes to proceed.Mar 10 2021, 12:27 PM

Hi @zack
In my opinion, the textual output doesn't print anything other than the directory structure so instead of removing the whole root path we can just put the directory name (it looks better).

website-folder
│   js/
│   │   main.js
│   │   table.js
│   .gitignore
│   table.html
│   index.html
│   login.html

VS

│   js/
│   │   main.js
│   │   table.js
│   .gitignore
│   table.html
│   index.html
│   login.html

But the ndjson output must be stripped.

Updating D5213: Fix ndjson and textual output

Build is green

Patch application report for D5213 (id=18724)

Rebasing onto 41c5774031...

Current branch diff-target is up to date.
Changes applied before test
commit 9154ec2b8fd5347dec09131e190f5fb82b74c3d6
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    Clean path in scan output
    
    Clean scan outputs for ndjson,json and text format

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/110/ for more details.

zack requested changes to this revision.Mar 10 2021, 4:41 PM

In my opinion, the textual output doesn't print anything other than the directory structure so instead of removing the whole root path we can just put the directory name (it looks better).
[...]
But the ndjson output must be stripped.

OK, I'm sold, thanks!

But please also add yourself to CONTRIBUTORS as part of this diff.
Then I'll be happy to accept this one.

Also, minor point, we usually use "Closes XXX" in commit messages rather than "Fixes XXX". (They probably both work, but I haven't checked, and consistency is better anyway :-))

This revision now requires changes to proceed.Mar 10 2021, 4:41 PM

Updating D5213: swh/scanner : Add Kumar Shivedu to CONTRIBUTORS

Build is green

Patch application report for D5213 (id=18736)

Rebasing onto 41c5774031...

Current branch diff-target is up to date.
Changes applied before test
commit 9c9b39892bb36f9c2c03313ffbcc7f41cc2e0a31
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    CLI : Clean path in scan output
    
    Clean scan outputs for ndjson, json, and text formats

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/111/ for more details.

Updating D5213: swh/scanner : Add newline in CONTRIBUTORS file

Build is green

Patch application report for D5213 (id=18737)

Rebasing onto 41c5774031...

Current branch diff-target is up to date.
Changes applied before test
commit eb04424434021f39b8869b32a17c11c18872472f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Mar 6 20:14:38 2021 +0530

    CLI : Clean path in scan output
    
    Clean scan outputs for ndjson, json, and text formats

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/112/ for more details.

This revision is now accepted and ready to land.Mar 10 2021, 6:17 PM
This revision was automatically updated to reflect the committed changes.