Page MenuHomeSoftware Heritage

cli/identify: Add support for --recursive
Needs ReviewPublic

Authored by KShivendu on Apr 5 2021, 8:39 PM.

Details

Reviewers
zack
Group Reviewers
Reviewers
Summary

Add -r/--recursive flag in swh identify command
Related T3160

Diff Detail

Repository
rDMOD Data model
Branch
identify-cli-recursive-flag
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20431
Build 31711: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 31710: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5420 (id=19374)

Rebasing onto 9523be0552...

First, rewinding head to replay your work on top of it...
Applying: cli/identify: Add support for --recursive
Changes applied before test
commit 1a71c52340e44370163fc478a44a80fcb8a6e81a
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Apr 6 00:06:34 2021 +0530

    cli/identify: Add support for --recursive

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/294/ for more details.

swh/model/cli.py
257

I will fix this and add a test soon. Do you see any other mistake in the diff?

ardumont added inline comments.
swh/model/cli.py
182

Make the coloring output an option, the option should be off by default.
If we want to pipe the result in another command, that might mess things
up to have an altered output (colors are altering the output here)

291

It's missing tests.
Please, add some.

zack added inline comments.
swh/model/cli.py
182

Agreed.

Unless you want to go the extra mile and behave like most git commands do. That is: detect if output is being piped to another command. If it is, disable coloring by default, but allow override with a flag. If it isn't being piped, enable coloring by default, allowing override with a flag.

swh/model/cli.py
182

That's a great idea. I will do it.

zack requested changes to this revision.Apr 6 2021, 11:39 AM
zack added inline comments.
swh/model/cli.py
6–10

is this interleaving of from/import/from/import for modules all coming from the stdlib ok w.r.t. coding guidelines and isort? (i'm not 100% sure of the answer, so maybe just double-check it)

175

this should be called something like "TerminalColor" or "ColorEscapes", it's not about colors in general

176–178

enum values should be all uppercase

181

make this function a class method of the Enum

200

as SWHID are always computed recursively, just not shown by default, the help message should be something like "Show computed SWHID recursively"

284

This is not the right way to implement this. Because you're recomputing the SWHIDs at all level, making the complexity of this thing quadratic (O(n^2)). By reading this I realized that this issue is probably not an easy hack as I initially thought, but does require in fact some refactoring. Instead of calling the high-level functions to compute the SWHID of a single file or dir like we are doing right now, we should build a single model object for the top-level dir, and either output its SWHID, or traverse it (without recomputing SWHIDs) to output all of it. As observed in T1136 it is a feature that already exists in swh-scanner too, that should be refactored.

This revision now requires changes to proceed.Apr 6 2021, 11:39 AM
KShivendu marked 7 inline comments as done and an inline comment as not done.

Updating D5420: cli/identify: Use TerminalColor Enum and change recursive flag's description

Build is green

Patch application report for D5420 (id=19537)

Rebasing onto eeedac7af7...

First, rewinding head to replay your work on top of it...
Applying: cli/identify: Add support for --recursive
Changes applied before test
commit 37b91cbec9620c5b8f67ce9622a1759cd0180b1b
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Apr 6 00:06:34 2021 +0530

    cli/identify: Add support for --recursive

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/300/ for more details.

swh/model/cli.py
284

Where do I start? Can you please give me some pointers for the same?

KShivendu marked an inline comment as not done.Apr 8 2021, 7:29 PM

we should build a single model object for the top-level dir, and either output its SWHID, or traverse it (without recomputing SWHIDs) to output all of it

Where should I start? Can you please give me some pointers? I don't know much about the swh-models.

@KShivendu zack's comment explains how the code should work, and gives a pointer to an existing implementation of the technique (swh-scanner); this should be enough to start.

But don't feel obligated to continue working on this; as mentioned the task is harder than we expected.

@KShivendu zack's comment explains how the code should work, and gives a pointer to an existing implementation of the technique (swh-scanner); this should be enough to start.

But don't feel obligated to continue working on this; as mentioned the task is harder than we expected.

Right.

As an update, we briefly discussed the best way forward on this the other day on #swh-devel.
The next step seems to be to check if swh.scanner can use only swh.model.merkle as its model of the filesystem, without needing a separate swh.scanner.model. If that's the case, the recursive support should be based on that (and we should get rid of swh.scanner.model ; or, at the very least, remove from it all the duplicate logic from swh.model.merkle).