
code scanner prototype
Closed, Public

Authored by DanSeraf on Feb 12 2020, 12:13 PM.

Details

Summary

First prototype of the code scanner

Usage example

swh scanner scan "/path/to/scan"

sample output (using swh-py-template as a source path):

swh/
debian/
.git/
│   refs/
│   │   heads/
│   │   │   master
│   │   remotes/
│   │   tags/
│   hooks/
│   │   pre-receive.sample
│   │   pre-push.sample
│   │   update.sample
│   │   pre-rebase.sample
│   │   fsmonitor-watchman.sample
│   │   post-update.sample
│   │   pre-commit
│   │   pre-commit.sample
│   │   prepare-commit-msg.sample
│   │   commit-msg.sample
│   │   pre-applypatch.sample
│   │   applypatch-msg.sample
│   objects/
│   │   pack/
│   │   │   pack-750cdb447ff562005695fc225fc7f81ba936b697.pack
│   │   │   pack-750cdb447ff562005695fc225fc7f81ba936b697.idx
│   │   info/
│   info/
│   logs/
│   │   refs/
│   │   │   heads/
│   │   │   │   master
│   │   │   remotes/
│   │   │   │   origin/
│   │   │   │   │   HEAD
│   │   HEAD
│   branches/
│   description
│   ORIG_HEAD
│   HEAD
│   config
│   FETCH_HEAD
│   packed-refs
│   index
docs/
requirements-swh.txt
README.md
AUTHORS
requirements.txt
MANIFEST.in
pytest.ini
mypy.ini
LICENSE
CONTRIBUTORS
requirements-test.txt
.gitignore
Makefile
tox.ini
setup.py
CODE_OF_CONDUCT.md
.pre-commit-config.yaml

The output is colored to show whether each directory or file was found:
A directory is blue if all of its contents are found; otherwise it is red.
Inside each red directory, a file is green if it is found; otherwise it is red.
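
The coloring rule above can be sketched as follows. This is only an illustration of the rule as described in the summary, not the prototype's code: the function names and the flat name-to-found mapping are assumptions made for the example.

```python
from typing import Dict

BLUE, GREEN, RED = "blue", "green", "red"


def color_directory(entries: Dict[str, bool]) -> str:
    """Return blue when every entry of the directory was found, red otherwise.

    `entries` maps an entry name to whether it was found (hypothetical shape).
    all() of an empty sequence is True, so an empty directory comes out blue
    here; the summary does not say how the prototype treats that case.
    """
    return BLUE if all(entries.values()) else RED


def color_file(found: bool) -> str:
    """Return green for a found file, red otherwise.

    Per the summary, this only matters for files inside a red directory.
    """
    return GREEN if found else RED
```

For example, `color_directory({"README.md": True, "setup.py": False})` gives red, and the two files would then be shown as green and red respectively.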

Diff Detail

Repository
rDTSCN Code scanner
Branch
codescanner-prototype
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10560
Build 15755: arc lint + arc unit

Event Timeline

vlorentz added a subscriber: vlorentz.

I'm missing context on what this code scanner does, so I'm only commenting on documentation for now.

New code should use type annotations, instead of having types in the docstring.
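
As an illustration of this request, here is a minimal sketch of a function whose types live in the signature while the docstring only describes behavior. The function itself (`count_entries`) is hypothetical and not taken from the diff.

```python
from pathlib import Path


def count_entries(root: Path, include_hidden: bool = False) -> int:
    """Count the direct children of ``root``.

    Entries whose name starts with a dot are skipped unless
    ``include_hidden`` is true. Note that the docstring does not repeat
    the types: they are already stated by the annotations above.
    """
    return sum(
        1
        for child in root.iterdir()
        if include_hidden or not child.name.startswith(".")
    )
```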

swh/foo/py.typed
1

re-add this file

swh/scanner/cli.py
28

Should have a docstring explaining what it does.

And if the whole package contains "scanner tools", then the command should have a more specific name telling what it does

swh/scanner/scanner.py
19–29

Missing description.

60–70
79–87

same

This revision now requires changes to proceed. (Feb 12 2020, 12:27 PM)

The CLI supports only one operation: swh scanner [options] path, where path is the root of the tree to scan.
The output is now a tree-like structure: a path is printed in green or blue if it is found, and in red otherwise.

Looks good.

Mostly nitpicks on docstrings as well:

  • I'd make the color output optional, that'd make for something more easily parsable.
  • Can we have a sample output of a run in the diff description, that'd be nice ;)
  • The logger module seems overkill, as I see it used only once.

Cheers,

swh/scanner/exceptions.py
1 ↗(On Diff #9696)

Please add the headers ;)

swh/scanner/model.py
55 ↗(On Diff #9696)

use your colorize function.

swh/scanner/scanner.py
31

what about at least the url description?

72

As per a recent discussion, you can replace this by Iterator[Tuple[PosixPath, str]].

@vlorentz ^ tell me if I misremember ;)
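
A minimal sketch of a generator annotated with the suggested return type. What the string in each pair represents at line 72 is not visible in this thread; the SHA-1 hex digest used here is only a placeholder, and the function name is hypothetical. The sketch assumes a POSIX system, since `PosixPath` cannot be instantiated elsewhere.

```python
import hashlib
from pathlib import PosixPath
from typing import Iterator, Tuple


def iter_file_digests(root: PosixPath) -> Iterator[Tuple[PosixPath, str]]:
    """Lazily yield (path, hex digest) for every regular file under root."""
    # sorted() only makes the iteration order deterministic for the example.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Placeholder value: SHA-1 of the file content, not necessarily
            # the identifier the scanner actually computes.
            yield path, hashlib.sha1(path.read_bytes()).hexdigest()
```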

125

fills

version.txt
1 ↗(On Diff #9696)

Add this file to .gitignore and remove it from the git index ;)

  • I'd make the color output optional, that'd make for something more easily parsable.

Ack. A good default behavior on this is to colorize output when stdout is connected to a terminal and not do it when it is connected to a pipe. That's what most git commands do and it's a very good default.
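
A minimal sketch of that default, assuming plain ANSI escape codes. The thread mentions an existing colorize function, but its signature is not shown, so the one below is hypothetical.

```python
import sys
from typing import Optional, TextIO

GREEN = "\033[32m"  # standard ANSI escape for green foreground
RESET = "\033[0m"   # resets all attributes


def colorize(text: str, color_code: str, stream: Optional[TextIO] = None) -> str:
    """Wrap text in a color code only when the stream is a terminal.

    When the stream is a pipe or a file, isatty() is False and the text is
    returned unchanged, which keeps the output easy to parse.
    """
    if stream is None:
        stream = sys.stdout
    if stream.isatty():
        return f"{color_code}{text}{RESET}"
    return text
```

With this, `print(colorize("README.md", GREEN))` is colored in an interactive shell and plain when the command is piped into another program.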

It's still missing a description of what it does (ie. check that files exist in swh), in the CLI help and/or README.

and I still think you should use a subcommand, in case you want to add new features in the future without breaking scripts that depend on this one.

swh/scanner/cli.py
29–34

Why not a base URL instead of host+port?

This revision now requires changes to proceed. (Feb 27 2020, 3:23 PM)

  • The logger module seems overkill, as I see it used only once.

Yes, I see. I mainly use it to benchmark different algorithms, so I can remove it.

It's still missing a description of what it does (ie. check that files exist in swh), in the CLI help and/or README.

and I still think you should use a subcommand, in case you want to add new features in the future without breaking scripts that depend on this one.

I will update the description. Anyway, the only command the module supports right now is the scan operation, so I could add a subcommand when a new feature is needed.

swh/scanner/cli.py
29–34

Yeah, it should be fine with the URL only.

swh/scanner/cli.py
29–34

@DanSeraf, I'm already using an api_url argument in the (WIP…) API client for the webapp (D2461), let's standardize on that as a function parameter (api_url) and as a command line option (--api-url), both defaulting to https://archive.softwareheritage.org/api/1.
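
A sketch of what that convention could look like, combined with the scan subcommand discussed earlier in the thread. It uses the standard library's argparse only to stay self-contained; the CLI library the scanner actually uses is not shown here, and the help strings and `build_parser` name are assumptions.

```python
import argparse

DEFAULT_API_URL = "https://archive.softwareheritage.org/api/1"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single `scan` subcommand and an --api-url option."""
    parser = argparse.ArgumentParser(prog="swh scanner")
    # A subcommand leaves room for new features without breaking
    # scripts that already call `scan`.
    subcommands = parser.add_subparsers(dest="command", required=True)
    scan = subcommands.add_parser(
        "scan", help="check which files under a path exist in the archive"
    )
    scan.add_argument("path", help="root path to scan")
    scan.add_argument(
        "--api-url",
        default=DEFAULT_API_URL,
        help="base URL of the archive API (a single URL instead of host and port)",
    )
    return parser
```

Parsing `["scan", "/path/to/scan"]` yields `args.api_url` equal to the default above, and passing `--api-url` overrides it.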

  • cli: scanner group with scan option, updated description
  • color only when stdout is connected to terminal
This revision is now accepted and ready to land. (Feb 28 2020, 3:01 PM)
ardumont edited the summary of this revision.