
Code scanner
Active · Public

Members

  • This project does not have any members.

Details

Description

Source code scanner using the Software Heritage archive as knowledge base.

Recent Activity

Mon, Jun 22

DanSeraf closed T2364: scanner: file browser in the sunburst/dashboard output as Resolved by committing rDTSCN0f10ec6ae8fe: dashboard: file visualization per directory path.
Mon, Jun 22, 7:39 PM · Code scanner
DanSeraf closed T2336: scanner: add support for an exclusion list as Resolved.
Mon, Jun 22, 2:57 PM · Code scanner

Apr 30 2020

DanSeraf closed T2365: scanner: add color legend for sunburst output as Resolved by committing rDTSCNfb8ae03e494c: plot: color legend.
Apr 30 2020, 12:41 PM · Code scanner

Apr 29 2020

DanSeraf closed T2363: scanner: json output should return both known and unknown files/dirs as Resolved by committing rDTSCN623a9dbe6157: ndjson output format.
Apr 29 2020, 4:40 PM · Code scanner

Apr 23 2020

olasd added a comment to T2363: scanner: json output should return both known and unknown files/dirs.

Just jumping in: I suggest using ndjson (newline-delimited JSON) instead of a full JSON tree, as the former is easier to stream and parse incrementally for large outputs (like the Linux kernel).

Apr 23 2020, 12:14 PM · Code scanner
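
To illustrate the ndjson suggestion in the comment above, here is a minimal sketch of writing and reading one JSON record per line. The flat record layout (`path`, `known`, `swhid`), the helper names, and the second sample entry are assumptions made for illustration; this is not the scanner's actual output format.

```python
import io
import json
from typing import Dict, Iterator, TextIO

# Sample scan results. The first entry reuses the path and SWHID quoted in
# T2363; the second entry is a made-up placeholder for an unknown file.
SAMPLE: Dict[str, dict] = {
    "dir1/subdir1/text.txt": {
        "known": True,
        "swhid": "swh:1:cnt:ff5b57b7095eb5d168a36db6552ad2ce1f219bf6",
    },
    "dir1/subdir1/unknown.txt": {
        "known": False,
        "swhid": "swh:1:cnt:0000000000000000000000000000000000000000",
    },
}


def write_ndjson(results: Dict[str, dict], out: TextIO) -> None:
    """Write one self-contained JSON object per line (hypothetical layout)."""
    for path, info in results.items():
        record = {"path": path, **info}
        out.write(json.dumps(record, sort_keys=True) + "\n")


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Parse records one line at a time, never loading the whole output."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_ndjson(SAMPLE, buffer)
    buffer.seek(0)
    for record in read_ndjson(buffer):
        print(record["path"], "known" if record["known"] else "unknown")
```

Because each line is a complete JSON document, a consumer can start processing before the scan finishes, which a single nested tree does not allow.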
zack added a comment to T2363: scanner: json output should return both known and unknown files/dirs.
$ swh scanner scan -f json /tmp/test
{
    "dir1": {
        "children": {
            "subdir1": {
                "children": {
                    "text.txt": {
                        "known": true,
                        "swhid": "swh:1:cnt:ff5b57b7095eb5d168a36db6552ad2ce1f219bf6"
                    }
Apr 23 2020, 10:47 AM · Code scanner

Apr 22 2020

DanSeraf added a comment to T2363: scanner: json output should return both known and unknown files/dirs.

The new json output will be like the following:

Apr 22 2020, 6:13 PM · Code scanner

Apr 15 2020

DanSeraf closed T2362: scanner: aiohttp.client_exceptions.ServerDisconnectedError: None as Invalid.
Apr 15 2020, 5:58 PM · Code scanner
zack updated the task description for T2363: scanner: json output should return both known and unknown files/dirs.
Apr 15 2020, 2:07 PM · Code scanner
zack triaged T2365: scanner: add color legend for sunburst output as Low priority.
Apr 15 2020, 1:56 PM · Code scanner
zack reopened T2300: scanner: print a nicer error message when rate limit is hit as "Open".

Reopening, as it looks like this issue is back, e.g.:

$ swh scanner scan -f sunburst . 
Traceback (most recent call last):
  File "/home/zack/.virtualenvs/swh/bin/swh", line 11, in <module>
    load_entry_point('swh.core', 'console_scripts', 'swh')()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-core/swh/core/cli/__init__.py", line 122, in main
    return swh(auto_envvar_prefix="SWH")
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/cli.py", line 56, in scan
    loop.run_until_complete(run(path, api_url, source_tree))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 138, in run
    await _scan(root, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 125, in _scan
    for path, pid, found in await parse_path(root, session, api_url):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 104, in parse_path
    parsed_pids = await pids_discovery(list(parsed_paths.values()), session, api_url)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 61, in pids_discovery
    return await make_request(pids)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 49, in make_request
    error_response(resp.reason, resp.status, endpoint)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/exceptions.py", line 18, in error_response
    raise APIError(error_msg)
swh.scanner.exceptions.APIError: "429 Too Many Requests: 'https://archive.softwareheritage.org/api/1/known/'"
Apr 15 2020, 1:55 PM · Easy hack, Code scanner
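
As a rough sketch of what a nicer message for the 429 response in the traceback above could look like, the snippet below formats the error and catches it at the top level instead of letting it propagate. The `APIError` class here is a local stand-in, and `describe_api_error` and `run_cli` are hypothetical names; none of this is taken from the swh-scanner code base.

```python
import sys
from typing import Callable


class APIError(Exception):
    """Local stand-in for the scanner's API error type (assumption)."""


def describe_api_error(status: int, reason: str, endpoint: str) -> str:
    """Build a short, user-facing message for a failed API call."""
    if status == 429:
        # 429 Too Many Requests: the client has exceeded the rate limit.
        return (
            f"Rate limit reached for {endpoint} (429 Too Many Requests). "
            "Please wait before scanning again."
        )
    return f"{status} {reason}: {endpoint}"


def run_cli(scan: Callable[[], None]) -> int:
    """Run a scan callable and turn APIError into a one-line message.

    Returns a process exit code instead of printing a traceback.
    """
    try:
        scan()
    except APIError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    def failing_scan() -> None:
        raise APIError(
            describe_api_error(
                429,
                "Too Many Requests",
                "https://archive.softwareheritage.org/api/1/known/",
            )
        )

    sys.exit(run_cli(failing_scan))
```

Catching the exception once at the command entry point keeps the request code unchanged while replacing the stack trace with a single line on standard error.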
zack triaged T2364: scanner: file browser in the sunburst/dashboard output as Normal priority.
Apr 15 2020, 1:51 PM · Code scanner
zack triaged T2363: scanner: json output should return both known and unknown files/dirs as Normal priority.
Apr 15 2020, 1:45 PM · Code scanner
zack triaged T2362: scanner: aiohttp.client_exceptions.ServerDisconnectedError: None as High priority.
Apr 15 2020, 1:37 PM · Code scanner

Mar 26 2020

zack added a comment to T2336: scanner: add support for an exclusion list.

The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.

Mar 26 2020, 4:44 PM · Code scanner
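
The point made in the comment above is that exclusion has to happen before identifiers are computed, not only when results are reported. A minimal sketch of that idea, filtering during the directory walk with glob patterns, is shown below. It uses only the standard library and does not call swh-model; `is_excluded` and `walk_included` are hypothetical helper names.

```python
import fnmatch
import os
import tempfile
from typing import Iterator, Sequence


def is_excluded(rel_path: str, patterns: Sequence[str]) -> bool:
    """Return True if the relative path matches any exclusion pattern."""
    return any(fnmatch.fnmatch(rel_path, pattern) for pattern in patterns)


def walk_included(root: str, patterns: Sequence[str]) -> Iterator[str]:
    """Yield relative file paths under root, skipping excluded entries.

    Excluded directories are pruned in place, so nothing beneath them is
    visited and nothing beneath them would be hashed.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        dirnames[:] = [
            d for d in dirnames
            if not is_excluded(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in filenames:
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            if not is_excluded(rel_file, patterns):
                yield rel_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".git"))
        os.makedirs(os.path.join(root, "src"))
        for rel in (".git/config", "src/main.py", "src/main.pyc"):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("x")
        print(sorted(walk_included(root, [".git", "*.pyc"])))
```

If the identifier of a directory is derived from its entries, any code computing it would need to consume the same filtered listing; otherwise the directory identifier would still reflect the excluded paths.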

Mar 25 2020

DanSeraf added a comment to T2336: scanner: add support for an exclusion list.

The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.

Mar 25 2020, 3:28 PM · Code scanner
zack triaged T2336: scanner: add support for an exclusion list as Normal priority.
Mar 25 2020, 11:07 AM · Code scanner

Mar 11 2020

DanSeraf closed T2300: scanner: print a nicer error message when rate limit is hit as Resolved by committing rDTSCNcb5faf6e866a: exceptions: APIError response.
Mar 11 2020, 9:51 AM · Easy hack, Code scanner

Mar 10 2020

DanSeraf closed T2299: scanner: add integration tests as Resolved by committing rDTSCNa5555dd12464: basic test for the scanner.
Mar 10 2020, 10:46 AM · Code scanner

Mar 6 2020

DanSeraf added a watcher for Code scanner: DanSeraf.
Mar 6 2020, 10:46 AM

Mar 3 2020

DanSeraf closed T2298: scanner: support alternative output formats as Resolved by committing rDTSCNcedf6fc9b068: model: output result to json file.
Mar 3 2020, 2:46 PM · Code scanner

Mar 2 2020

zack renamed T2298: scanner: support alternative output formats from swh-scanner: more output formats to scanner: support alternative output formats.
Mar 2 2020, 7:38 PM · Code scanner

Feb 28 2020

zack assigned T2300: scanner: print a nicer error message when rate limit is hit to DanSeraf.
Feb 28 2020, 5:08 PM · Easy hack, Code scanner
zack triaged T2300: scanner: print a nicer error message when rate limit is hit as Low priority.
Feb 28 2020, 5:06 PM · Easy hack, Code scanner
zack triaged T2299: scanner: add integration tests as High priority.
Feb 28 2020, 4:10 PM · Code scanner
zack added a project to T2298: scanner: support alternative output formats: Code scanner.
Feb 28 2020, 3:54 PM · Code scanner
zack created Code scanner.
Feb 28 2020, 3:54 PM