Dec 15 2020

zack renamed T2812: scanner import db is slow, improve its performances from scanner: improve SWHID (txt) -> sqlite import time to scanner import db is slow, improve its performances.
Dec 15 2020, 5:48 PM · Code scanner

Dec 2 2020

zack added a project to T2836: swh scanner db import loads keeps all input SWHIDs in memory: Easy hack.
Dec 2 2020, 9:26 AM · Easy hack, Code scanner
zack triaged T2836: swh scanner db import loads keeps all input SWHIDs in memory as Normal priority.
Dec 2 2020, 9:26 AM · Easy hack, Code scanner

Nov 25 2020

zack triaged T2813: swh scanner db import does not validate SWHIDs as Low priority.
Nov 25 2020, 10:37 PM · Code scanner
zack triaged T2812: scanner import db is slow, improve its performances as Low priority.
Nov 25 2020, 10:00 PM · Code scanner
zack closed T2680: proxy support for swh scanner as Resolved by committing rDTSCN65f0b8e4c6ea: honor HTTP(S)_PROXY environment variables, to support HTTP proxies.
Nov 25 2020, 4:42 PM · Easy hack, Code scanner

Nov 24 2020

DanSeraf closed T2760: swh-scanner: add support for local DB of known SWHIDs as Resolved.
Nov 24 2020, 1:54 PM · Code scanner

Nov 22 2020

DanSeraf added a revision to T2760: swh-scanner: add support for local DB of known SWHIDs: D4552: 'db serve' option to start the API service.
Nov 22 2020, 4:19 PM · Code scanner

Nov 18 2020

DanSeraf added a revision to T2760: swh-scanner: add support for local DB of known SWHIDs: D4508: scanner: 'db import' option to create local database with known swhids.
Nov 18 2020, 2:24 PM · Code scanner

Nov 16 2020

DanSeraf changed the status of T2760: swh-scanner: add support for local DB of known SWHIDs from Open to Work in Progress.
Nov 16 2020, 10:41 AM · Code scanner

Nov 6 2020

zack updated the task description for T2760: swh-scanner: add support for local DB of known SWHIDs.
Nov 6 2020, 2:50 PM · Code scanner
zack triaged T2760: swh-scanner: add support for local DB of known SWHIDs as Normal priority.
Nov 6 2020, 2:32 PM · Code scanner

Oct 24 2020

zack triaged T2731: scanner: strip the path passed as argument from output as Low priority.
Oct 24 2020, 5:01 PM · Easy hack, Code scanner
zack updated the task description for T2730: scanner: should output the root SWHID as well.
Oct 24 2020, 4:58 PM · Easy hack, Code scanner
zack triaged T2730: scanner: should output the root SWHID as well as Normal priority.
Oct 24 2020, 4:58 PM · Easy hack, Code scanner

Oct 13 2020

DanSeraf triaged T2692: Move the output related functions to another (sub)module as Normal priority.
Oct 13 2020, 9:57 AM · Code scanner
DanSeraf closed T2690: swh scanner reports double results in ndjson format as Resolved by committing rDTSCNc2768d171a78: model: dropped _iter_nodes_attr function.
Oct 13 2020, 9:36 AM · Code scanner

Oct 12 2020

zack triaged T2679: Use the `swh.model` version of `extract_regex_objs` as Low priority.
Oct 12 2020, 6:59 PM · Code scanner
zack triaged T2690: swh scanner reports double results in ndjson format as Normal priority.
Oct 12 2020, 6:59 PM · Code scanner
zvr created T2690: swh scanner reports double results in ndjson format.
Oct 12 2020, 6:47 PM · Code scanner

Oct 9 2020

zack added a project to T2680: proxy support for swh scanner: Easy hack.
Oct 9 2020, 2:59 PM · Easy hack, Code scanner
zack triaged T2680: proxy support for swh scanner as Normal priority.
Oct 9 2020, 2:58 PM · Easy hack, Code scanner
acezar created T2679: Use the `swh.model` version of `extract_regex_objs`.
Oct 9 2020, 2:47 PM · Code scanner

Sep 28 2020

tenma closed T2632: swh scanner fail to start when configuration file is missing as Resolved.
Sep 28 2020, 10:14 AM · Code scanner

Sep 25 2020

tenma added a revision to T2632: swh scanner fail to start when configuration file is missing: D4046: Fix default config file may be absent in scanner cli.
Sep 25 2020, 11:44 AM · Code scanner

Sep 23 2020

zack triaged T2632: swh scanner fail to start when configuration file is missing as High priority.
Sep 23 2020, 2:12 PM · Code scanner

Sep 14 2020

tenma closed T2572: swh-scanner: add support for authentication token to lift rate-limit as Resolved by committing rDTSCN0abe025e277b: Add standard config support and auth token for swh-scanner.
Sep 14 2020, 2:10 PM · Code scanner

Sep 9 2020

tenma added a revision to T2572: swh-scanner: add support for authentication token to lift rate-limit: D3900: Add standard config support and HTTP auth token for swh-scanner.
Sep 9 2020, 7:58 PM · Code scanner

Sep 8 2020

zack assigned T2572: swh-scanner: add support for authentication token to lift rate-limit to tenma.
Sep 8 2020, 10:50 AM · Code scanner
zack triaged T2572: swh-scanner: add support for authentication token to lift rate-limit as Normal priority.
Sep 8 2020, 10:25 AM · Code scanner
zack renamed T2300: swh-scanner: print a nicer error message when rate limit is hit from scanner: print a nicer error message when rate limit is hit to swh-scanner: print a nicer error message when rate limit is hit.
Sep 8 2020, 10:24 AM · Easy hack, Code scanner

Jun 22 2020

DanSeraf closed T2364: scanner: file browser in the sunburst/dashboard output as Resolved by committing rDTSCN0f10ec6ae8fe: dashboard: file visualization per directory path.
Jun 22 2020, 7:39 PM · Code scanner
DanSeraf closed T2336: scanner: add support for an exclusion list as Resolved.
Jun 22 2020, 2:57 PM · Code scanner

Apr 30 2020

DanSeraf closed T2365: scanner: add color legend for sunburst output as Resolved by committing rDTSCNfb8ae03e494c: plot: color legend.
Apr 30 2020, 12:41 PM · Code scanner

Apr 29 2020

DanSeraf closed T2363: scanner: json output should return both known and unknown files/dirs as Resolved by committing rDTSCN623a9dbe6157: ndjson output format.
Apr 29 2020, 4:40 PM · Code scanner

Apr 23 2020

olasd added a comment to T2363: scanner: json output should return both known and unknown files/dirs.

Just jumping in, I suggest using ndjson (newline-delimited json) instead of a full json tree, as the former is easier to stream / parse incrementally for large outputs (like the linux kernel).

Apr 23 2020, 12:14 PM · Code scanner
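
To illustrate why newline-delimited JSON suits large outputs, here is a minimal sketch of incremental parsing: each line is decoded on its own, so a consumer never has to hold the whole tree in memory. The record fields (`path`, `known`) and the helper name `iter_ndjson` are assumptions made for illustration and may not match the scanner's actual ndjson layout.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[Dict]:
    """Yield one decoded JSON record per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


# Hypothetical sample records; the real scanner output may be shaped differently.
sample = io.StringIO(
    '{"path": "dir1/subdir1/text.txt", "known": true}\n'
    '{"path": "dir1/unknown.bin", "known": false}\n'
)

# Records are consumed one at a time, as they are read.
known_paths = [rec["path"] for rec in iter_ndjson(sample) if rec["known"]]
print(known_paths)
```

The same loop works unchanged on `sys.stdin` or an open file, which is what makes the format convenient for piping the output of a scan over a large tree.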
zack added a comment to T2363: scanner: json output should return both known and unknown files/dirs.
$ swh scanner scan -f json /tmp/test
{
    "dir1": {
        "children": {
            "subdir1": {
                "children": {
                    "text.txt": {
                        "known": true,
                        "swhid": "swh:1:cnt:ff5b57b7095eb5d168a36db6552ad2ce1f219bf6"
                    }
Apr 23 2020, 10:47 AM · Code scanner

Apr 22 2020

DanSeraf added a comment to T2363: scanner: json output should return both known and unknown files/dirs.

The new json output will be like the following:

Apr 22 2020, 6:13 PM · Code scanner

Apr 15 2020

DanSeraf closed T2362: scanner: aiohttp.client_exceptions.ServerDisconnectedError: None as Invalid.
Apr 15 2020, 5:58 PM · Code scanner
zack updated the task description for T2363: scanner: json output should return both known and unknown files/dirs.
Apr 15 2020, 2:07 PM · Code scanner
zack triaged T2365: scanner: add color legend for sunburst output as Low priority.
Apr 15 2020, 1:56 PM · Code scanner
zack reopened T2300: swh-scanner: print a nicer error message when rate limit is hit as "Open".

Reopening, as it looks like this issue is back, e.g.:

$ swh scanner scan -f sunburst . 
Traceback (most recent call last):
  File "/home/zack/.virtualenvs/swh/bin/swh", line 11, in <module>
    load_entry_point('swh.core', 'console_scripts', 'swh')()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-core/swh/core/cli/__init__.py", line 122, in main
    return swh(auto_envvar_prefix="SWH")
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/cli.py", line 56, in scan
    loop.run_until_complete(run(path, api_url, source_tree))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 138, in run
    await _scan(root, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 125, in _scan
    for path, pid, found in await parse_path(root, session, api_url):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 104, in parse_path
    parsed_pids = await pids_discovery(list(parsed_paths.values()), session, api_url)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 61, in pids_discovery
    return await make_request(pids)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 49, in make_request
    error_response(resp.reason, resp.status, endpoint)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/exceptions.py", line 18, in error_response
    raise APIError(error_msg)
swh.scanner.exceptions.APIError: "429 Too Many Requests: 'https://archive.softwareheritage.org/api/1/known/'"
Apr 15 2020, 1:55 PM · Easy hack, Code scanner
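
One possible shape for the nicer error message, as a sketch only: catch the API error at the top of the CLI and print a single line instead of the traceback. The `APIError` class below is a local stand-in for the exception named in the traceback, and `friendly_message` and `run_cli` are hypothetical helpers, not functions from swh-scanner.

```python
import sys
from typing import Callable


class APIError(Exception):
    """Local stand-in for the exception class seen in the traceback."""


def friendly_message(exc: Exception) -> str:
    """Turn an API error into a one-line, user-facing message."""
    text = str(exc)
    if "429" in text:
        return "Rate limit reached on the archive API; wait before retrying."
    return f"API error: {text}"


def run_cli(scan: Callable[[], None]) -> int:
    """Run a scan callable and return a process exit code."""
    try:
        scan()
    except APIError as exc:
        # Print a short message on stderr rather than the full traceback.
        print(friendly_message(exc), file=sys.stderr)
        return 1
    return 0
```

A caller would pass the result to `sys.exit`, for example `sys.exit(run_cli(my_scan))`, where `my_scan` is whatever callable performs the scan.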
zack triaged T2364: scanner: file browser in the sunburst/dashboard output as Normal priority.
Apr 15 2020, 1:51 PM · Code scanner
zack triaged T2363: scanner: json output should return both known and unknown files/dirs as Normal priority.
Apr 15 2020, 1:45 PM · Code scanner
zack triaged T2362: scanner: aiohttp.client_exceptions.ServerDisconnectedError: None as High priority.
Apr 15 2020, 1:37 PM · Code scanner

Mar 26 2020

zack added a comment to T2336: scanner: add support for an exclusion list.

The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.

Mar 26 2020, 4:44 PM · Code scanner

Mar 25 2020

DanSeraf added a comment to T2336: scanner: add support for an exclusion list.

The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.

Mar 25 2020, 3:28 PM · Code scanner
zack triaged T2336: scanner: add support for an exclusion list as Normal priority.
Mar 25 2020, 11:07 AM · Code scanner

Mar 11 2020

DanSeraf closed T2300: swh-scanner: print a nicer error message when rate limit is hit as Resolved by committing rDTSCNcb5faf6e866a: exceptions: APIError response.
Mar 11 2020, 9:51 AM · Easy hack, Code scanner

Mar 10 2020

DanSeraf closed T2299: scanner: add integration tests as Resolved by committing rDTSCNa5555dd12464: basic test for the scanner.
Mar 10 2020, 10:46 AM · Code scanner

Mar 6 2020

DanSeraf added a watcher for Code scanner: DanSeraf.
Mar 6 2020, 10:46 AM

Mar 3 2020

DanSeraf closed T2298: scanner: support alternative output formats as Resolved by committing rDTSCNcedf6fc9b068: model: output result to json file.
Mar 3 2020, 2:46 PM · Code scanner

Mar 2 2020

zack renamed T2298: scanner: support alternative output formats from swh-scanner: more output formats to scanner: support alternative output formats.
Mar 2 2020, 7:38 PM · Code scanner

Feb 28 2020

zack assigned T2300: swh-scanner: print a nicer error message when rate limit is hit to DanSeraf.
Feb 28 2020, 5:08 PM · Easy hack, Code scanner
zack triaged T2300: swh-scanner: print a nicer error message when rate limit is hit as Low priority.
Feb 28 2020, 5:06 PM · Easy hack, Code scanner
zack triaged T2299: scanner: add integration tests as High priority.
Feb 28 2020, 4:10 PM · Code scanner
zack added a project to T2298: scanner: support alternative output formats: Code scanner.
Feb 28 2020, 3:54 PM · Code scanner
zack created Code scanner.
Feb 28 2020, 3:54 PM