Just jumping in: I suggest using ndjson (newline-delimited JSON) instead of a full JSON tree, since ndjson is easier to stream and parse incrementally for large outputs (like the Linux kernel).
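To illustrate the streaming argument, here is a minimal sketch of incremental ndjson parsing. The record layout (`path`, `known`) is an assumption made for illustration only and is not the scanner's actual output schema; `iter_ndjson` is a hypothetical helper name.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded record per non-empty line.

    Only the current line is held in memory, which is the property that
    makes ndjson convenient for very large outputs.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; the field names are assumptions, not the real schema.
sample = io.StringIO(
    '{"path": "dir1/subdir1/text.txt", "known": true}\n'
    '{"path": "dir1/unknown.bin", "known": false}\n'
)

# Records are consumed one at a time, so filtering needs no full tree in memory.
known_paths = [rec["path"] for rec in iter_ndjson(sample) if rec["known"]]
```

With a single JSON tree, by contrast, a standard parser has to read the closing brace before any node can be used.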
Dec 15 2020
zack renamed T2812: scanner import db is slow, improve its performances from scanner: improve SWHID (txt) -> sqlite import time to scanner import db is slow, improve its performances.
Dec 2 2020
zack added a project to T2836: swh scanner db import loads keeps all input SWHIDs in memory: Easy hack.
zack triaged T2836: swh scanner db import loads keeps all input SWHIDs in memory as Normal priority.
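One possible way to avoid keeping every input SWHID in memory during import is to read the text file lazily and insert in fixed-size batches. This is only a sketch under stated assumptions: the table name `swhids`, its single-column schema, and the helper names `batched` and `import_swhids` are all hypothetical and do not reflect the scanner's actual code.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def batched(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` non-empty, stripped lines."""
    stripped = (line.strip() for line in lines)
    non_empty = (s for s in stripped if s)
    while True:
        chunk = list(islice(non_empty, size))
        if not chunk:
            return
        yield chunk


def import_swhids(
    conn: sqlite3.Connection, lines: Iterable[str], batch_size: int = 1000
) -> int:
    """Insert SWHIDs batch by batch; return the number of input lines read.

    Memory use is bounded by `batch_size`, not by the size of the input.
    """
    # Hypothetical schema: one SWHID per row, duplicates ignored.
    conn.execute("CREATE TABLE IF NOT EXISTS swhids (swhid TEXT PRIMARY KEY)")
    total = 0
    for chunk in batched(lines, batch_size):
        with conn:  # one transaction per batch, committed on success
            conn.executemany(
                "INSERT OR IGNORE INTO swhids (swhid) VALUES (?)",
                ((s,) for s in chunk),
            )
        total += len(chunk)
    return total
```

Passing an open file object as `lines` keeps the read lazy, since file objects iterate line by line.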
Nov 25 2020
zack closed T2680: proxy support for swh scanner as Resolved by committing rDTSCN65f0b8e4c6ea: honor HTTP(S)_PROXY environment variables, to support HTTP proxies.
Nov 16 2020
DanSeraf changed the status of T2760: swh-scanner: add support for local DB of known SWHIDs from Open to Work in Progress.
Oct 13 2020
DanSeraf triaged T2692: Move the output related functions to another (sub)module as Normal priority.
DanSeraf closed T2690: swh scanner reports double results in ndjson format as Resolved by committing rDTSCNc2768d171a78: model: dropped _iter_nodes_attr function.
Sep 8 2020
zack triaged T2572: swh-scanner: add support for authentication token to lift rate-limit as Normal priority.
zack renamed T2300: swh-scanner: print a nicer error message when rate limit is hit from scanner: print a nicer error message when rate limit is hit to swh-scanner: print a nicer error message when rate limit is hit.
Jun 22 2020
DanSeraf closed T2364: scanner: file browser in the sunburst/dashboard output as Resolved by committing rDTSCN0f10ec6ae8fe: dashboard: file visualization per directory path.
Apr 30 2020
DanSeraf closed T2365: scanner: add color legend for sunburst output as Resolved by committing rDTSCNfb8ae03e494c: plot: color legend.
Apr 29 2020
DanSeraf closed T2363: scanner: json output should return both known and unknown files/dirs as Resolved by committing rDTSCN623a9dbe6157: ndjson output format.
Apr 23 2020
olasd added a comment to T2363: scanner: json output should return both known and unknown files/dirs.
zack added a comment to T2363: scanner: json output should return both known and unknown files/dirs.
In T2363#43710, @DanSeraf wrote:
    $ swh scanner scan -f json /tmp/test
    {
      "dir1": {
        "children": {
          "subdir1": {
            "children": {
              "text.txt": {
                "known": true,
                "swhid": "swh:1:cnt:ff5b57b7095eb5d168a36db6552ad2ce1f219bf6"
              }
Apr 22 2020
DanSeraf added a comment to T2363: scanner: json output should return both known and unknown files/dirs.
The new json output will be like the following:
Apr 15 2020
zack updated the task description for T2363: scanner: json output should return both known and unknown files/dirs.
Reopening, as it looks like this issue is back, e.g.:
$ swh scanner scan -f sunburst .
Traceback (most recent call last):
  File "/home/zack/.virtualenvs/swh/bin/swh", line 11, in <module>
    load_entry_point('swh.core', 'console_scripts', 'swh')()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-core/swh/core/cli/__init__.py", line 122, in main
    return swh(auto_envvar_prefix="SWH")
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zack/.virtualenvs/swh/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/cli.py", line 56, in scan
    loop.run_until_complete(run(path, api_url, source_tree))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 138, in run
    await _scan(root, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 135, in _scan
    await _scan(path, session, api_url, source_tree)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 125, in _scan
    for path, pid, found in await parse_path(root, session, api_url):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 104, in parse_path
    parsed_pids = await pids_discovery(list(parsed_paths.values()), session, api_url)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 61, in pids_discovery
    return await make_request(pids)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/scanner.py", line 49, in make_request
    error_response(resp.reason, resp.status, endpoint)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-scanner/swh/scanner/exceptions.py", line 18, in error_response
    raise APIError(error_msg)
swh.scanner.exceptions.APIError: "429 Too Many Requests: 'https://archive.softwareheritage.org/api/1/known/'"
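For reference, one way a CLI entry point could turn such an exception into a short message instead of a full traceback is sketched below. This is an illustration, not the fix that was committed: `APIError` is redefined locally as a stand-in for `swh.scanner.exceptions.APIError`, and `run_cli` and the hint wording are hypothetical.

```python
import sys
from typing import Callable, TextIO


class APIError(Exception):
    """Local stand-in for swh.scanner.exceptions.APIError (assumption)."""


def run_cli(scan: Callable[[], None], stderr: TextIO = sys.stderr) -> int:
    """Run `scan` and report API errors concisely; return a process exit code."""
    try:
        scan()
    except APIError as exc:
        # Print only the error message, not the whole call stack.
        print(f"error: {exc}", file=stderr)
        if "429" in str(exc):
            # Hypothetical hint for the rate-limit case.
            print(
                "hint: the archive API rate limit was hit; wait and retry later",
                file=stderr,
            )
        return 1
    return 0
```

Matching on the status code embedded in the message is fragile; an exception that carries the status as an attribute would be a cleaner design, but that is beyond what the traceback above shows.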
zack triaged T2363: scanner: json output should return both known and unknown files/dirs as Normal priority.
zack triaged T2362: scanner: aiohttp.client_exceptions.ServerDisconnectedError: None as High priority.
Mar 26 2020
In T2336#42846, @DanSeraf wrote: The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.
Mar 25 2020
The problem here is that swh-model is actually calculating the persistent identifier of the input path, so it also has to exclude the specified paths.
Mar 11 2020
DanSeraf closed T2300: swh-scanner: print a nicer error message when rate limit is hit as Resolved by committing rDTSCNcb5faf6e866a: exceptions: APIError response.
Mar 10 2020
DanSeraf closed T2299: scanner: add integration tests as Resolved by committing rDTSCNa5555dd12464: basic test for the scanner.
Mar 3 2020
DanSeraf closed T2298: scanner: support alternative output formats as Resolved by committing rDTSCNcedf6fc9b068: model: output result to json file.
Mar 2 2020
zack renamed T2298: scanner: support alternative output formats from swh-scanner: more output formats to scanner: support alternative output formats.
Feb 28 2020
zack triaged T2300: swh-scanner: print a nicer error message when rate limit is hit as Low priority.