Mar 26 2021

DanSeraf closed T2570: swh-identify: support exclusion patterns (e.g., for .git/) as swh-scanner does as Resolved.

Already implemented in D4193

Mar 26 2021, 3:15 PM · Data Model
DanSeraf updated the diff for D5359: scanner: use 'extract_regex_objs' from swh.model.

removed unnecessary conversion

Mar 26 2021, 3:12 PM
DanSeraf added inline comments to D5359: scanner: use 'extract_regex_objs' from swh.model.
Mar 26 2021, 3:07 PM
DanSeraf added inline comments to D5359: scanner: use 'extract_regex_objs' from swh.model.
Mar 26 2021, 3:00 PM
DanSeraf updated the summary of D5359: scanner: use 'extract_regex_objs' from swh.model.
Mar 26 2021, 2:42 PM
DanSeraf requested review of D5359: scanner: use 'extract_regex_objs' from swh.model.
Mar 26 2021, 2:40 PM
DanSeraf added a revision to T2679: Use the `swh.model` version of `extract_regex_objs`: D5359: scanner: use 'extract_regex_objs' from swh.model.
Mar 26 2021, 2:39 PM · Code scanner

Feb 5 2021

DanSeraf committed rDTSCN8daa353de986: reimplement algo_min (authored by DanSeraf).
reimplement algo_min
Feb 5 2021, 5:51 PM
DanSeraf closed D5032: scanner-benchmark: improve logging information.
Feb 5 2021, 3:29 PM
DanSeraf committed rDTSCNe084e0f0f9ca: improve logging information (authored by DanSeraf).
improve logging information
Feb 5 2021, 3:29 PM
DanSeraf requested review of D5032: scanner-benchmark: improve logging information.
Feb 5 2021, 3:02 PM

Feb 4 2021

DanSeraf closed D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.
Feb 4 2021, 5:46 PM
DanSeraf committed rDTSCNe46e713d2145: run random algorithm only once (authored by DanSeraf).
run random algorithm only once
Feb 4 2021, 5:46 PM
DanSeraf committed rDTSCN3004b66787b2: use os.listdir() instead of os.walk() to avoid symlinks (authored by DanSeraf).
use os.listdir() instead of os.walk() to avoid symlinks
Feb 4 2021, 5:46 PM
DanSeraf updated the diff for D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.

rebase

Feb 4 2021, 5:38 PM
DanSeraf updated the diff for D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.

rebase

Feb 4 2021, 5:08 PM
DanSeraf updated the diff for D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.

reword commit

Feb 4 2021, 4:51 PM
DanSeraf updated the diff for D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.

exclude symlink

Feb 4 2021, 4:31 PM
DanSeraf requested review of D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.
Feb 4 2021, 2:31 PM

Feb 2 2021

DanSeraf closed D4996: algo_min: delete the upstream directories if a (sub)directory is unknown.
Feb 2 2021, 3:54 PM
DanSeraf committed rDTSCN3d3665a4f5bb: algo_min: delete the upstream directories if a (sub)directory is unknown (authored by DanSeraf).
algo_min: delete the upstream directories if a (sub)directory is unknown
Feb 2 2021, 3:54 PM
DanSeraf requested review of D4996: algo_min: delete the upstream directories if a (sub)directory is unknown.
Feb 2 2021, 3:39 PM
DanSeraf committed rDTSCNc42e643aa512: model: wrong iteration in 'iterate_bfs' function (authored by DanSeraf).
model: wrong iteration in 'iterate_bfs' function
Feb 2 2021, 2:48 PM
DanSeraf committed rDTSCN0d3b5cb86144: make 'set_children_status' works with different kind of nodes (authored by DanSeraf).
make 'set_children_status' works with different kind of nodes
Feb 2 2021, 2:48 PM
DanSeraf closed D4993: scanner-benchmark: some code improvement.
Feb 2 2021, 2:48 PM
DanSeraf committed rDTSCNb601f382db64: file_priority: remove children only when the unset directory is known (authored by DanSeraf).
file_priority: remove children only when the unset directory is known
Feb 2 2021, 2:48 PM
DanSeraf requested review of D4993: scanner-benchmark: some code improvement.
Feb 2 2021, 1:31 PM

Jan 28 2021

DanSeraf committed rDTSCN5e01c09af4c6: retry request in case of backend failure (authored by DanSeraf).
retry request in case of backend failure
Jan 28 2021, 11:44 AM
DanSeraf closed D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error.
Jan 28 2021, 11:44 AM
DanSeraf committed rDTSCNebad16c02da6: algo_min: remove the current node as well (authored by DanSeraf).
algo_min: remove the current node as well
Jan 28 2021, 11:44 AM

Jan 26 2021

DanSeraf updated the diff for D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error.

commit message

Jan 26 2021, 3:45 PM
DanSeraf retitled D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error from scanner-benchmark: algo_miv fixed, retry mechanism on request error to scanner-benchmark: algo_min fixed, retry mechanism on request error.
Jan 26 2021, 3:15 PM
DanSeraf retitled D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error from scanner-benchmark: algo_mix fixed, retry mechanism on request error to scanner-benchmark: algo_miv fixed, retry mechanism on request error.
Jan 26 2021, 3:15 PM
DanSeraf requested review of D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error.
Jan 26 2021, 12:43 PM

Jan 18 2021

DanSeraf closed D4875: scanner-benchmark: the temporary directory is removed by tempdir.
Jan 18 2021, 12:32 PM
DanSeraf committed rDTSCN5cd9f762467e: fix: the temporary directory is removed by tempfile (authored by DanSeraf).
fix: the temporary directory is removed by tempfile
Jan 18 2021, 12:32 PM
DanSeraf requested review of D4875: scanner-benchmark: the temporary directory is removed by tempdir.
Jan 18 2021, 10:35 AM

Jan 13 2021

DanSeraf committed rDTSCN7a289332f730: print results as a csv (authored by DanSeraf).
print results as a csv
Jan 13 2021, 7:46 PM
DanSeraf closed D4851: scanner benchmark: output format and repository extraction in temporary directories.
Jan 13 2021, 7:45 PM
DanSeraf committed rDTSCN9e4df16d9486: extract repositories in temporary directories (authored by DanSeraf).
extract repositories in temporary directories
Jan 13 2021, 7:45 PM
DanSeraf requested review of D4851: scanner benchmark: output format and repository extraction in temporary directories.
Jan 13 2021, 11:20 AM

Dec 19 2020

DanSeraf committed rDTSCN7bd1939949dc: scanner experiments (authored by DanSeraf).
scanner experiments
Dec 19 2020, 4:46 PM
DanSeraf closed D4721: WIP: scanner benchmark.
Dec 19 2020, 4:46 PM
DanSeraf updated the diff for D4721: WIP: scanner benchmark.

wrong algorithm name in example

Dec 19 2020, 4:41 PM

Dec 17 2020

DanSeraf updated the diff for D4721: WIP: scanner benchmark.

variable name in run_benchmark.sh

Dec 17 2020, 2:12 PM
DanSeraf updated the diff for D4721: WIP: scanner benchmark.

remove git missing imports in mypy.ini

Dec 17 2020, 2:05 PM
DanSeraf updated the diff for D4721: WIP: scanner benchmark.

requested changes
+ algorithms can be specified from run_benchmark.sh
+ if "random" algorithm is specified, benchmark.py will run three experiments using the default seeds (10, 20, 30)

Dec 17 2020, 7:51 AM

Dec 11 2020

DanSeraf created D4721: WIP: scanner benchmark.
Dec 11 2020, 12:46 PM

Nov 24 2020

DanSeraf closed T2760: swh-scanner: add support for local DB of known SWHIDs as Resolved.
Nov 24 2020, 1:54 PM · Code scanner
DanSeraf committed rDTSCN09c28d60f1ad: 'db serve' option to start the API service (authored by DanSeraf).
'db serve' option to start the API service
Nov 24 2020, 1:51 PM
DanSeraf closed D4552: 'db serve' option to start the API service.
Nov 24 2020, 1:51 PM
DanSeraf updated the diff for D4552: 'db serve' option to start the API service.

Minor changes

Nov 24 2020, 1:49 PM
DanSeraf updated the diff for D4552: 'db serve' option to start the API service.

rebase

Nov 24 2020, 11:48 AM
DanSeraf updated the diff for D4552: 'db serve' option to start the API service.
  • changed module name
  • query the database with only one cursor
  • get SWHID known status directly when generating the response
Nov 24 2020, 11:09 AM

Nov 23 2020

DanSeraf added inline comments to D4552: 'db serve' option to start the API service.
Nov 23 2020, 5:01 PM

Nov 22 2020

DanSeraf added a revision to T2760: swh-scanner: add support for local DB of known SWHIDs: D4552: 'db serve' option to start the API service.
Nov 22 2020, 4:19 PM · Code scanner
DanSeraf created D4552: 'db serve' option to start the API service.
Nov 22 2020, 4:19 PM

Nov 21 2020

DanSeraf committed rDTSCN521420e7d5eb: 'db import' option to create local database with known swhids (authored by DanSeraf).
'db import' option to create local database with known swhids
Nov 21 2020, 3:01 PM
DanSeraf closed D4508: scanner: 'db import' option to create local database with known swhids.
Nov 21 2020, 3:01 PM

Nov 20 2020

DanSeraf updated the diff for D4508: scanner: 'db import' option to create local database with known swhids.

minor changes:

  • mypy annotation
  • tests
Nov 20 2020, 10:03 AM

Nov 19 2020

DanSeraf updated the diff for D4508: scanner: 'db import' option to create local database with known swhids.

minor changes

Nov 19 2020, 5:08 PM
DanSeraf updated the diff for D4508: scanner: 'db import' option to create local database with known swhids.

requested changes:

  • SWHID as PRIMARY KEY in db
  • SWHID insertion without querying the Web API
  • bulk insert of SWHID values in db
Nov 19 2020, 4:04 PM
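
The requested changes listed for D4508 (SWHID as primary key, bulk insert of SWHID values) can be illustrated with a small sketch. This is not the code from D4508: the use of SQLite, the table name `swhids`, and the helpers `create_db` and `bulk_insert_swhids` are all assumptions made for illustration.

```python
import sqlite3
from typing import Iterable


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with one table keyed by SWHID (hypothetical schema)."""
    conn = sqlite3.connect(path)
    # Using the SWHID itself as PRIMARY KEY makes duplicates impossible
    # and gives an index for lookups without a separate id column.
    conn.execute("CREATE TABLE IF NOT EXISTS swhids (swhid TEXT PRIMARY KEY)")
    return conn


def bulk_insert_swhids(conn: sqlite3.Connection, swhids: Iterable[str]) -> int:
    """Insert many SWHIDs in one transaction; return the total row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            # OR IGNORE skips values that are already present.
            "INSERT OR IGNORE INTO swhids (swhid) VALUES (?)",
            ((s,) for s in swhids),
        )
    return conn.execute("SELECT COUNT(*) FROM swhids").fetchone()[0]
```

With this shape, values are written locally in a single batch and no remote lookup is needed at import time; inserting the same SWHID twice leaves a single row.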

Nov 18 2020

DanSeraf updated the diff for D4508: scanner: 'db import' option to create local database with known swhids.

requested changes

Nov 18 2020, 5:43 PM
DanSeraf added a revision to T2760: swh-scanner: add support for local DB of known SWHIDs: D4508: scanner: 'db import' option to create local database with known swhids.
Nov 18 2020, 2:24 PM · Code scanner
DanSeraf created D4508: scanner: 'db import' option to create local database with known swhids.
Nov 18 2020, 2:24 PM

Nov 16 2020

DanSeraf changed the status of T2760: swh-scanner: add support for local DB of known SWHIDs from Open to Work in Progress.
Nov 16 2020, 10:41 AM · Code scanner

Oct 13 2020

DanSeraf triaged T2692: Move the output related functions to another (sub)module as Normal priority.
Oct 13 2020, 9:57 AM · Code scanner
DanSeraf committed rDTSCNc2768d171a78: model: dropped _iter_nodes_attr function (authored by DanSeraf).
model: dropped _iter_nodes_attr function
Oct 13 2020, 9:36 AM
DanSeraf closed T2690: swh scanner reports double results in ndjson format as Resolved by committing rDTSCNc2768d171a78: model: dropped _iter_nodes_attr function.
Oct 13 2020, 9:36 AM · Code scanner
DanSeraf closed D4241: scanner: removed _iter_nodes_attr function in model (causes results duplication).
Oct 13 2020, 9:35 AM
DanSeraf created D4241: scanner: removed _iter_nodes_attr function in model (causes results duplication).
Oct 13 2020, 8:54 AM

Sep 11 2020

DanSeraf committed rDTSCNf838fed672d8: cli: don't check for glob pattern before translating into regex (authored by DanSeraf).
cli: don't check for glob pattern before translating into regex
Sep 11 2020, 9:20 AM
DanSeraf closed D3924: cli: don't check for glob pattern before translating into regex.
Sep 11 2020, 9:20 AM
DanSeraf updated the diff for D3924: cli: don't check for glob pattern before translating into regex.

tests

Sep 11 2020, 9:16 AM
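
The idea in the D3924 title, translating every exclusion pattern into a regex without first checking whether it looks like a glob, can be sketched with the standard library. This is an illustration only, not the scanner's code: the helper names `patterns_to_regexes` and `is_excluded` are hypothetical.

```python
import fnmatch
import re
from typing import Iterable, List


def patterns_to_regexes(patterns: Iterable[str]) -> List[re.Pattern]:
    """Translate each pattern unconditionally with fnmatch.translate.

    A plain name such as ".git" contains no wildcard, but translating it
    anyway yields a regex that matches exactly that name, so no separate
    "is this a glob?" check is required.
    """
    return [re.compile(fnmatch.translate(p)) for p in patterns]


def is_excluded(name: str, regexes: Iterable[re.Pattern]) -> bool:
    """Return True if the name matches any of the compiled patterns."""
    return any(r.match(name) for r in regexes)
```

For example, compiling `[".git", "*.pyc"]` excludes both `.git` and `foo.pyc`, while `main.py` is kept.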

Sep 10 2020

DanSeraf created D3924: cli: don't check for glob pattern before translating into regex.
Sep 10 2020, 6:05 PM
DanSeraf committed rDTSCN2fb9cb1c59e2: docs: readme and cli description update (authored by DanSeraf).
docs: readme and cli description update
Sep 10 2020, 5:23 PM
DanSeraf closed D3876: readme and cli description update.
Sep 10 2020, 5:23 PM
DanSeraf updated the diff for D3876: readme and cli description update.

rebase

Sep 10 2020, 5:22 PM
DanSeraf updated the diff for D3876: readme and cli description update.

changes

Sep 10 2020, 4:44 PM
DanSeraf added inline comments to D3876: readme and cli description update.
Sep 10 2020, 4:41 PM
DanSeraf accepted D3919: cli: speedup the `swh` cli command startup time.
Sep 10 2020, 4:39 PM

Sep 9 2020

DanSeraf updated the diff for D3876: readme and cli description update.

requested changes

Sep 9 2020, 5:42 PM

Sep 8 2020

DanSeraf committed rDDOCccb3629bc484: index: add swh.scanner (authored by DanSeraf).
index: add swh.scanner
Sep 8 2020, 1:52 PM
DanSeraf closed D3886: index: add swh.scanner.
Sep 8 2020, 1:52 PM
DanSeraf created D3886: index: add swh.scanner.
Sep 8 2020, 1:47 PM

Sep 3 2020

DanSeraf added inline comments to D3876: readme and cli description update.
Sep 3 2020, 5:22 PM
DanSeraf created D3876: readme and cli description update.
Sep 3 2020, 4:01 PM

Jul 3 2020

DanSeraf created P712 (An Untitled Masterwork).
Jul 3 2020, 12:43 PM

Jun 22 2020

DanSeraf committed rDTSCN0f10ec6ae8fe: dashboard: file visualization per directory path (authored by DanSeraf).
dashboard: file visualization per directory path
Jun 22 2020, 7:39 PM
DanSeraf closed T2364: scanner: file browser in the sunburst/dashboard output as Resolved by committing rDTSCN0f10ec6ae8fe: dashboard: file visualization per directory path.
Jun 22 2020, 7:39 PM · Code scanner
DanSeraf closed D3293: scanner: dashboard file visualization per directory path.
Jun 22 2020, 7:39 PM
DanSeraf closed T2336: scanner: add support for an exclusion list as Resolved.
Jun 22 2020, 2:57 PM · Code scanner
DanSeraf updated the diff for D3293: scanner: dashboard file visualization per directory path.
  • non-minified css in assets
Jun 22 2020, 2:50 PM

Jun 20 2020

DanSeraf updated the diff for D3293: scanner: dashboard file visualization per directory path.
  • workaround to check the table body
  • bootstrap css update
Jun 20 2020, 4:37 PM
DanSeraf added a comment to D3293: scanner: dashboard file visualization per directory path.

Unfortunately the expected values can't be tested.

Why not?

Because dash_html_components checks for object identity only. I wrote a comment inside the test.

Jun 20 2020, 4:37 PM

Jun 19 2020

DanSeraf added a comment to D3293: scanner: dashboard file visualization per directory path.
  • It's missing tests. I understand it's not easy to do for a GUI, but could you see what you can do about it?
Jun 19 2020, 4:54 PM
DanSeraf updated the diff for D3293: scanner: dashboard file visualization per directory path.
  • __init__.py in dashboard directory
Jun 19 2020, 4:51 PM
DanSeraf updated the diff for D3293: scanner: dashboard file visualization per directory path.
  • css as static asset
Jun 19 2020, 4:41 PM

Jun 16 2020

DanSeraf updated the summary of D3293: scanner: dashboard file visualization per directory path.
Jun 16 2020, 6:45 PM
DanSeraf created D3293: scanner: dashboard file visualization per directory path.
Jun 16 2020, 6:31 PM