Jan 9 2021

zack added a comment to T2946: pre-commit is required by bin/update.

Thanks for the bug report. Do you have a log of what happens upon bin/update if pre-commit is missing? (It sure is needed for committing, but I'm surprised it's needed for bin/update.)
Also, is it needed for a raw "bin/update" or only for "bin/update -i" (which also installs Python dependencies)?

Jan 9 2021, 5:30 PM · Easy hack, Documentation
zack added a comment to D4821: Add LLP compression to the WebGraph pipeline.

oh, also, this diff should close T2647 in its commit message

Jan 9 2021, 4:06 PM
zack added a revision to T2647: add LLP support to graph compression pipeline: D4821: Add LLP compression to the WebGraph pipeline.
Jan 9 2021, 4:06 PM · Compressed graph service
zack added a task to D4821: Add LLP compression to the WebGraph pipeline: T2647: add LLP support to graph compression pipeline.
Jan 9 2021, 4:06 PM

Jan 8 2021

zack requested changes to D4821: Add LLP compression to the WebGraph pipeline.
Jan 8 2021, 4:09 PM

Jan 7 2021

zack added a revision to T2595: Add a default configuration based on graph size (eg: batch_size): D4820: config: sane default for batch_size using a heuristic on ram size.
Jan 7 2021, 10:34 PM · Compressed graph service
zack added a task to D4820: config: sane default for batch_size using a heuristic on ram size: T2595: Add a default configuration based on graph size (eg: batch_size).
Jan 7 2021, 10:34 PM
zack accepted D4820: config: sane default for batch_size using a heuristic on ram size.

note that this diff should close T2595, so the commit message should be updated

Jan 7 2021, 10:33 PM
zack added a comment to T2942: Update deposit: `swhid` MUST exist in archive for a metadata-only deposit.

My answer is a very emphatic YES.

Jan 7 2021, 3:10 PM · SWORD deposit

Jan 6 2021

zack renamed T2933: shared pg_service configuration on all hosts from shared pg_service configuration of all hosts to shared pg_service configuration on all hosts.
Jan 6 2021, 2:11 PM · System administration
zack updated the task description for T2933: shared pg_service configuration on all hosts.
Jan 6 2021, 1:39 PM · System administration
zack triaged T2933: shared pg_service configuration on all hosts as Normal priority.
Jan 6 2021, 1:39 PM · System administration
zack changed the visibility for F4241580: swh-download-screenshot.png.
Jan 6 2021, 11:15 AM
zack triaged T2930: vault: use SWHIDs as identifiers shown to user in the download window as Low priority.
Jan 6 2021, 11:14 AM · Easy hack, Web app

Jan 5 2021

zack added a comment to T2912: Next generation archive counters.
In T2912#55849, @olasd wrote:

I think we should be able to decouple these counters completely from the loaders, and have them directly updated/handled by a client of the swh-journal. This would be a "centralized" component, but which we can parallelize quite heavily thanks to basic kafka design. We can also leverage the way kafka clients do parallelism to sidestep the locking issues arising in a potentially distributed filter.

Maybe my writing was not all that clear: I also had in mind a single centralised component (the ArchiveCounter) per Bloom filter, receiving the lists of new content ids from the loaders.
Getting the feed of ids from swh-journal instead of from the loaders is really neat: we avoid touching the loader code, and we gain a better capability of monitoring the load on the ArchiveCounter, so I'm all for it :-)
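To make the idea concrete, here is a minimal sketch of one centralised counter backed by a Bloom filter. It is only an illustration under stated assumptions: the class layout, the method name process(), the filter size and the hashing scheme are all hypothetical, and the ids are passed in as plain lists rather than read from swh-journal.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size Bloom filter. Sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class ArchiveCounter:
    """Hypothetical counter: one instance per object type and Bloom filter."""

    def __init__(self) -> None:
        self.seen = BloomFilter()
        self.count = 0

    def process(self, ids: Iterable[bytes]) -> None:
        # Count an id only the first time it is (probably) seen.
        # False positives make the count a slight underestimate.
        for obj_id in ids:
            if obj_id not in self.seen:
                self.seen.add(obj_id)
                self.count += 1
```

Because a single component owns the filter, no locking is needed in this sketch; feeding it from journal messages instead of lists would only change where the ids come from.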

Jan 5 2021, 6:01 PM · Roadmap 2021, System administration, Monitoring, Web app

Jan 4 2021

zack accepted D4801: FUSE: cache: share sqlite connection between metadata/history cache.
Jan 4 2021, 6:55 PM
zack requested changes to D4801: FUSE: cache: share sqlite connection between metadata/history cache.

I don't like this "asymmetric" approach much. I'd rather have both caches be equal and take an optional sqlite connection as class constructor input. If it's None, they'll each create a connection (and not share it). If a connection is passed, they use it and do not close it on exit. Then, the init code in FuseCache takes care of initializing the shared sqlite connection and passing it to both constructors. This will make the classes more reusable and the init code cleaner.
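A rough sketch of that symmetric design, to show what is meant. The names MetadataCache, HistoryCache and the _SqliteCache base class are assumptions made for illustration and are not taken from the diff; only FuseCache is named in the comment.

```python
import sqlite3
from typing import Optional


class _SqliteCache:
    """Hypothetical base class: owns its connection only if none was passed."""

    def __init__(self, path: str, conn: Optional[sqlite3.Connection] = None) -> None:
        self._owns_conn = conn is None
        self.conn = conn if conn is not None else sqlite3.connect(path)

    def close(self) -> None:
        # A connection received from the caller is the caller's to close.
        if self._owns_conn:
            self.conn.close()


class MetadataCache(_SqliteCache):
    pass


class HistoryCache(_SqliteCache):
    pass


class FuseCache:
    def __init__(self, path: str) -> None:
        # The init code creates the shared connection and hands it to both caches.
        self.conn = sqlite3.connect(path)
        self.metadata = MetadataCache(path, self.conn)
        self.history = HistoryCache(path, self.conn)

    def close(self) -> None:
        self.metadata.close()
        self.history.close()
        self.conn.close()
```

Used standalone, MetadataCache(":memory:") opens and closes its own connection; inside FuseCache both caches reuse the one connection, which is closed once by FuseCache.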

Jan 4 2021, 10:49 AM
zack accepted D4797: FUSE: proof-read documentation.
Jan 4 2021, 9:42 AM
zack triaged T2918: missing "Authentication" link in navigation header as Low priority.
Jan 4 2021, 9:26 AM · Easy hack, Web app

Dec 30 2020

zack closed T2788: deduplicate validation logic between parse_swhid() and SWHID class constructor as Resolved by committing rDMOD574685052348: SWHID parsing: simplify and deduplicate validation logic.
Dec 30 2020, 3:31 PM · Data Model
zack closed D4771: SWHID parsing: simplify and deduplicate validation logic.
Dec 30 2020, 3:31 PM
zack committed rDMOD574685052348: SWHID parsing: simplify and deduplicate validation logic (authored by zack).
SWHID parsing: simplify and deduplicate validation logic
Dec 30 2020, 3:31 PM
zack updated the diff for D4771: SWHID parsing: simplify and deduplicate validation logic.

address review comments

Dec 30 2020, 1:23 PM
zack requested changes to D4797: FUSE: proof-read documentation.
Dec 30 2020, 8:57 AM

Dec 29 2020

zack accepted D4796: Rephrase Python version requirement and make it stand out more.
Dec 29 2020, 9:10 AM

Dec 28 2020

zack accepted D4795: FUSE: cache: add support to remove individual objects.
Dec 28 2020, 7:49 PM
zack committed rDFUSEae3a0e9760d3: requirements: typeset dependencies with spaces around ">=" (authored by zack).
requirements: typeset dependencies with spaces around ">="
Dec 28 2020, 6:17 PM
zack added a comment to T2917: document minimum required Python version.
In T2917#55617, @zvr wrote:

Please, not only document but add to the code something like
python_requires='>=3.7'
to setup.py so that the Pypi packages are not installable in older versions.

Dec 28 2020, 6:14 PM · Easy hack, Documentation
zack requested changes to D4795: FUSE: cache: add support to remove individual objects.
Dec 28 2020, 4:09 PM
zack added a project to T2917: document minimum required Python version: Easy hack.
Dec 28 2020, 3:55 PM · Easy hack, Documentation
zack triaged T2917: document minimum required Python version as High priority.
Dec 28 2020, 3:55 PM · Easy hack, Documentation
zack renamed T2917: document minimum required Python version from Is Python 3.6 the minimum version supported? to document minimum required Python version.
Dec 28 2020, 3:54 PM · Easy hack, Documentation
zack added a project to T2917: document minimum required Python version: Documentation.

Yeah, we really depend on recent-ish Python (and not only for f-strings, a lot of typing-related stuff also comes to mind).
3.5 is no longer supported (and I have my doubts about 3.6 too; we generally follow Debian stable, which is on Python 3.7 now).

Dec 28 2020, 3:54 PM · Easy hack, Documentation
zack accepted D4793: FUSE: docs: tutorial: add origin and search examples.
Dec 28 2020, 12:41 PM
zack added inline comments to D4793: FUSE: docs: tutorial: add origin and search examples.
Dec 28 2020, 11:57 AM
zack requested changes to D4793: FUSE: docs: tutorial: add origin and search examples.
Dec 28 2020, 11:53 AM
zack accepted D4792: FUSE: fs: add top-level README file to explain briefly SwhFS.

conditionally accepted with very minor changes to be implemented before landing

Dec 28 2020, 10:21 AM
zack placed T2825: add origin (and search) example to the FUSE tutorial up for grabs.
Dec 28 2020, 9:11 AM · Documentation, Software Heritage filesystem

Dec 23 2020

zack edited P911 Command-Line Input.
Dec 23 2020, 11:18 PM
zack created P911 Command-Line Input.
Dec 23 2020, 11:17 PM
zack created P910 Command-Line Input.
Dec 23 2020, 6:49 PM
zack triaged T2914: mirror documentation: add ballpark storage/infra requirements as Normal priority.
Dec 23 2020, 1:55 PM · Mirror, Documentation
zack committed rDDOCc4c2d2b04acd: mirror doc: full reread with improvements here and there (authored by zack).
mirror doc: full reread with improvements here and there
Dec 23 2020, 1:45 PM
zack committed rDDOC05a7ad9b1be7: doc index: improve short descriptions of entry points (authored by zack).
doc index: improve short descriptions of entry points
Dec 23 2020, 1:45 PM
zack committed rDDOCcd397cee65cb: Add a quick presentation/architecture doc on the mirror (authored by douardda).
Add a quick presentation/architecture doc on the mirror
Dec 23 2020, 1:45 PM
zack closed D4769: Add a quick presentation/architecture doc on the mirror.
Dec 23 2020, 1:45 PM
zack updated the diff for D4769: Add a quick presentation/architecture doc on the mirror.
  • doc index: improve short descriptions of entry points
  • mirror doc: full reread with improvements here and there
Dec 23 2020, 1:45 PM
zack updated the diff for D4769: Add a quick presentation/architecture doc on the mirror.
  • mirror doc: full reread with improvements here and there
Dec 23 2020, 1:44 PM
zack added inline comments to D4769: Add a quick presentation/architecture doc on the mirror.
Dec 23 2020, 11:59 AM
zack updated the diff for D4769: Add a quick presentation/architecture doc on the mirror.
  • mirror doc: merge all topics in a single list
Dec 23 2020, 11:59 AM
zack added inline comments to D4769: Add a quick presentation/architecture doc on the mirror.
Dec 23 2020, 11:20 AM
zack removed reviewers for D4769: Add a quick presentation/architecture doc on the mirror: vlorentz, douardda.
Dec 23 2020, 11:19 AM
zack updated the diff for D4769: Add a quick presentation/architecture doc on the mirror.
  • doc index: improve short descriptions of entry points
  • mirror doc: full reread with improvements here and there
Dec 23 2020, 11:19 AM
zack commandeered D4769: Add a quick presentation/architecture doc on the mirror.
Dec 23 2020, 11:18 AM

Dec 22 2020

zack accepted D4765: Make tests pass as soon as the setup.py is initialized.
Dec 22 2020, 6:04 PM
zack accepted D4780: Add more comments to elasticsearch.py.
Dec 22 2020, 5:16 PM
zack retitled D4780: Add more comments to elasticsearch.py from Add more comments in elasticsearch.py to Add more comments to elasticsearch.py.
Dec 22 2020, 5:15 PM
zack accepted D4779: Reference swh-search documentation.
Dec 22 2020, 5:15 PM
zack accepted D4778: Write some basic documentation to describe what swh-search is.
Dec 22 2020, 5:13 PM
zack accepted D4777: Fix syntax highlighting of docs/journal.rst.
Dec 22 2020, 5:11 PM
zack created P909 Command-Line Input.
Dec 22 2020, 3:46 PM

Dec 19 2020

zack placed T2300: swh-scanner: print a nicer error message when rate limit is hit up for grabs.
Dec 19 2020, 9:48 PM · Easy hack, Code scanner
zack committed rDTSCN33a9cd4eb965: DB import: skip invalid SWHIDs during import (authored by zack).
DB import: skip invalid SWHIDs during import
Dec 19 2020, 9:47 PM
zack closed T2813: swh scanner db import does not validate SWHIDs as Resolved by committing rDTSCN33a9cd4eb965: DB import: skip invalid SWHIDs during import.
Dec 19 2020, 9:47 PM · Code scanner
zack committed rDTSCNfe84403087cc: DB import: massive speed up, via sqlite tuning and better mem handling (authored by zack).
DB import: massive speed up, via sqlite tuning and better mem handling
Dec 19 2020, 9:47 PM
zack closed T2812: scanner import db is slow, improve its performances as Resolved by committing rDTSCNfe84403087cc: DB import: massive speed up, via sqlite tuning and better mem handling.
Dec 19 2020, 9:47 PM · Code scanner
zack closed T2836: swh scanner db import loads keeps all input SWHIDs in memory as Resolved by committing rDTSCNfe84403087cc: DB import: massive speed up, via sqlite tuning and better mem handling.
Dec 19 2020, 9:47 PM · Easy hack, Code scanner
zack committed rDTSCNd929992035da: CLI: improve exit handling upon failure (authored by zack).
CLI: improve exit handling upon failure
Dec 19 2020, 9:47 PM
zack added a revision to T2788: deduplicate validation logic between parse_swhid() and SWHID class constructor: D4771: SWHID parsing: simplify and deduplicate validation logic.
Dec 19 2020, 1:51 PM · Data Model
zack created D4771: SWHID parsing: simplify and deduplicate validation logic.
Dec 19 2020, 1:51 PM

Dec 18 2020

zack accepted D4763: FUSE: fs: history/by-date: update .status count on read.
Dec 18 2020, 5:10 PM
zack requested changes to D4763: FUSE: fs: history/by-date: update .status count on read.
Dec 18 2020, 10:04 AM
zack added a comment to D4763: FUSE: fs: history/by-date: update .status count on read.

LGTM in general, I've just nitpicked about a couple of things.

Dec 18 2020, 10:04 AM
zack accepted D4721: WIP: scanner benchmark.
Dec 18 2020, 8:38 AM
zack accepted D4762: Add ORC exporter.
Dec 18 2020, 8:34 AM

Dec 17 2020

zack added a project to T2900: Public graph/ API does not handle streaming results from endpoints: System administration.
Dec 17 2020, 4:15 PM · System administration, Compressed graph service, Web app
zack added a project to T2900: Public graph/ API does not handle streaming results from endpoints: Compressed graph service.
Dec 17 2020, 4:15 PM · System administration, Compressed graph service, Web app
zack accepted D4761: docs: index: update shell examples with recent CLI changes.
Dec 17 2020, 3:16 PM

Dec 16 2020

zack accepted D4744: FUSE: cache: update cache with new origin visits.
Dec 16 2020, 4:19 PM
zack requested changes to D4744: FUSE: cache: update cache with new origin visits.
Dec 16 2020, 12:12 PM
zack accepted D4752: FUSE: cli: raise error when missing fusermount3 dependency.

conditionally accepted, please generalize the check before landing (see comment)

Dec 16 2020, 11:56 AM

Dec 15 2020

zack updated the task description for T2812: scanner import db is slow, improve its performances.
Dec 15 2020, 5:57 PM · Code scanner
zack updated the task description for T2812: scanner import db is slow, improve its performances.
Dec 15 2020, 5:50 PM · Code scanner
zack renamed T2812: scanner import db is slow, improve its performances from scanner: improve SWHID (txt) -> sqlite import time to scanner import db is slow, improve its performances.
Dec 15 2020, 5:48 PM · Code scanner
zack renamed T2724: fuse: populate history/ incrementally to avoid temporary hangs on huge repos from history/ dir browsing is too slow on big repos like the Linux kernel to populate history/ incrementally to avoid temporary hangs on huge repos.
Dec 15 2020, 11:58 AM · Software Heritage filesystem
zack accepted D4740: cli: add --config-file option to 'swh web' group.

LGTM (but please fix the minor docstring point I've mentioned before landing)

Dec 15 2020, 11:57 AM
zack claimed T2825: add origin (and search) example to the FUSE tutorial.
Dec 15 2020, 11:26 AM · Documentation, Software Heritage filesystem

Dec 14 2020

zack accepted D4734: fs: add cache/ directory under mount point.
Dec 14 2020, 6:01 PM
zack retitled D4735: Move parse_swh_reference to swh.deposit.utils namespace from Move parse_swh_reference in swh.deposit.utils namespace to Move parse_swh_reference to swh.deposit.utils namespace.
Dec 14 2020, 5:46 PM
zack requested changes to D4734: fs: add cache/ directory under mount point.
Dec 14 2020, 5:05 PM
zack retitled D4734: fs: add cache/ directory under mount point from fs: add cache/ directory under mounting point to fs: add cache/ directory under mount point.
Dec 14 2020, 3:46 PM
zack requested changes to D4721: WIP: scanner benchmark.
Dec 14 2020, 3:03 PM
zack renamed T2825: add origin (and search) example to the FUSE tutorial from add origin example to the FUSE tutorial to add origin (and search) example to the FUSE tutorial.
Dec 14 2020, 1:23 PM · Documentation, Software Heritage filesystem
zack removed a reviewer for D4696: Add "swh web search" command to perform archive searches via the CLI: anlambert.
Dec 14 2020, 1:20 PM
zack accepted D4696: Add "swh web search" command to perform archive searches via the CLI.
Dec 14 2020, 1:16 PM

Dec 12 2020

zack renamed T2883: Archive mirror in Spain from Spain to Archive mirror in Spain.
Dec 12 2020, 9:15 AM
zack renamed T2881: Archive mirror at FossID from FossID to Archive mirror at FossID.
Dec 12 2020, 9:15 AM
zack renamed T2882: Archive mirror at ENEA from ENEA to Archive mirror at ENEA.
Dec 12 2020, 9:15 AM

Dec 11 2020

zack renamed T2793: add notable past events to the archive changelog from add to the journal of archive changes notable past events to add notable past events to the archive changelog.
Dec 11 2020, 4:24 PM · Archive coverage, Documentation
zack committed rDDOCde792f8fd73a: archive changelog: remove WIP mark (authored by zack).
archive changelog: remove WIP mark
Dec 11 2020, 4:23 PM