
zack (Stefano Zacchiroli)

User Details

User Since
Sep 7 2015, 3:43 PM (286 w, 2 d)
Roles
Administrator

Recent Activity

Yesterday

zack added a project to T1968: existing graph endpoints should not return 404 upon missing arguments: Easy hack.
Wed, Mar 3, 9:51 AM · Easy hack, Graph service
zack updated the task description for T2981: Graph API: add a (node type) result filters.
Wed, Mar 3, 9:50 AM · Graph service
zack added a project to T2730: scanner: should output the root SWHID as well: Easy hack.
Wed, Mar 3, 9:49 AM · Easy hack, Code scanner
zack added a project to T2731: scanner: strip the path passed as argument from output: Easy hack.
Wed, Mar 3, 9:49 AM · Easy hack, Code scanner

Mon, Mar 1

zack committed rDFUSE9bac54374843: migrate from SWHID to new CoreSWHID class (authored by zack).
Mon, Mar 1, 5:41 PM
zack added a member for Interns: Hakimb.
Mon, Mar 1, 11:31 AM
zack updated the task description for T3074: Migrate all packages away from the old SWHID class.
Mon, Mar 1, 9:59 AM · Data Model

Sat, Feb 27

zack committed rDDOC84815864acb9: developer setup: uniform rendering of pip install (minor) (authored by zack).
Sat, Feb 27, 9:08 AM

Fri, Feb 26

zack updated the task description for T3074: Migrate all packages away from the old SWHID class.
Fri, Feb 26, 7:08 PM · Data Model
zack closed D5161: typing: make recent mypy (0.812) pass again.
Fri, Feb 26, 6:40 PM
zack committed rDFUSE57d375a3ffee: typing: make recent mypy (0.812) pass again (authored by zack).
Fri, Feb 26, 6:40 PM
zack added inline comments to D5161: typing: make recent mypy (0.812) pass again.
Fri, Feb 26, 6:15 PM
zack added a reviewer for D5161: typing: make recent mypy (0.812) pass again: seirl.
Fri, Feb 26, 5:21 PM
zack requested review of D5161: typing: make recent mypy (0.812) pass again.
Fri, Feb 26, 5:18 PM
zack closed D5155: developer setup: simplify package installation by delegating to bin/install.
Fri, Feb 26, 5:18 PM
zack committed rDDOC42e770e62ee0: developer setup: simplify package installation by delegating to bin/install (authored by zack).
Fri, Feb 26, 5:18 PM
zack committed rDENV4f3600624565: bin/pip-swh-packages: remove [extras] markers before topo-sorting (authored by zack).
Fri, Feb 26, 5:17 PM
zack closed D5154: bin/install: install packages one by one and enforce prerequisites.
Fri, Feb 26, 5:17 PM
zack committed rDENVe6fbb0f3bccf: bin/install: install packages one by one and enforce prerequisites (authored by zack).
Fri, Feb 26, 5:17 PM
zack updated the task description for T3054: Scale out object storage design.
Fri, Feb 26, 3:29 PM · Object storage
zack updated the task description for T3054: Scale out object storage design.
Fri, Feb 26, 3:24 PM · Object storage
zack updated the task description for T3054: Scale out object storage design.
Fri, Feb 26, 3:19 PM · Object storage
zack added inline comments to D5154: bin/install: install packages one by one and enforce prerequisites.
Fri, Feb 26, 2:49 PM
zack committed rDFUSEfa32c040e229: docs: add link to screencast and ICSE 2021 paper (authored by zack).
Fri, Feb 26, 2:30 PM
zack requested review of D5155: developer setup: simplify package installation by delegating to bin/install.
Fri, Feb 26, 2:27 PM
zack requested review of D5154: bin/install: install packages one by one and enforce prerequisites.
Fri, Feb 26, 2:26 PM
zack committed rDGRPH21de1e134b4b: docs: link to official SANER 2020 paper in the proceedings (authored by zack).
Fri, Feb 26, 1:40 PM

Wed, Feb 24

zack added a comment to T3059: CODEMETA.json files do not get indexed.

@KShivendu it sounds like you haven't installed the various swh-* modules with pip install -e. Did you follow the developer setup guide at https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup? The steps in there should take care of that.

Wed, Feb 24, 8:16 AM · Easy hack, Indexer

Fri, Feb 19

zack added inline comments to D5118: Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID.
Fri, Feb 19, 4:56 PM

Thu, Feb 18

zack added a comment to D5107: Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID.

@zack added to D5081: https://forge.softwareheritage.org/D5081?vs=18237&id=18239#toc

if that's fine with you, I'll do the same here

Thu, Feb 18, 2:06 PM
zack added a comment to D5107: Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID.

I've only added a nitpicky comment about docstrings.

Thu, Feb 18, 1:24 PM

Tue, Feb 16

zack updated subscribers of T3054: Scale out object storage design.

Thanks for this summary/status, very useful. Regarding goals, I think we also want a read goal about time to first byte, a performance metric that is particularly bad in the current filesystem-based object storage. Not sure what a reasonable goal would be, though. Poke @olasd: any idea about a good target for this?

Tue, Feb 16, 7:52 PM · Object storage
zack resigned from D5081: Add new class QualifiedSWHID to replace SWHID, and deprecate the latter..

I'm not approving (and only resigning as reviewer instead) just to allow others to have a look, given it's something quite foundational in the data model.

Tue, Feb 16, 3:07 PM
zack requested changes to D5081: Add new class QualifiedSWHID to replace SWHID, and deprecate the latter..
Tue, Feb 16, 2:26 PM

Mon, Feb 15

zack updated the task description for T3049: Benchmarking an RBD based object container.
Mon, Feb 15, 7:54 PM · Object storage
zack added a comment to T3048: Using a custom Sorted String Table format.

followed sequence of:

Size of SHA256, SWHID, Content
SHA256
SWHID
Content

Mon, Feb 15, 7:53 PM · Object storage
zack added a comment to T3049: Benchmarking an RBD based object container.

The object storage is a collection of RBD images containing a sequence of objects (SHA256 + SWHID + content).

Mon, Feb 15, 7:50 PM · Object storage
zack added a comment to T3034: generalize usage of SWHID for referencing SWH archive objects.

(I've finally caught up with the backlog in this task, sorry I'm late to the party.)

Mon, Feb 15, 11:22 AM · Data Model
zack updated the task description for T3034: generalize usage of SWHID for referencing SWH archive objects.
Mon, Feb 15, 11:12 AM · Data Model
zack updated the task description for T3034: generalize usage of SWHID for referencing SWH archive objects.
Mon, Feb 15, 11:10 AM · Data Model

Sat, Feb 13

zack added a comment to T3035: Triage documentation in wiki.

(thanks for this!)

Sat, Feb 13, 9:51 AM · Documentation
zack updated the task description for T3035: Triage documentation in wiki.
Sat, Feb 13, 9:50 AM · Documentation

Tue, Feb 9

zack renamed T3034: generalize usage of SWHID for referencing SWH archive objects from Generalise usage of SWHID for storing edges (relations) of the SWH archive graph to generalize usage of SWHID for referencing SWH archive objects.
Tue, Feb 9, 4:34 PM · Data Model

Mon, Feb 8

zack updated the title for P942 jitsiroom - get a fresh URL for a Jitsi room from Command-Line Input to jitsiroom - get a fresh URL for a Jitsi room.
Mon, Feb 8, 10:31 AM
zack created P942 jitsiroom - get a fresh URL for a Jitsi room.
Mon, Feb 8, 10:31 AM

Fri, Feb 5

zack added a member for Developers: DanSeraf.
Fri, Feb 5, 5:29 PM
zack removed a member for Interns: haltode.
Fri, Feb 5, 4:31 PM
zack removed a member for Interns: kalpitk.
Fri, Feb 5, 4:31 PM
zack removed a member for Interns: nahimilega.
Fri, Feb 5, 4:31 PM
zack removed a member for Interns: DanSeraf.
Fri, Feb 5, 4:31 PM
zack added a project to T3030: Improve loaders to deal with new visit status events: Core Loader.
Fri, Feb 5, 3:40 PM · Core Loader
zack accepted D5032: scanner-benchmark: improve logging information.
Fri, Feb 5, 3:23 PM

Thu, Feb 4

zack accepted D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.
Thu, Feb 4, 5:45 PM
zack requested changes to D4985: [WIP] Add support for SWHID on Origin.

I'm requesting changes as the idea of hexlifying URLs behind the "swh:1:ori:" prefix is still being discussed (on the swh-devel list) and, for now, I'm against that specific part of the proposal.

Thu, Feb 4, 4:02 PM
zack added inline comments to D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.
Thu, Feb 4, 2:58 PM
zack requested changes to D5011: scanner-benchmark: use os.listdir() instead of os.walk() to avoid symlinks.
Thu, Feb 4, 2:56 PM
zack created P939 Command-Line Input.
Thu, Feb 4, 12:00 PM

Wed, Feb 3

zack accepted D5003: celery: acknowledge tasks as soon as they're received.

Awesome, thanks!

Wed, Feb 3, 10:17 PM
zack added a comment to T2912: Next generation archive counters.

Thanks @vsellier, that seems quite ok indeed. The only question left is to know if the estimator implemented is monotonic (i.e. we will never have negative bumps in the graph :-))

Wed, Feb 3, 4:08 PM · System administration, Monitoring, Web app
zack added a comment to T3008: Document hedgedoc in the wiki.
In T3008#58039, @olasd wrote:

I've written a basic documentation page : https://wiki.softwareheritage.org/wiki/HedgeDoc

@zack, do you see anything else you would like to have documented?

Wed, Feb 3, 3:26 PM · Documentation, System administration
zack triaged T3014: Using an RBD image to store artifacts as Normal priority.
Wed, Feb 3, 3:23 PM · Object storage

Tue, Feb 2

zack accepted D4996: algo_min: delete the upstream directories if a (sub)directory is unknown.
Tue, Feb 2, 3:41 PM
zack accepted D4993: scanner-benchmark: some code improvement.
Tue, Feb 2, 2:18 PM

Feb 1 2021

zack added projects to T3008: Document hedgedoc in the wiki: System administration, Documentation.
Feb 1 2021, 11:22 AM · Documentation, System administration

Jan 30 2021

zack committed rDSNIPf75b2e7c1b4d: sql/blob-size-stats: scripts to compute basic stats about archive blob sizes (authored by zack).
Jan 30 2021, 7:42 PM

Jan 28 2021

zack accepted D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error.
Jan 28 2021, 10:58 AM

Jan 26 2021

zack added a comment to D4947: scanner-benchmark: algo_min fixed, retry mechanism on request error.

for context: this change is only going to be committed in a separate branch that we're using for experiments with swh-scanner, hence the testing requirements are different/lower than for master

Jan 26 2021, 3:00 PM

Jan 21 2021

zack triaged T2983: graph service: allow loading in memory only one direction of the graph as Low priority.
Jan 21 2021, 10:09 AM · Graph service

Jan 20 2021

zack triaged T2981: Graph API: add a (node type) result filters as Normal priority.
Jan 20 2021, 3:25 PM · Graph service

Jan 19 2021

zack created P926 (An Untitled Masterwork).
Jan 19 2021, 6:37 PM
zack created P923 Command-Line Input.
Jan 19 2021, 9:40 AM

Jan 18 2021

zack accepted D4875: scanner-benchmark: the temporary directory is removed by tempdir.

ah, yes, of course, sorry I didn't notice this either in the previous diff :-)

Jan 18 2021, 10:39 AM

Jan 15 2021

zack accepted D4870: test: Fix parse_swhid failing test.
Jan 15 2021, 3:49 PM
zack requested changes to D4870: test: Fix parse_swhid failing test.

Why do we still have tests that check error messages meant for human consumption, again? :-)
If the difference in how they fail is meaningful (which I don't know), they should raise different exceptions and we should just check their types.
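A minimal sketch of what checking exception types rather than messages could look like. Everything here is hypothetical and for illustration only: `parse_swhid_sketch`, the exception classes, and the simplified validation rules are assumptions, not the actual swh.model API.

```python
class SWHIDParseError(ValueError):
    """Hypothetical base class for all parsing failures."""


class MalformedSWHID(SWHIDParseError):
    """The string does not have the swh:<version>:<type>:<hash> shape."""


class UnknownObjectType(SWHIDParseError):
    """The shape is right but the object type is not recognized."""


# Assumed set of core object types, used only for this sketch.
KNOWN_TYPES = {"cnt", "dir", "rev", "rel", "snp"}


def parse_swhid_sketch(text: str) -> tuple:
    """Simplified parser: each failure mode raises its own exception type."""
    parts = text.split(":")
    if len(parts) != 4 or parts[0] != "swh":
        raise MalformedSWHID(f"expected swh:<version>:<type>:<hash>, got {text!r}")
    if parts[2] not in KNOWN_TYPES:
        raise UnknownObjectType(f"unknown object type {parts[2]!r}")
    return parts[1], parts[2], parts[3]


def failure_type(text: str):
    """Return the exception class raised for `text`, or None on success."""
    try:
        parse_swhid_sketch(text)
    except SWHIDParseError as exc:
        return type(exc)
    return None
```

A test can then assert `failure_type("swh:1:cnt") is MalformedSWHID` and stays valid when the human-readable message is reworded.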

Jan 15 2021, 2:03 PM

Jan 14 2021

zack committed rMSLD2025fef0931a: check-in slides for Boa workshop (authored by zack).
Jan 14 2021, 2:52 PM

Jan 13 2021

zack accepted D4851: scanner benchmark: output format and repository extraction in temporary directories.

Looks great, thanks!

Jan 13 2021, 2:23 PM

Jan 11 2021

zack committed rDDOCefd06ed05da1: CONTRIBUTORS: add Loïc Dachary (authored by zack).
Jan 11 2021, 10:35 AM
zack closed T2946: pre-commit is required by bin/update as Resolved by committing rDDOC23eb98acef51: bin/update fails unless pre-commit is installed.
Jan 11 2021, 10:34 AM · Easy hack, Documentation
zack closed D4832: bin/update fails unless pre-commit is installed.
Jan 11 2021, 10:34 AM
zack committed rDDOC23eb98acef51: bin/update fails unless pre-commit is installed (authored by dachary).
Jan 11 2021, 10:33 AM

Jan 9 2021

zack changed the visibility for T2947: Exception: An error occurred when running svnrdump and no exploitable dump file has been generated..
Jan 9 2021, 7:46 PM · SVN Loader
zack triaged T2947: Exception: An error occurred when running svnrdump and no exploitable dump file has been generated. as High priority.
Jan 9 2021, 7:46 PM · SVN Loader
zack added a comment to T2946: pre-commit is required by bin/update.

Thanks for the bug report. Do you have a log of what happens upon bin/update if pre-commit is missing? (It sure is needed for committing, but I'm surprised it's needed for bin/update.)
Also, is it needed for a raw "bin/update" or only for "bin/update -i" (which also installs Python dependencies)?

Jan 9 2021, 5:30 PM · Easy hack, Documentation
zack added a comment to D4821: Add LLP compression to the WebGraph pipeline.

oh, also, this diff should close T2647 in its commit message

Jan 9 2021, 4:06 PM
zack added a revision to T2647: add LLP support to graph compression pipeline: D4821: Add LLP compression to the WebGraph pipeline.
Jan 9 2021, 4:06 PM · Graph service
zack added a task to D4821: Add LLP compression to the WebGraph pipeline: T2647: add LLP support to graph compression pipeline.
Jan 9 2021, 4:06 PM

Jan 8 2021

zack requested changes to D4821: Add LLP compression to the WebGraph pipeline.
Jan 8 2021, 4:09 PM

Jan 7 2021

zack added a revision to T2595: Add a default configuration based on graph size (eg: batch_size): D4820: config: sane default for batch_size using a heuristic on ram size.
Jan 7 2021, 10:34 PM · Graph service
zack added a task to D4820: config: sane default for batch_size using a heuristic on ram size: T2595: Add a default configuration based on graph size (eg: batch_size).
Jan 7 2021, 10:34 PM
zack accepted D4820: config: sane default for batch_size using a heuristic on ram size.

note that this diff should close T2595, so the commit message should be updated

Jan 7 2021, 10:33 PM
zack added a comment to T2942: Update deposit: `swhid` MUST exist in archive for a metadata-only deposit.

My answer is a very emphatic YES.

Jan 7 2021, 3:10 PM · SWORD deposit

Jan 6 2021

zack renamed T2933: shared pg_service configuration on all hosts from shared pg_service configuration of all hosts to shared pg_service configuration on all hosts.
Jan 6 2021, 2:11 PM · System administration
zack updated the task description for T2933: shared pg_service configuration on all hosts.
Jan 6 2021, 1:39 PM · System administration
zack triaged T2933: shared pg_service configuration on all hosts as Normal priority.
Jan 6 2021, 1:39 PM · System administration
zack changed the visibility for F4241580: swh-download-screenshot.png.
Jan 6 2021, 11:15 AM
zack triaged T2930: vault: use SWHIDs as identifiers shown to user in the download window as Low priority.
Jan 6 2021, 11:14 AM · Web app

Jan 5 2021

zack added a comment to T2912: Next generation archive counters.
In T2912#55849, @olasd wrote:

I think we should be able to decouple these counters completely from the loaders, and have them directly updated/handled by a client of the swh-journal. This would be a "centralized" component, but which we can parallelize quite heavily thanks to basic kafka design. We can also leverage the way kafka clients do parallelism to sidestep the locking issues arising in a potentially distributed filter.

Maybe my writing was not all that clear: I also had in mind a single centralised component (the ArchiveCounter) per Bloom filter, receiving the lists newcontents of ids from the loaders.
Getting the feed of ids from swh-journal instead of from the loaders is really neat: we avoid touching the loader code, and we gain a better capability of monitoring the load on the ArchiveCounter, so I'm all for it :-)

Jan 5 2021, 6:01 PM · System administration, Monitoring, Web app

Jan 4 2021

zack accepted D4801: FUSE: cache: share sqlite connection between metadata/history cache.
Jan 4 2021, 6:55 PM
zack requested changes to D4801: FUSE: cache: share sqlite connection between metadata/history cache.

I don't much like this "asymmetric" approach. I'd rather have both caches be equal, each taking an optional sqlite connection as constructor input. If it's None, each creates its own connection (and does not share it). If a connection is passed, they use it and do not close it on exit. The init code in FuseCache then takes care of initializing the shared sqlite connection and passing it to both constructors. This will make the classes more reusable and the init code cleaner.
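A rough sketch of the symmetric design described above, assuming synchronous `sqlite3` for brevity. The class layout, the `_BaseCache` helper and the constructor signatures are illustrative assumptions, not the actual swh-fuse code.

```python
import sqlite3
from typing import Optional


class _BaseCache:
    """Hypothetical common base: both caches handle connections the same way."""

    def __init__(self, path: str, conn: Optional[sqlite3.Connection] = None):
        # The cache owns the connection only if it had to create one itself.
        self._owns_conn = conn is None
        self.conn = sqlite3.connect(path) if conn is None else conn

    def close(self) -> None:
        # A connection passed in by the caller is left open on exit.
        if self._owns_conn:
            self.conn.close()


class MetadataCache(_BaseCache):
    pass


class HistoryCache(_BaseCache):
    pass


class FuseCache:
    """Creates the shared connection and hands it to both constructors."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        self.metadata = MetadataCache(path, self._conn)
        self.history = HistoryCache(path, self._conn)

    def close(self) -> None:
        self.metadata.close()
        self.history.close()
        # The shared connection is closed by whoever created it.
        self._conn.close()
```

Used standalone, `MetadataCache(":memory:")` opens and later closes its own connection; under `FuseCache`, both caches share one connection whose lifetime is managed in a single place.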

Jan 4 2021, 10:49 AM
zack accepted D4797: FUSE: proof-read documentation.
Jan 4 2021, 9:42 AM