
zack (Stefano Zacchiroli)
User · Administrator

User Details

User Since
Sep 7 2015, 3:43 PM (295 w, 4 d)
Roles
Administrator

Recent Activity

Today

zack added a parent task for T735: SourceForge lister: T3315: archive SourceForge.
Fri, May 7, 5:25 PM · Origin-SourceForge
zack added a subtask for T3315: archive SourceForge: T735: SourceForge lister.
Fri, May 7, 5:25 PM · Archive coverage
zack triaged T3315: archive SourceForge as Normal priority.
Fri, May 7, 5:25 PM · Archive coverage
zack triaged T3313: Web API: per-user accounting as Low priority.
Fri, May 7, 9:48 AM · System administration, Web app
zack triaged T3312: web API rate limit: 10x more quota for authenticated users as High priority.
Fri, May 7, 9:35 AM · Web app

Yesterday

zack added a comment to T3311: Use .gitmodules to discover origins.

I think the only issue with (3) is not being retroactive

Thu, May 6, 6:49 PM · Archive coverage, Git loader
zack added a comment to T3311: Use .gitmodules to discover origins.

This is a good idea, thanks for raising it.

Thu, May 6, 6:06 PM · Archive coverage, Git loader
zack added a comment to D5704: keycloak: Set SSO Session Idle to one week, Session Max to one month.

Why 6 hours and not, say, 1 week or even 1 month?
It is very common these days to remain connected for that long, and the UX in having to relogin often is a lot worse.

Thu, May 6, 3:00 PM

Mon, May 3

zack renamed T3301: graph: add test for the "algo" parameter of walk() from swh-graph: No tests of the "algo" parameter of walk() to graph: add test for the "algo" parameter of walk().
Mon, May 3, 6:55 PM · Easy hack, Graph service
zack added a comment to D5420: cli/identify: Add support for --recursive.

@KShivendu zack's comment explains how the code should work, and gives a pointer to an existing implementation of the technique (swh-scanner); this should be enough to start.

But don't feel obligated to continue working on this; as mentioned the task is harder than we expected.

Mon, May 3, 1:54 PM

Fri, Apr 30

zack accepted D5657: Spool large packfiles to disk instead of consuming tons of memory.

nice hack/trade-off !

Fri, Apr 30, 8:39 PM
zack accepted D5654: docs/persistent-identifiers: Add guidelines for fixing invalid SWHIDs (this time for uppercase).

great wording, thanks !

Fri, Apr 30, 1:29 PM
zack committed rDTSCN3f8784c726a7: run_benchmark.sh: add missing "scan_time" column header (authored by zack).
run_benchmark.sh: add missing "scan_time" column header
Fri, Apr 30, 11:14 AM

Thu, Apr 29

zack accepted D5644: scanner-benchmark: add algorithms timings in results.

the fact that algo_min is treated differently than other cases is horrible :-P, but it's not new in this diff, so ok :)
also, in the requirements you should probably put what's the minimum version of swh.core that you need, but that too is not a big deal

Thu, Apr 29, 3:01 PM
zack added a comment to T3298: Consider making SWHID handling case insensitive.

Ah, this is an interesting practical problem.
I'm not a fan of changing the spec of SWHID version 1 to make them case insensitive, as it seems to be a significant change (in particular for the code that checks for the syntactic correctness of IDs).
But we can totally add a "SHOULD" section to the resolvers part of the spec recommending (but not mandating) that resolvers treat core SWHIDs as case insensitive. (Of course all the contextual parts cannot be considered case insensitive.)

Thu, Apr 29, 12:17 PM · Data Model, Web app
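
The resolver-side recommendation in the T3298 comment above could be sketched roughly as follows. This is only an illustration, not code from any Software Heritage package: the function name `normalize_swhid` and the regular expression are assumptions, and the sketch simply lowercases the core SWHID while leaving the contextual parts (everything after the first `;`) untouched.

```python
import re

# Assumed shape of a version 1 core SWHID: swh:1:<type>:<40 lowercase hex digits>
CORE_RE = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}$")


def normalize_swhid(swhid: str) -> str:
    """Lowercase the core SWHID; keep qualifiers exactly as given.

    Hypothetical helper for a lenient resolver: the spec itself stays
    case sensitive, only the lookup is forgiving.
    """
    # Qualifiers (origin=..., path=..., lines=...) follow the first ';'
    core, sep, qualifiers = swhid.partition(";")
    core = core.lower()
    if not CORE_RE.match(core):
        raise ValueError(f"invalid core SWHID: {core!r}")
    # Contextual parts may contain case-sensitive URLs and paths
    return core + sep + qualifiers
```

A strict syntax checker can keep rejecting uppercase identifiers; only a resolver that chooses to be lenient would call such a helper before lookup.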

Wed, Apr 28

zack accepted D5629: graph: s/REST/RPC/.
Wed, Apr 28, 8:45 AM
zack accepted D5630: vault: s/REST/RPC/.
Wed, Apr 28, 8:43 AM
zack accepted D5631: lister: s/REST( API)?/API/.
Wed, Apr 28, 8:42 AM
zack accepted D5632: web: s/Graph REST API/Graph RPC API/.
Wed, Apr 28, 8:42 AM

Tue, Apr 27

zack added a comment to T1576: document the typical cost(s) of hosting an archive mirror.

docs !

Tue, Apr 27, 4:06 PM · Documentation, Mirror

Mon, Apr 26

zack added a comment to T3087: Implement support for takedown notices (infra, admin tools, workflow).

So what about exports of the archive available on git-annex?

Mon, Apr 26, 8:34 AM · meta-task, Roadmap 2021, Web app

Mon, Apr 19

zack added a comment to T2265: Building the documentation should not show any warning..
Mon, Apr 19, 2:06 PM · Easy hack, Documentation

Sat, Apr 17

zack triaged T3260: publish swh.dataset to pypi as Low priority.
Sat, Apr 17, 12:31 PM · Continuous Integration, Datasets
zack committed rDDATASET34da22d6d14e: doc: fix sphinx line continuation glitches in schema.rst (authored by zack).
doc: fix sphinx line continuation glitches in schema.rst
Sat, Apr 17, 11:23 AM
zack added a comment to T2265: Building the documentation should not show any warning..

I'm confused about the status of this task. I've just rebuilt the docs for docs.s.o and it says "build succeeded, 83 warnings.". So is the fix for this not yet "deployed" somehow? Also, why is the build succeeding even with all those warnings? Because as long as that's the case, we will for sure keep reintroducing warnings as time goes by.

Sat, Apr 17, 11:17 AM · Easy hack, Documentation

Fri, Apr 16

zack added a comment to T3252: Better handling of erroneous origins submitted to save code now.

@rdicosmo great summary, I'm certainly on that page :)

Fri, Apr 16, 3:30 PM · System administrators, Web app
zack added a comment to T3252: Better handling of erroneous origins submitted to save code now.

thanks !

Fri, Apr 16, 1:15 PM · System administrators, Web app
zack added a comment to T3252: Better handling of erroneous origins submitted to save code now.

but adding an email field (auto filled for registered users) to send a notification after the origin was loaded seems a good tradeoff. To implement the email notification, we will have to add a journal client in swh-web processing origin visit messages.

Fri, Apr 16, 11:43 AM · System administrators, Web app

Thu, Apr 15

zack added a comment to T3252: Better handling of erroneous origins submitted to save code now.

Oh, and now that we have user profile pages, we should have a list of "my" save code now requests with their status visible in the user profile, for those who want to check synchronously the status of their requests (and might have disabled email notifications).

Thu, Apr 15, 11:35 PM · System administrators, Web app
zack added a comment to T3252: Better handling of erroneous origins submitted to save code now.

It would be desirable to provide the user with feedback that helps fix the issue.

Thu, Apr 15, 11:33 PM · System administrators, Web app
zack accepted D5540: docs: Update for new schema.

Thanks. Can you please make a release after landing this, so that docs.s.o gets updated?

Thu, Apr 15, 9:42 PM

Wed, Apr 14

zack closed T1968: existing graph endpoints should not return 404 upon missing arguments as Invalid.

Sure! My apologies @Hakimb, but it's thanks to your work that we have realized what the right fate for this task was.

Wed, Apr 14, 5:10 PM · Easy hack, Graph service
zack updated subscribers of T1968: existing graph endpoints should not return 404 upon missing arguments.

@seirl, @vlorentz: I see your point, and I agree. We should never have used /nested/paths for this API.
Maybe we should just reconsider this and, once @Hakimb is ready with a new traversal language proposal, we can map it to a better REST API that uses query parameters, and deal properly with 4xx return codes.

Wed, Apr 14, 4:15 PM · Easy hack, Graph service
zack added a comment to T2981: Graph API: add a (node type) result filters.
In T2981#63164, @Hakimb wrote:

questions:

1/ So for the "filter that applies to visits that return nodes one by one" part, we are talking about: neighbors, walk, visit/nodes only?

Wed, Apr 14, 4:13 PM · Graph service
zack requested changes to D5522: Add athena subcommand to create/query AWS Athena database.

Most of my comments are minor/nice to have, although I'd like to be able to pass queries directly on the CLI.

Wed, Apr 14, 4:09 PM

Mon, Apr 12

zack updated subscribers of T3242: Decommission ClearlyDefined resources.

@vsellier: ack on the outboarding, that is actionable as of now.

Mon, Apr 12, 5:16 PM · System administration
zack added a comment to T3084: Fast track save code now requests.

Thanks for this!

Mon, Apr 12, 5:11 PM · System administration, Web app

Thu, Apr 8

zack added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

ok, so @Hakimb: go for no default value. If the query param is not passed, the visit will not stop before the end. If it's given, it will stop once the limit is reached. Call the query param ?max_edges. You will find that the Java code already keeps track of the number of edges traversed, so you should just need to compare with that.

Thu, Apr 8, 2:44 PM · Graph service
zack added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

To complement what @vlorentz mentioned, we should actually stop the visit after the maximum number of edges has been reached, because it is continuing the visit (no matter how many results are returned after it) that can DoS the swh-graph backend.

Thu, Apr 8, 2:24 PM · Graph service
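
The behaviour agreed on in the two T3161 comments above (no default limit, and stopping the traversal itself rather than merely truncating the output) can be illustrated with a small sketch. The actual graph service is written in Java; the Python function below, its name `visit_nodes`, and the adjacency-list graph are all hypothetical stand-ins used only to show the idea.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory graph: node -> list of successors
Graph = Dict[str, List[str]]


def visit_nodes(graph: Graph, src: str,
                max_edges: Optional[int] = None) -> Iterator[str]:
    """Breadth-first visit that aborts once max_edges edges were followed.

    max_edges=None mirrors "query param not passed": visit to the end.
    """
    seen = {src}
    queue = deque([src])
    edges_traversed = 0
    yield src
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if max_edges is not None and edges_traversed >= max_edges:
                return  # stop the visit itself, not just the output
            edges_traversed += 1
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
                yield succ
```

The limit is checked against the edge counter before each edge is followed, so the backend does no further work once the limit is hit, which is the anti-DoS property the comments ask for.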

Apr 7 2021

zack accepted D5438: bin/install: Add support for running outside a virtualenv.

(good catch also for the missing "$@" in the last invocation)

Apr 7 2021, 1:26 PM
zack added a comment to T3084: Fast track save code now requests.

@ardumont we briefly discussed this a while ago with @olasd. I think the proposed solution was indeed to have a separate queue (and workers) for "save code now" requests, but not necessarily one separate queue per loader, because the current priority system wasn't considered to be "fast enough". Maybe we can discuss this briefly with him and synthesize here what you come up with?

Apr 7 2021, 1:02 PM · System administration, Web app
zack added a comment to D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.

I don't think that's good enough. We should have an overview of swh-graph's design that doesn't require reading all the code in an unspecified order.
And reading the code does not give a rationale for the decision.

Apr 7 2021, 12:52 PM

Apr 6 2021

zack committed rMSLD93db6cff6c7b: swh-scanner talk: add links to code and pypi package (authored by zack).
swh-scanner talk: add links to code and pypi package
Apr 6 2021, 4:37 PM
zack committed rMSLD3d4ddee13cde: minor changes and updates for LLW 2021 talk (authored by zack).
minor changes and updates for LLW 2021 talk
Apr 6 2021, 4:28 PM
zack closed T3212: typo in the identify function in swh-model/swh/model/cli.py as Invalid.

No, swh identify is correct, as all SWH CLI commands register as sub-commands of the main swh executable.

Apr 6 2021, 4:03 PM · Documentation
zack resigned from D5411: return a 400 error when accessing endpoints without the arguments.
Apr 6 2021, 12:33 PM
zack added a project to T3209: Fix swh-scanner for python > 3.7: Code scanner.
Apr 6 2021, 12:01 PM · Code scanner
zack requested changes to D5411: return a 400 error when accessing endpoints without the arguments.

also, can you add tests verifying that calling the API without an argument does in fact return 400 error?

Apr 6 2021, 11:59 AM
zack requested changes to D5420: cli/identify: Add support for --recursive.
Apr 6 2021, 11:39 AM
zack closed T1136: swh-identify: support recursive checksumming of directories as Invalid.

duplicate with T3160

Apr 6 2021, 11:36 AM · Data Model
zack committed rMSLD37e00f419eda: check-in slide skeleton for LLW 2021 (authored by zack).
check-in slide skeleton for LLW 2021
Apr 6 2021, 11:03 AM
zack added inline comments to D5420: cli/identify: Add support for --recursive.
Apr 6 2021, 10:44 AM

Apr 2 2021

zack added a reviewer for D5411: return a 400 error when accessing endpoints without the arguments: seirl.
Apr 2 2021, 6:09 PM
zack added a comment to T3196: Improve discoverability of the permalinks tab.

@anlambert it looks like we're thinking of the same placement for the link that opens the permalink box. The main difference seems to be "modal popup" v. "drop-down section" (that makes the rest of the page scroll down). Maybe you can just try both and see what looks best?

Apr 2 2021, 8:13 AM · Web app

Apr 1 2021

zack added a comment to T3196: Improve discoverability of the permalinks tab.

Adding both something (the animation) and an optional checkbox to hide (because it is potentially annoying in the long run) does not sound like a great UX.

Apr 1 2021, 10:28 PM · Web app
zack closed T2269: cron spam: <root@*> find /var/log/kafka -type f -not -name *.gz -a -ctime +1 -exec gzip {} \+ as Resolved.
Apr 1 2021, 9:25 PM · System administration
zack edited Description on Roadmap 2021.
Apr 1 2021, 11:03 AM
zack edited Description on Roadmap 2021.
Apr 1 2021, 11:00 AM

Mar 31 2021

zack committed rDGRPH8d30918cd7f8: docs: drop mention of conffile in quickstart (authored by zack).
docs: drop mention of conffile in quickstart
Mar 31 2021, 5:19 PM
zack renamed T1538: Save "forge" now from save "forge" now to Save "forge" now.
Mar 31 2021, 11:07 AM · meta-task, Roadmap 2021, Web app
zack moved T3175: Prepare production environment from Backlog to Done on the Roadmap 2021 board.
Mar 31 2021, 11:05 AM · Roadmap 2021, System administration, Monitoring

Mar 30 2021

zack added a comment to T2833: cpan.loader - preserver Perl modules from CPAN.

awesome, thanks @joenio ! you can also drop by our other devel communication channel if you want to discuss this in other ways: https://www.softwareheritage.org/community/developers/

Mar 30 2021, 3:29 PM · Archive coverage
zack renamed T2833: cpan.loader - preserver Perl modules from CPAN from [feature request] cpan.loader - preserver Perl modules from CPAN to cpan.loader - preserver Perl modules from CPAN.
Mar 30 2021, 8:22 AM · Archive coverage
zack raised the priority of T2833: cpan.loader - preserver Perl modules from CPAN from Wishlist to Normal.
Mar 30 2021, 8:22 AM · Archive coverage
zack added a comment to T2833: cpan.loader - preserver Perl modules from CPAN.

Hey, yes, we want to have one, but nobody is working on it at the moment, and we'd rather have someone knowledgeable about that ecosystem work on it. So, if you're interested, you're more than welcome to help there! (And thank you in advance.)

Mar 30 2021, 8:21 AM · Archive coverage

Mar 29 2021

zack committed rMSLD17714c5a3348: CYU talk: use more recent data model slide (authored by zack).
CYU talk: use more recent data model slide
Mar 29 2021, 7:34 PM
zack committed rMSLD42be91daa092: check in slides for tomorrow talk at CYU (authored by zack).
check in slides for tomorrow talk at CYU
Mar 29 2021, 3:01 PM

Mar 27 2021

zack closed T3180: [spam] as Invalid.
Mar 27 2021, 5:54 PM · General, Web client
zack committed R183:bb3690aee756: add recent papers (authored by zack).
add recent papers
Mar 27 2021, 2:31 PM
zack committed R183:04b760d62231: add citation for Apache Gremlin graph traversal language (authored by zack).
add citation for Apache Gremlin graph traversal language
Mar 27 2021, 2:17 PM

Mar 26 2021

zack triaged T3178: document how to export the graph dataset automatically as Normal priority.
Mar 26 2021, 12:25 PM · Documentation, Datasets
zack reopened T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, as Open.
Mar 26 2021, 12:25 PM · Datasets
zack reopened T1847: fully automate export of the graph dataset as "Open".

reopening, as ideally we'd like to have run the entire ORC export once to completion before closing

Mar 26 2021, 12:25 PM · Graph service, Datasets

Mar 23 2021

zack updated the task description for T3168: Proper deployment of swh-graph with debian package.
Mar 23 2021, 12:24 PM · Graph service, Puppet recipes
zack added a project to T3168: Proper deployment of swh-graph with debian package: Graph service.
Mar 23 2021, 12:23 PM · Graph service, Puppet recipes

Mar 22 2021

zack renamed T3161: graph service: add anti-DoS limit on the number of edges traversed from graph service: add limit on the number of edges traversed to graph service: add anti-DoS limit on the number of edges traversed.
Mar 22 2021, 9:43 AM · Graph service
zack added a subtask for T2220: swh-graph in production: T3161: graph service: add anti-DoS limit on the number of edges traversed.
Mar 22 2021, 9:43 AM · meta-task, Roadmap 2021, Graph service
zack added a parent task for T3161: graph service: add anti-DoS limit on the number of edges traversed: T2220: swh-graph in production.
Mar 22 2021, 9:43 AM · Graph service
zack triaged T3161: graph service: add anti-DoS limit on the number of edges traversed as Normal priority.
Mar 22 2021, 9:12 AM · Graph service
zack closed T2113: swh-graph: add support to optionally resolve ori PIDs to origin URLs as Wontfix.

Now that this is (optionally) done by swh-web, I don't think we want to implement it in swh-graph too.

Mar 22 2021, 8:56 AM · Graph service

Mar 21 2021

zack added a comment to D5295: Add type annotations to metadata mappings.

While you are at it, and as a minor point, please also double check your commit message, it doesn't match our conventions (e.g., it is in passive voice, while it shouldn't).

Mar 21 2021, 10:58 PM

Mar 20 2021

zack committed rMSLD63f6936d4189: LibrePlanet talk: last touches (authored by zack).
LibrePlanet talk: last touches
Mar 20 2021, 2:37 PM
zack renamed T3160: swh identify: add a -R/--recursive flag from swh identify: add a -R/--recursive to swh identify: add a -R/--recursive flag.
Mar 20 2021, 2:22 PM · Easy hack, Data Model
zack updated the task description for T3160: swh identify: add a -R/--recursive flag.
Mar 20 2021, 2:21 PM · Easy hack, Data Model
zack triaged T3160: swh identify: add a -R/--recursive flag as Normal priority.
Mar 20 2021, 2:20 PM · Easy hack, Data Model

Mar 19 2021

zack accepted D5292: cli: Don't show a traceback or warning if the config file does not exist.
Mar 19 2021, 5:47 PM
zack committed rMSLD5ec7dc16b302: check in slides for LibrePlanet 2021 (authored by zack).
check in slides for LibrePlanet 2021
Mar 19 2021, 5:13 PM
zack placed T2234: Write use case-specific documentation up for grabs.

Please do not claim tasks @shivam2003, just submit a patch fixing the issue when you have one. Thanks.

Mar 19 2021, 5:10 PM · Roadmap 2021, meta-task, Documentation
zack committed rMSLD83819f6e6034: common: add SwhFS ICSE paper to biblio module (authored by zack).
common: add SwhFS ICSE paper to biblio module
Mar 19 2021, 4:29 PM
zack committed rMSLDdcf96f56494d: common: revamp some old/common slides to reflect current state (authored by zack).
common: revamp some old/common slides to reflect current state
Mar 19 2021, 4:29 PM
zack committed rMSLDcd8af720ce3c: common: add swh identify tutorial/example to SWHID module (authored by zack).
common: add swh identify tutorial/example to SWHID module
Mar 19 2021, 4:29 PM
zack committed rMSLDc2d00871acb0: common: add (minimal) slide module for swh-fuse (authored by zack).
common: add (minimal) slide module for swh-fuse
Mar 19 2021, 4:29 PM
zack committed rMSLDe9f19e6288df: common: add one-slider module about the Merkle structure (authored by zack).
common: add one-slider module about the Merkle structure
Mar 19 2021, 4:29 PM
zack created P979 Command-Line Input.
Mar 19 2021, 2:26 PM
zack committed rMSLDb6c2a59d3dc5: common/images: add archive coverage image + links for coverage & growth (authored by zack).
common/images: add archive coverage image + links for coverage & growth
Mar 19 2021, 2:15 PM
zack committed rDSEA7b3b0dca9d55: doc: capitalize heading title (authored by zack).
doc: capitalize heading title
Mar 19 2021, 11:02 AM
zack committed rDDOC54fe755ea8a9: make heading for swh-loader page consisted with other packages (authored by zack).
make heading for swh-loader page consisted with other packages
Mar 19 2021, 10:46 AM

Mar 18 2021

zack added a member for Developers: aeviso.
Mar 18 2021, 2:45 PM
zack removed a member for Developers: tenma.
Mar 18 2021, 2:45 PM
zack removed a member for Developers: fiendish.
Mar 18 2021, 2:44 PM