Software Heritage

zack (Stefano Zacchiroli)
User, Administrator

User Details

User Since
Sep 7 2015, 3:43 PM (307 w, 4 d)
Roles
Administrator

Recent Activity

Thu, Jul 29

zack accepted D6027: swh-scanner: add 'auto' option as default policy.
Thu, Jul 29, 10:01 AM

Wed, Jul 28

zack requested changes to D6027: swh-scanner: add 'auto' option as default policy.

I'm requesting some minor changes (plus some other changes, to be submitted in a separate diff, which I've only noticed now, sorry!).

Wed, Jul 28, 11:08 AM

Fri, Jul 23

zack committed rDDOCf9451bd1038a: changelog: fix sphinx markup error in sourceforge entry (authored by zack).
Fri, Jul 23, 9:59 AM

Thu, Jul 22

zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.

awesome \o/
(diff accepted)

Thu, Jul 22, 10:33 AM
zack accepted D5952: changelog: Reference first completion of sourceforge git/svn origins.
Thu, Jul 22, 10:33 AM
zack accepted D6005: docs/query-language: Describe search query language syntax.

I'm accepting this diff, but note that I've added a few suggestions for improved language above. Please integrate them before this is final.

Thu, Jul 22, 9:29 AM

Wed, Jul 21

zack added inline comments to D6005: docs/query-language: Describe search query language syntax.
Wed, Jul 21, 11:47 AM
zack requested changes to D6005: docs/query-language: Describe search query language syntax.
Wed, Jul 21, 11:42 AM
zack accepted D5996: swh-scanner: new scan policies.

Wonderful, thanks for adding the order tests! LGTM.

Wed, Jul 21, 9:59 AM

Mon, Jul 19

zack added inline comments to D5996: swh-scanner: new scan policies.
Mon, Jul 19, 4:25 PM
zack added inline comments to D6005: docs/query-language: Describe search query language syntax.
Mon, Jul 19, 1:47 PM
zack requested changes to D6005: docs/query-language: Describe search query language syntax.

thanks @KShivendu, this is a great start!

Mon, Jul 19, 12:26 PM

Sat, Jul 17

zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.

@zack
Is it okay with you if I add it in the next diff? This one has become extremely long because of many build failures.

Sat, Jul 17, 12:13 PM

Fri, Jul 16

zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.
In D5990#154613, @zack wrote:

Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.

Fri, Jul 16, 11:35 AM
zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.

Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.

Fri, Jul 16, 11:33 AM
zack added a comment to T3127: Compute and display distribution of origins by forge.

Thanks for this update, great work!

Fri, Jul 16, 11:29 AM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
zack requested changes to D5996: swh-scanner: new scan policies.

Looks great! I've noted down only some nits.

Fri, Jul 16, 11:02 AM

Thu, Jul 15

zack renamed T3431: Implement a MongoDB backend for SWH-provenance from Implement a MonoDB backend for SWH-provenance to Implement a MongoDB backend for SWH-provenance.
Thu, Jul 15, 10:52 AM · Provenance database

Thu, Jul 8

zack accepted D5981: scanner: access MerkleNodeInfo with the correct key.
Thu, Jul 8, 5:29 PM
zack changed the status of T2730: scanner: should output the root SWHID as well from Open to Work in Progress.
Thu, Jul 8, 2:13 PM · Easy hack, Code scanner
zack changed the status of T2692: Move the output related functions to another (sub)module from Open to Work in Progress.
Thu, Jul 8, 2:13 PM · Code scanner
zack moved T3318: scanner should use the known() method of web.client from In progress to Backlog on the Code scanner board.
Thu, Jul 8, 2:13 PM · Code scanner
zack added a parent task for T2635: web client: add async API: T3318: scanner should use the known() method of web.client.
Thu, Jul 8, 2:11 PM · Web client
zack added a subtask for T3318: scanner should use the known() method of web.client: T2635: web client: add async API.
Thu, Jul 8, 2:11 PM · Code scanner
zack accepted D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.

Please note down, on the side, two remaining TODOs about all this:

  • adding a test case with a deduplicated source tree, to make sure nodes that are deduplicated at the Merkle DAG level are present multiple times in the output
  • adding a test case for a path that is not decodable as UTF-8, to make sure it is handled properly
Thu, Jul 8, 9:43 AM
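The second TODO could be covered by a test along these lines. This is a minimal sketch, assuming paths are kept as raw bytes internally (as POSIX filesystems allow) and only decoded for output; the helper names path_for_display and path_roundtrip are hypothetical, not swh-scanner API.

```python
def path_for_display(raw_path: bytes) -> str:
    """Render a raw filesystem path for human-readable output.

    Bytes that are not valid UTF-8 are shown as backslash escapes instead of
    raising UnicodeDecodeError.
    """
    return raw_path.decode("utf-8", errors="backslashreplace")


def path_roundtrip(raw_path: bytes) -> bytes:
    """Decode to str and re-encode without losing any byte.

    surrogateescape maps undecodable bytes to lone surrogates and back.
    """
    text = raw_path.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


def test_non_utf8_path() -> None:
    raw = b"caf\xe9.txt"  # Latin-1 encoded "café.txt": not valid UTF-8
    try:
        raw.decode("utf-8")  # a strict decode is exactly what must be avoided
    except UnicodeDecodeError:
        pass
    else:
        raise AssertionError("expected a strict UTF-8 decode to fail")
    assert path_for_display(raw) == "caf\\xe9.txt"
    assert path_roundtrip(raw) == raw


test_non_utf8_path()
```

Keeping bytes as the internal representation means only the output layer has to choose an error-handling policy.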

Wed, Jul 7

zack added inline comments to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.
Wed, Jul 7, 11:55 AM

Tue, Jul 6

zack requested changes to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.

Thanks, both the general structure and implementation look OK.
I'm requesting changes to address two main issues:

Tue, Jul 6, 10:12 AM

Mon, Jul 5

zack added a parent task for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T2730: scanner: should output the root SWHID as well.
Mon, Jul 5, 3:21 PM · Code scanner
zack added a subtask for T2730: scanner: should output the root SWHID as well: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Mon, Jul 5, 3:21 PM · Easy hack, Code scanner
zack removed a parent task for T2730: scanner: should output the root SWHID as well: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Mon, Jul 5, 3:20 PM · Easy hack, Code scanner
zack removed a subtask for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T2730: scanner: should output the root SWHID as well.
Mon, Jul 5, 3:20 PM · Code scanner
zack changed the status of T3420: scanner: make the various query algorithms user-selectable from Open to Work in Progress.
Mon, Jul 5, 3:11 PM · Code scanner
zack assigned T3318: scanner should use the known() method of web.client to DanSeraf.
Mon, Jul 5, 3:11 PM · Code scanner
zack changed the status of T3318: scanner should use the known() method of web.client from Open to Work in Progress.
Mon, Jul 5, 3:11 PM · Code scanner
zack added a parent task for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T3420: scanner: make the various query algorithms user-selectable.
Mon, Jul 5, 3:10 PM · Code scanner
zack added a subtask for T3420: scanner: make the various query algorithms user-selectable: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Mon, Jul 5, 3:10 PM · Code scanner
zack triaged T3420: scanner: make the various query algorithms user-selectable as Normal priority.
Mon, Jul 5, 3:10 PM · Code scanner

Fri, Jul 2

zack accepted D5951: model: make deduplication optional when iterating over the merkle tree.
Fri, Jul 2, 10:16 AM
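To illustrate what making deduplication optional means (and why the first TODO in D5926 needs it turned off), here is a toy Merkle tree sketch: nodes are identified by a hash of their content, so identical subtrees collapse into one node. The Node class, iter_tree and its dedup parameter are illustrative assumptions, not the actual swh.model.merkle interface, and the hashing is deliberately simplified rather than the real SWHID computation.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Set, Tuple


class Node:
    """Toy Merkle node: a file carries data, a directory carries named children."""

    def __init__(self, data: bytes = b"", children: Optional[Dict[str, "Node"]] = None):
        self.data = data
        self.children = children  # None for a file, a dict for a directory

    @property
    def hash(self) -> str:
        # Simplified intrinsic id: NOT the real SWHID computation.
        if self.children is None:
            return hashlib.sha1(b"file\0" + self.data).hexdigest()
        h = hashlib.sha1(b"dir\0")
        for name in sorted(self.children):
            h.update(name.encode() + b"\0" + self.children[name].hash.encode())
        return h.hexdigest()


def iter_tree(root: Node, dedup: bool = True) -> Iterator[Tuple[str, Node]]:
    """Yield (path, node) pairs in depth-first pre-order.

    dedup=True skips any subtree whose hash was already seen, so each distinct
    object is yielded once; dedup=False yields every occurrence.
    """
    seen: Set[str] = set()
    stack: List[Tuple[str, Node]] = [("", root)]
    while stack:
        path, node = stack.pop()
        if dedup:
            if node.hash in seen:
                continue
            seen.add(node.hash)
        yield path, node
        # Push children in reverse order so they are popped alphabetically.
        for name in sorted(node.children or {}, reverse=True):
            stack.append((f"{path}/{name}", node.children[name]))


# Two identical subdirectories share one hash at the Merkle DAG level.
tree = Node(children={
    "a": Node(children={"README": Node(b"hello")}),
    "b": Node(children={"README": Node(b"hello")}),
})
print([p for p, _ in iter_tree(tree)])               # ['', '/a', '/a/README']
print([p for p, _ in iter_tree(tree, dedup=False)])  # ['', '/a', '/a/README', '/b', '/b/README']
```

Deduplication suits archiving (each object is sent once), while a scanner report about a source tree needs every path, hence dedup=False.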

Thu, Jul 1

zack added inline comments to D5951: model: make deduplication optional when iterating over the merkle tree.
Thu, Jul 1, 8:31 PM
zack requested changes to D5951: model: make deduplication optional when iterating over the merkle tree.

only minor changes requested to the docstring on my part

Thu, Jul 1, 8:27 PM

Jul 1 2021

zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.

I don't really have an ETA yet [1]. We are roughly 67% done for git and 84.6% for svn [2]. Mercurial has not started yet, as other blocking points are being worked on.
Bazaar and CVS origins are listed, but we don't have any loaders for them yet.

Jul 1 2021, 11:11 AM
zack requested changes to D5952: changelog: Reference first completion of sourceforge git/svn origins.
Jul 1 2021, 9:33 AM
zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.

Thanks a lot for this!

Jul 1 2021, 9:32 AM
zack added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

(3) should ideally be implemented in a way that guarantees that extids that were resolvable in previous versions of the mapping will always be resolvable in future versions

I don't understand. Option 3 is to remove relations between extids and SWHIDs, so they won't be resolvable anymore.

Jul 1 2021, 9:01 AM · Storage manager, Mercurial loader

Jun 30 2021

zack added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

I have the feeling that, in the long run, option (1) will lead to an explosion in the size of the mapping, which will eventually make us converge (slowly) toward option (3).

Jun 30 2021, 7:33 PM · Storage manager, Mercurial loader

Jun 21 2021

zack accepted D5899: swh-model: get SWHID from Content/Directory objects in from_disk.
Jun 21 2021, 4:53 PM
zack committed R183:6da3bade1513: add Demeyer, Mens book on Software Evolution (authored by zack).
Jun 21 2021, 9:02 AM

Jun 19 2021

zack committed rDGRPHf885bdb0099e: git2graph: bugfix: traverse all nodes even when edges are not traversed (authored by zack).
Jun 19 2021, 2:12 PM
zack committed rDGRPH3adff1c45b7a: tools/dir2graph: new tool to convert a local dir to nodes/edges files (authored by zack).
Jun 19 2021, 2:03 PM

Jun 18 2021

zack added a comment to D5899: swh-model: get SWHID from Content/Directory objects in from_disk.

I'll wait for @vlorentz and @olasd before writing the unit test (in case they want to use a different approach).

Jun 18 2021, 6:33 PM
zack updated subscribers of D5899: swh-model: get SWHID from Content/Directory objects in from_disk.

@olasd @vlorentz: I've noted down only nits and "classics" in the above review. If you want to chime in on the approach (where the methods are, properties, caching, etc.) please do!

Jun 18 2021, 5:13 PM
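As one possible shape for the design points raised here (method vs. property, caching), consider the sketch below. DiskContent is a hypothetical stand-in, not the actual swh.model.from_disk class; the sketch relies on the fact that content SWHIDs use the same SHA1 as Git blob objects, computed over a "blob <length>" header and a NUL byte followed by the raw bytes. functools.cached_property requires Python 3.8 or later.

```python
import functools
import hashlib


class DiskContent:
    """Hypothetical stand-in for a from_disk content object."""

    def __init__(self, data: bytes):
        self.data = data

    @functools.cached_property
    def sha1_git(self) -> bytes:
        # Computed on first access, then cached on the instance.
        header = b"blob " + str(len(self.data)).encode("ascii") + b"\0"
        return hashlib.sha1(header + self.data).digest()

    def swhid(self) -> str:
        # A method rather than a property: it builds a new string on each
        # call, while the expensive hashing step stays cached above.
        return "swh:1:cnt:" + self.sha1_git.hex()


print(DiskContent(b"").swhid())
# swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The empty-content identifier matches Git's well-known hash of the empty blob.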
zack requested changes to D5899: swh-model: get SWHID from Content/Directory objects in from_disk.
Jun 18 2021, 5:10 PM
zack added a comment to D5899: swh-model: get SWHID from Content/Directory objects in from_disk.

LGTM in general, but needs unit tests (in addition to the two nitpicks above about docstrings)

Jun 18 2021, 5:10 PM
zack triaged T3393: add swhid() method to from_disk classes as Normal priority.
Jun 18 2021, 11:54 AM · Data Model

Jun 17 2021

zack committed rMSLD358de8ea3b7a: check in slides for GraphRM talk (authored by zack).
Jun 17 2021, 3:12 PM
zack committed rMSLD40a6bd7d6e97: check-in (old) last-bits for telecom paris talk (authored by zack).
Jun 17 2021, 3:12 PM
zack committed rDGRPH0068f61008e6: FindEarliestRevision: make timing optional with a dedicated CLI flag (authored by zack).
Jun 17 2021, 10:47 AM

Jun 16 2021

zack committed R183:9ef36ff25c1f: add a bunch of entries about network studies on software (authored by zack).
Jun 16 2021, 2:56 PM
zack committed R183:b6e967f148c4: add papers: mockus2009ammassing and gao2007archive (authored by zack).
Jun 16 2021, 2:37 PM
zack added a comment to D5879: identify: Fix exclude_patterns parameter type for identify_object.

I think this also requires bumping the versioned dependency on swh-model (and a release of it).

Jun 16 2021, 12:22 PM

Jun 15 2021

zack triaged T3383: swh identify --recursive breaks --exclude, resulting in a "AttributeError: 'str' object has no attribute 'decode'" traceback as High priority.
Jun 15 2021, 4:48 PM · Data Model
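The traceback in T3383 points to a str/bytes mismatch: some code calls .decode() on a value that is already a str. One way to avoid this class of error is to normalize user-supplied patterns to bytes once, at the CLI boundary, and match them against bytes paths. This is a sketch under the assumption that paths are handled as bytes; normalize_patterns and is_excluded are hypothetical names, not the actual change made in D5879.

```python
import fnmatch
import os
from typing import Iterable, List, Union


def normalize_patterns(patterns: Iterable[Union[str, bytes]]) -> List[bytes]:
    """Turn CLI-provided patterns (str) into bytes; bytes pass through unchanged."""
    # os.fsencode uses the filesystem encoding, like the OS does for paths.
    return [os.fsencode(p) for p in patterns]


def is_excluded(name: bytes, patterns: Iterable[Union[str, bytes]]) -> bool:
    """Match a raw (bytes) path component against shell-style patterns."""
    return any(fnmatch.fnmatchcase(name, p) for p in normalize_patterns(patterns))
```

fnmatch accepts bytes as long as both the name and the pattern are bytes, so no decoding is needed anywhere in the matching path.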
zack updated subscribers of T3382: Save process seems to be stuck.

Thanks @ardumont for following up to this task.

Jun 15 2021, 4:27 PM · Save Code Now
zack closed D5551: Fix swh-scanner for python 3.7 and >= 3.8.

landed in d58bcb59a0999ae124de23db88fc9f73603d452a

Jun 15 2021, 11:11 AM
zack commandeered D5551: Fix swh-scanner for python 3.7 and >= 3.8.
Jun 15 2021, 11:11 AM
zack closed T3209: Fix swh-scanner for python > 3.7 as Resolved by committing rDTSCNd58bcb59a099: Fix swh-scanner for python 3.7 and >= 3.8.
Jun 15 2021, 11:10 AM · Code scanner
zack committed rDTSCNd58bcb59a099: Fix swh-scanner for python 3.7 and >= 3.8 (authored by aastha1999).
Jun 15 2021, 11:10 AM

Jun 11 2021

zack accepted D5825: swh-model: add recursive option.
Jun 11 2021, 2:54 PM
zack removed a reviewer for D5825: swh-model: add recursive option: vlorentz.
Jun 11 2021, 2:51 PM
zack requested changes to D5825: swh-model: add recursive option.
Jun 11 2021, 1:21 PM
zack added a project to T3374: Ingest sourceforge repositories (origins of type git, svn, hg): Archive coverage.

Note: when this is (reasonably) done, we should document the addition of SourceForge to the archive coverage page at archive.s.o and also to the archive changelog.

Jun 11 2021, 12:27 PM · System administration, Archive coverage, Origin-SourceForge
zack renamed T3349: use swh.model.merkle/from_disk instead of swh.scanner.model from consider using swh.model.merkle/from_disk instead of swh.scanner.model to use swh.model.merkle/from_disk instead of swh.scanner.model.
Jun 11 2021, 11:16 AM · Code scanner
zack added a subtask for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T2730: scanner: should output the root SWHID as well.
Jun 11 2021, 11:16 AM · Code scanner
zack added a parent task for T2730: scanner: should output the root SWHID as well: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Jun 11 2021, 11:16 AM · Easy hack, Code scanner

Jun 10 2021

zack abandoned D5420: cli/identify: Add support for --recursive.

Superseded by D5825. Abandoning this one.

Jun 10 2021, 8:38 PM
zack commandeered D5420: cli/identify: Add support for --recursive.
Jun 10 2021, 8:38 PM
zack added a comment to D5825: swh-model: add recursive option.

LGTM, thanks !

Jun 10 2021, 8:38 PM
zack accepted D5825: swh-model: add recursive option.
Jun 10 2021, 8:37 PM

Jun 9 2021

zack requested changes to D5825: swh-model: add recursive option.
Jun 9 2021, 11:57 AM
zack added a comment to D5825: swh-model: add recursive option.

Wouldn't it make sense to have a separate command (e.g., recursive-identify instead of identify --recursive)?

Jun 9 2021, 10:08 AM

Jun 8 2021

zack added a comment to D5825: swh-model: add recursive option.

But what was the process before? Did it ignore directory entries?

It checks only the given directories, generating a from_disk.Directory object for each of them. Should it use the same logic as the recursive option?

Jun 8 2021, 8:02 PM
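For comparison with checking only the given directories, a recursive mode emits an identifier for every node below the root. The sketch below does this with the standard library only, relying on content and directory SWHIDs being computed like Git blob and tree ids. identify_recursive is a hypothetical helper, not the swh-model implementation, and it assumes the tree contains only regular files, directories and symlinks (symlinks are hashed as contents holding their target).

```python
import hashlib
import os
import stat
from typing import Dict


def git_object_id(obj_type: bytes, payload: bytes) -> bytes:
    """SHA1 of a Git-style header ("<type> <length>" and a NUL byte) plus the payload."""
    header = obj_type + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).digest()


def identify_recursive(root: bytes, out: Dict[bytes, str]) -> bytes:
    """Record a SWHID in `out` for `root` and every node below it.

    Returns the raw directory id of `root`.
    """
    entries = []
    for name in os.listdir(root):  # bytes in, bytes names out
        path = os.path.join(root, name)
        st = os.lstat(path)
        if stat.S_ISDIR(st.st_mode):
            mode, digest = b"40000", identify_recursive(path, out)
        else:
            if stat.S_ISLNK(st.st_mode):
                mode, payload = b"120000", os.readlink(path)
            else:
                with open(path, "rb") as f:
                    payload = f.read()
                mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            digest = git_object_id(b"blob", payload)
            out[path] = "swh:1:cnt:" + digest.hex()
        entries.append((name, mode, digest))
    # Git orders tree entries by name, comparing directory names as if they
    # ended with "/".
    entries.sort(key=lambda e: e[0] + b"/" if e[1] == b"40000" else e[0])
    manifest = b"".join(m + b" " + n + b"\0" + d for n, m, d in entries)
    root_id = git_object_id(b"tree", manifest)
    out[root] = "swh:1:dir:" + root_id.hex()
    return root_id
```

An empty directory yields swh:1:dir:4b825dc642cb6eb9a060e54bf8d69288fbee4904, the same value as Git's empty tree.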
zack added a comment to D5825: swh-model: add recursive option.
  1. If relevant, could you implement --verify too?

Sure, if it is useful I could open another diff for it.

Jun 8 2021, 5:37 PM
zack added a project to T3350: Deploy sourceforge lister in production: Archive coverage.
Jun 8 2021, 11:45 AM · Archive coverage, System administration, Origin-SourceForge
zack added a comment to T3366: Improve the page rendering mechanism in the web UI.

I'm adding here a note about something to consider in terms of pros/cons: accessibility. As we are, for the most part, archiving textual information, we really want it to be accessible to all users. Right now we go further than that, ensuring that the Web UI can be browsed with a text-mode browser: for instance, w3m https://archive.softwareheritage.org/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2 just works out of the box. I'm not up to date on the accessibility impact of current JS frameworks, and I'm not saying that we should require the archive to be browsable with JavaScript disabled (by today's standards, "browsable with free JavaScript" is probably good enough for us, and we have a curl-able API anyway), but accessibility per se is definitely going to be a requirement.

Jun 8 2021, 11:19 AM · Web app
zack shifted T3366: Improve the page rendering mechanism in the web UI from the Restricted Space space to the S1 Public space.
Jun 8 2021, 11:13 AM · Web app
zack renamed T3366: Improve the page rendering mechanism in the web UI from Improve the page rendering mechanism in the web to Improve the page rendering mechanism in the web UI.
Jun 8 2021, 11:13 AM · Web app
zack triaged T3366: Improve the page rendering mechanism in the web UI as Normal priority.
Jun 8 2021, 11:13 AM · Web app

Jun 7 2021

zack added a comment to T3149: Benchmark software for the object storage.

How about just collecting all raw timings in an output CSV file (or several files if needed) and computing the stats downstream (e.g., with pandas)?
That would allow changing the percentiles later on, as well as computing different stats, without having to rerun the benchmarks.

Jun 7 2021, 3:21 PM · Object storage
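A minimal sketch of that workflow: the benchmark only appends raw measurements, and any statistic is derived afterwards from the file. It uses the standard library (csv, and statistics.quantiles from Python 3.8+) instead of pandas to stay self-contained; the column names operation and seconds and both function names are assumptions for illustration.

```python
import csv
import statistics
from typing import Dict, List, Sequence


def write_raw_timings(path: str, rows: Sequence[Dict[str, object]]) -> None:
    """Store one row per measured operation, without any aggregation."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["operation", "seconds"])
        writer.writeheader()
        writer.writerows(rows)


def percentiles(path: str, operation: str, n: int = 100) -> List[float]:
    """Recompute cut points of any granularity from the raw file, after the fact."""
    with open(path, newline="") as f:
        values = [
            float(row["seconds"])
            for row in csv.DictReader(f)
            if row["operation"] == operation
        ]
    # n=100 gives the 99 percentile cut points; n=4 gives quartiles, and so on.
    return statistics.quantiles(values, n=n)
```

Because nothing is aggregated at measurement time, switching from, say, p95 to p99 only means re-reading the file; the same CSV could equally be loaded with pandas.read_csv and aggregated there.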
zack renamed Save Code Now from SaveCodeNow to Save Code Now.
Jun 7 2021, 9:45 AM
zack triaged T3361: "Save code now" seems to be stuck as High priority.
Jun 7 2021, 9:44 AM · Save Code Now

Jun 4 2021

zack added a comment to D5816: loader: add an hg-specific mapping for branching.

Minor request, which can also be addressed in a subsequent commit: can we have the mapping documented somewhere? As a first approximation even a docstring would do, so that it shows up at docs.s.o. (I'm not sure the files being modified here are the most relevant place for it, though; it could also go at the root of the Mercurial loader Python hierarchy. Up to you!)

Jun 4 2021, 11:39 AM
zack resigned from D5816: loader: add an hg-specific mapping for branching.
Jun 4 2021, 11:36 AM

Jun 3 2021

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

That explains it, and it's good enough for me, thanks :)

Jun 3 2021, 2:30 PM · Mercurial loader
zack updated the task description for T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).
Jun 3 2021, 2:30 PM · Mercurial loader
zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

My remaining question then is: how about using branches/{heads,closed,tip}/name instead of branch-{tip,heads,closed-heads}/name?

Jun 3 2021, 2:20 PM · Mercurial loader
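One property worth checking for a scheme like branches/{heads,closed,tip}/name is that it can be parsed back without ambiguity. The sketch below shows this with hypothetical helpers ref_name and parse_ref_name; only the three kinds from the question above are covered, and how multiple heads of the same branch would be disambiguated is left open, as in the discussion.

```python
from typing import Tuple

KINDS = ("heads", "closed", "tip")  # per the proposed scheme


def ref_name(kind: str, branch: bytes) -> bytes:
    """Build a snapshot branch name such as b"branches/heads/default"."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return b"branches/" + kind.encode("ascii") + b"/" + branch


def parse_ref_name(ref: bytes) -> Tuple[str, bytes]:
    """Inverse of ref_name().

    The prefix always has exactly two components, so splitting at most twice
    recovers the branch name even when it contains slashes itself.
    """
    prefix, kind, branch = ref.split(b"/", 2)  # ValueError if too few parts
    if prefix != b"branches" or kind.decode("ascii") not in KINDS:
        raise ValueError(f"not a Mercurial branch ref: {ref!r}")
    return kind.decode("ascii"), branch
```

A fixed-depth prefix is what keeps parsing unambiguous: Mercurial branch names may contain "/", so the kind must never be inferred from the tail of the name.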
zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

They are all branch heads (git branches are heads too, and so are bookmarks), so a heads/ prefix does not add much.

Jun 3 2021, 1:37 PM · Mercurial loader
zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

My point here is for users looking at the structure to be able to easily distinguish between the different mapping formats. Something based on the "visit data" and associated documentation seems quite fragile.

Jun 3 2021, 1:11 PM · Mercurial loader

Jun 2 2021

zack requested changes to D5816: loader: add an hg-specific mapping for branching.

(I'm marking this as on hold until we have reached a decision on T3352, just to avoid it being deployed by mistake. But feel free to go ahead with the rest of the review, or even to override this if you think there is a better safeguard against accidental deployment.)

Jun 2 2021, 8:06 PM

May 31 2021

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Understood. To explain my thinking here, the refs/... structure is something we picked to represent git branch names as faithfully as possible, adding as little as possible on top of it. In trying to represent branch names from another VCS, as a first approximation I'd rather reuse the same *approach* than a *result* that is similar, if that makes sense. So, to pivot the question around, what is the minimal (also in the sense that it is shorter / has less cruft) naming scheme that would allow us to represent without ambiguity all the Mercurial naming aspects that you want to capture?

May 31 2021, 10:24 PM · Mercurial loader
zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Is the ability to recognize that a snapshot comes from Mercurial an actual goal here? I don't think we care about "clashes" between snapshot created from different VCS, but maybe I'm missing something.

May 31 2021, 9:03 PM · Mercurial loader

May 28 2021

zack changed the status of T3349: use swh.model.merkle/from_disk instead of swh.scanner.model from Open to Work in Progress.
May 28 2021, 11:13 AM · Code scanner