
zack (Stefano Zacchiroli)
User · Administrator

User Details

User Since
Sep 7 2015, 3:43 PM (252 w, 5 d)
Roles
Administrator

Recent Activity

Mon, Jul 6

zack closed T2477: Fabristanco DJ as Invalid.
Mon, Jul 6, 9:09 AM
zack closed T2476: Fabristanco DJ as Invalid.
Mon, Jul 6, 9:09 AM

Fri, Jul 3

zack triaged T2474: drop blake2 hashes as Normal priority.
Fri, Jul 3, 4:15 PM · Data Model, Storage manager

Thu, Jul 2

zack committed rMSLD121ad4127166: check-in slides for SoHeal 2020 keynote (authored by zack).
Thu, Jul 2, 5:37 PM
zack added a comment to T2430: lookup ingested tarballs (or similar source code containers) by container checksum.

@civodul I wanted to raise the topic of storing container metadata (in the style of what tools like pristine-tar do) here too, so thanks for giving me the chance :-)
I agree it might be a technical solution, *but*, I'm not sure I see the point.
Didn't you agree that having a "lookup service" from tarball/container checksums to SWHIDs (the Software Heritage identifiers, which can then be used to look things up in the archive) would be enough to satisfy distro needs?
If yes, then "archiving container metadata" could be replaced by simply having a way to add entries to the lookup table. And allowing distros to do so is an option that we can explore. (Once the service exists, of course.)

Thu, Jul 2, 12:07 PM · Data Model
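
To make the "lookup table" idea in the comment above concrete, here is a minimal in-memory sketch. It is only an illustration under assumptions: the class name `ContainerLookup`, the choice of SHA-256 as the container checksum, and the placeholder SWHID are all hypothetical, and nothing here reflects how such a service is or would be implemented in the archive.

```python
import hashlib
from typing import Dict, Optional


class ContainerLookup:
    """Hypothetical table mapping container checksums to SWHIDs."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def add(self, container: bytes, swhid: str) -> str:
        """Register a container (e.g. tarball bytes) and return its checksum."""
        # Assumption: SHA-256 is used as the lookup key.
        checksum = hashlib.sha256(container).hexdigest()
        self._table[checksum] = swhid
        return checksum

    def lookup(self, checksum: str) -> Optional[str]:
        """Return the SWHID recorded for a checksum, or None if unknown."""
        return self._table.get(checksum)


# Placeholder identifier for illustration only, not a real archived object.
EXAMPLE_SWHID = "swh:1:dir:" + "0" * 40

table = ContainerLookup()
key = table.add(b"fake tarball bytes", EXAMPLE_SWHID)
print(table.lookup(key))
```

In this sketch, "allowing distros to add entries" would amount to exposing `add` to them, while `lookup` stays the only operation most users need.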

Tue, Jun 30

zack updated the task description for T1805: Public API v2.
Tue, Jun 30, 2:43 PM · meta-task, Web app
zack added a comment to T1805: Public API v2.

I suspect that when this task was initially submitted we didn't yet have SWHIDs with qualifiers :)
From the point of view of API v2, given that v1 was using only hashes, for feature parity we should indeed only need SWHIDs without qualifiers, i.e., "core" SWHIDs. (I'm gonna edit the task description to reflect that.) Thanks for noticing this!

Tue, Jun 30, 2:42 PM · meta-task, Web app
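
As an illustration of the "core SWHID" notion mentioned above, the sketch below strips qualifiers from an identifier and validates what remains. It assumes the version 1 syntax `swh:1:<type>:<40 hex digits>` with object types `cnt`, `dir`, `rev`, `rel` and `snp`, and qualifiers appended after `;`. The function name `core_swhid` is hypothetical, and the SWHID specification remains the authoritative reference.

```python
import re

# Assumed shape of a version 1 core SWHID (no qualifiers).
CORE_SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def core_swhid(swhid: str) -> str:
    """Return the core part of a SWHID, dropping any ';'-separated qualifiers."""
    core, _sep, _qualifiers = swhid.partition(";")
    if not CORE_SWHID_RE.match(core):
        raise ValueError(f"not a valid core SWHID: {core!r}")
    return core


# Placeholder hash, used only to show the shape of the identifier.
qualified = "swh:1:cnt:" + "a" * 40 + ";origin=https://example.org/repo"
print(core_swhid(qualified))
```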
zack triaged T2470: link to API Terms of Service is broken as Normal priority.
Tue, Jun 30, 10:57 AM · Web app

Sat, Jun 27

zack triaged T2468: add to archive coverage page a breakdown of the number of origins per lister [instance] as Normal priority.
Sat, Jun 27, 1:24 PM · Web app

Wed, Jun 24

zack committed rMSLD137637d059f8: MSR topology slides: update citations (authored by zack).
Wed, Jun 24, 11:38 AM

Mon, Jun 22

zack committed rMSLDe77c229098a1: MSR topology talk: drop SWH branding (authored by zack).
Mon, Jun 22, 11:08 AM
zack added a comment to D3293: scanner: dashboard file visualization per directory path.

@zack How do you feel about the minified bootstrap code?

Mon, Jun 22, 10:31 AM
zack committed rSPSITEace6c64e9ff5: granet: create local user for Sebastiano Vigna (authored by zack).
Mon, Jun 22, 9:39 AM
zack closed D3323: granet: create local user for Sebastiano Vigna.
Mon, Jun 22, 9:39 AM
zack created D3323: granet: create local user for Sebastiano Vigna.
Mon, Jun 22, 9:32 AM

Fri, Jun 19

zack committed rMSLD1b132d90f3cd: MSR 2020 RR slides: improve wording of research goal (authored by zack).
Fri, Jun 19, 4:32 PM
zack committed rMSLD81d77f1c87fc: check in draft slides for MSR 2020 RR talk (authored by zack).
Fri, Jun 19, 2:22 PM
zack committed rMSLD4e1271d7fa3c: bibliography module: add MSR 2020 entries (authored by zack).
Fri, Jun 19, 1:55 PM
zack committed rMSLDf5a544767a46: bibliography module: uniform formatting of existing entries (authored by zack).
Fri, Jun 19, 1:52 PM
zack committed rMSLD042b02e20ae0: biblatex talk: add .gitignore to ignore generated files (authored by zack).
Fri, Jun 19, 1:52 PM
zack added a comment to T2459: skip exogenous branches when ingesting github/gitlab git repositories.

as a related data point, the current graph export code applies the following heuristic to decide which outbound edges from snapshot nodes to emit:

  • keep branch names starting with refs/heads/
  • keep branch names starting with refs/tags/
  • drop everything else
Fri, Jun 19, 1:35 PM · Git loader
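
The heuristic listed in the comment above can be sketched as a simple prefix filter. This is only an illustration: the names `KEPT_PREFIXES`, `keep_branch` and `filter_branches` are hypothetical, branch names are modeled as plain strings, and this is not the actual graph export code.

```python
from typing import Iterable, List

# Prefixes kept by the heuristic described in the comment.
KEPT_PREFIXES = ("refs/heads/", "refs/tags/")


def keep_branch(name: str) -> bool:
    """Return True if a snapshot branch name should produce an outbound edge."""
    return name.startswith(KEPT_PREFIXES)


def filter_branches(names: Iterable[str]) -> List[str]:
    """Keep heads and tags, drop everything else (e.g. pull request refs)."""
    return [name for name in names if keep_branch(name)]


branches = ["refs/heads/master", "refs/tags/v1.0", "refs/pull/12/head", "HEAD"]
print(filter_branches(branches))
```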
zack updated the task description for T2459: skip exogenous branches when ingesting github/gitlab git repositories.
Fri, Jun 19, 9:55 AM · Git loader
zack triaged T2460: public journal of notable archiving policy changes as Normal priority.
Fri, Jun 19, 9:54 AM · General
zack triaged T2459: skip exogenous branches when ingesting github/gitlab git repositories as Normal priority.
Fri, Jun 19, 9:50 AM · Git loader

Wed, Jun 17

zack added a comment to D3309: api/throttling: Lift rate limit when user has special permission.

thumbs up for a dedicated permission for API throttling, thanks!

Wed, Jun 17, 6:06 PM
zack added a comment to T1352: ingest Guix (SD) packages.

@lewo it's used in our DB but also exposed in the swh-web UI in search results (and in the future it is also going to be a field for user searches, so that you can search, e.g., "emacs" only in the list of packages archived from a given origin type).

Wed, Jun 17, 3:52 PM · Archive coverage
zack added a comment to D3293: scanner: dashboard file visualization per directory path.
  • I don't think adding an external dependency (the .css) is a good idea (it might change or disappear). But this file doesn't seem to have a license so we probably can't bundle it either (ping @zack)
Wed, Jun 17, 9:12 AM

Tue, Jun 16

zack added a comment to T2451: Archive Newsletter on the Software Heritage website.

Possible solutions are:

  • We create the newsletter for each supported language and send it directly from WordPress through the Newsletter plugin.
Tue, Jun 16, 6:39 PM · Unknown Object (Project)
zack added a comment to T1352: ingest Guix (SD) packages.
In T1352#45459, @lewo wrote:

So, we can now consider the sources.json file format as stable and you could make the required changes on your sources.json file. A new SWH origin should then be added.

Tue, Jun 16, 6:34 PM · Archive coverage

Sun, Jun 14

zack edited P692 graph export 2020-05-20 - statistics.
Sun, Jun 14, 3:16 PM
zack edited P692 graph export 2020-05-20 - statistics.
Sun, Jun 14, 1:15 PM
zack edited P692 graph export 2020-05-20 - statistics.
Sun, Jun 14, 1:11 PM
zack updated the language for P692 graph export 2020-05-20 - statistics from autodetect to rst.
Sun, Jun 14, 1:09 PM
zack updated the title for P692 graph export 2020-05-20 - statistics from untitled to graph export 2020-05-20 - statistics.
Sun, Jun 14, 1:09 PM
zack renamed T2430: lookup ingested tarballs (or similar source code containers) by container checksum from A swhid for archives to lookup ingested tarballs (or similar source code containers) by container checksum.
Sun, Jun 14, 8:57 AM · Data Model
zack triaged T2430: lookup ingested tarballs (or similar source code containers) by container checksum as Low priority.
Sun, Jun 14, 8:56 AM · Data Model
zack added a comment to T2430: lookup ingested tarballs (or similar source code containers) by container checksum.

Making explicit a direct answer to one of @lewo's questions (hinted at by both @olasd and @rdicosmo): no, we do not want a new type of SWHID (swh:1:tar:...) for source code containers, which from our point of view are ephemeral.

Sun, Jun 14, 8:56 AM · Data Model
zack added a comment to T2449: Consider switching timestamp offset storage to strings/byte arrays.

Yeah, having played with it quite a bit recently, I can say the current state of timestamp offsets isn't great. I'm fine with the idea of switching them to bytestrings as proposed.

Sun, Jun 14, 8:49 AM · Storage manager, Data Model

Fri, Jun 12

zack added a comment to T2450: Fix pagination of the /revision/<rev>/log/ public API.

Another option is to simply drop this method from the public Web API, and keep the revision graph visit logic only in swh-web (the UI). If users want to do a full visit of the revision graph they can use /revision and implement the visit policy they want. (I've suggested this design consideration for API v2 in T1805.)

Fri, Jun 12, 10:02 PM · Web app
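
To illustrate what "implement the visit policy they want" could look like on the client side, here is a minimal breadth-first visit of a revision graph. It is a sketch under assumptions: `get_parents` is a hypothetical callable standing in for whatever fetches a revision's parents (for example a wrapper around a per-revision API call), and the toy graph is made up for the example.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def visit_revisions(
    start: str,
    get_parents: Callable[[str], Iterable[str]],
    limit: Optional[int] = None,
) -> List[str]:
    """Breadth-first visit of the revision graph reachable from `start`."""
    seen = {start}
    queue = deque([start])
    order: List[str] = []
    while queue and (limit is None or len(order) < limit):
        rev = queue.popleft()
        order.append(rev)
        for parent in get_parents(rev):
            # Each revision is enqueued once, even when reachable via merges.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


# Toy history: "c" is a merge of "b" and "a2", both children of "a".
toy_graph = {"c": ["b", "a2"], "b": ["a"], "a2": ["a"], "a": []}
print(visit_revisions("c", toy_graph.__getitem__))
```

Swapping the queue for a stack, or ordering by commit date, would give a different visit policy without changing the fetching logic.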
zack updated the task description for T1805: Public API v2.
Fri, Jun 12, 5:43 PM · meta-task, Web app
zack updated the task description for T1805: Public API v2.
Fri, Jun 12, 5:43 PM · meta-task, Web app

Jun 9 2020

zack edited P691 random-swhid.
Jun 9 2020, 10:47 AM
zack edited P691 random-swhid.
Jun 9 2020, 10:34 AM
zack edited P691 random-swhid.
Jun 9 2020, 10:31 AM

Jun 8 2020

zack renamed T2335: automate handling of hanging/dead/stuck loaders from Automate handling hanging or dead loaders to automate handling of hanging/dead/stuck loaders.
Jun 8 2020, 2:23 PM · Scheduling utilities

Jun 5 2020

zack added a comment to T2435: Prepare support of new hashing algorithms for browsing objects.

FWIW we have already discussed a related aspect in T1805 ("Use SWH PIDs whenever possible"). There it was only for the Web API, but it seems wise to do the same for /browse/ URLs too.

Jun 5 2020, 3:01 PM · Web app
zack added a comment to T2435: Prepare support of new hashing algorithms for browsing objects.

The webapp then allows browsing a content object by any of its computed checksums, using the following URL: /browse/content/(algo):(hash)/

Jun 5 2020, 3:00 PM · Web app

Jun 4 2020

zack committed rMSLDed91373c7711: biblio module: update entry for 2020 ESE paper (authored by zack).
Jun 4 2020, 9:11 AM
zack committed rMSLD47b09d81a2a6: team picture: avoid name clashes between 2016 and 2019 pictures (authored by zack).
Jun 4 2020, 9:11 AM

Jun 3 2020

zack renamed T2431: Document how to export the graph edge dataset from Documentat how to export the graph edge dataset to Document how to export the graph edge dataset.
Jun 3 2020, 4:36 PM · Development documentation, Graph service, Datasets
zack changed the status of T1796: Datasets exported from Spark are missing some rows from Resolved to Wontfix.
Jun 3 2020, 4:20 PM · Datasets

May 30 2020

zack updated the task description for T2429: web UI: empty repository reported as 404 snapshot not found.
May 30 2020, 2:08 PM · Web app
zack triaged T2429: web UI: empty repository reported as 404 snapshot not found as Low priority.
May 30 2020, 2:06 PM · Web app

May 28 2020

zack committed rMSLDa1a346ffc68f: SWH in a nutshell slide: highlight key points (authored by zack).
May 28 2020, 11:54 AM
zack committed rMSLD57e13f95414d: graph compression module: review, improving presentation (authored by zack).
May 28 2020, 11:54 AM
zack committed rMSLD89ef710c075a: LIP6 talk: review, last touches (authored by zack).
May 28 2020, 11:54 AM
zack committed rMSLD95e9f83d5840: data flow: enlarge picture to fit slide (authored by zack).
May 28 2020, 11:54 AM

May 27 2020

zack edited P682 ant ivy bs.
May 27 2020, 3:52 PM
zack committed rMSLDcf7d96f8e6ce: check-in first draft of LIP6 talk slides (authored by zack).
May 27 2020, 11:02 AM
zack committed rMSLDdc4a5438bd80: Merge branch 'master' of ssh://forge.softwareheritage.org/diffusion/64/slides (authored by zack).
May 27 2020, 11:01 AM
zack committed rMSLDc6ea44cbc4ea: status module: refresh growth graphs (authored by zack).
May 27 2020, 11:01 AM
zack committed rMSLD8c948f61ec9e: status module: update archive coverage slides to match current sources (authored by zack).
May 27 2020, 11:01 AM
zack closed T2425: Missing twitter icon in footer of main website as Invalid.

duplicate of T2420

May 27 2020, 9:57 AM · Website, Unknown Object (Project)

May 26 2020

zack changed the visibility for F3887038: missing-twitter-icon.png.
May 26 2020, 4:21 PM
zack triaged T2420: website: Twitter icon missing in footer (missing font?) as Low priority.
May 26 2020, 4:21 PM · Website

May 24 2020

zack edited P680 Masterwork From Distant Lands.
May 24 2020, 3:31 PM

May 22 2020

zack committed rMSLDac4ec212f5db: ENSEA talk: (belated) check-in of last bits) (authored by zack).
May 22 2020, 12:02 PM

May 20 2020

zack changed the status of T2335: automate handling of hanging/dead/stuck loaders from Open to Work in Progress.
May 20 2020, 9:42 AM · Scheduling utilities

May 19 2020

zack archived P678 Masterwork From Distant Lands.
May 19 2020, 4:14 PM
zack edited P678 Masterwork From Distant Lands.
May 19 2020, 4:14 PM
zack updated the title for P677 timezone offsets (in minutes) present in the archive that cannot be represented using postgres timestamptz from Masterwork From Distant Lands to timezone offsets (in minutes) present in the archive that cannot be represented using postgres timestamptz.
May 19 2020, 3:11 PM
zack edited P677 timezone offsets (in minutes) present in the archive that cannot be represented using postgres timestamptz.
May 19 2020, 3:07 PM
zack renamed T682: Ingest Google Code Mercurial repositories from Inject Google Code Mercurial repositories to Ingest Google Code Mercurial repositories.
May 19 2020, 9:56 AM · Archive coverage, Mercurial loader

May 18 2020

zack reopened T1832: create a mailing list swh-user-announce as "Open".

New use case for this mailing list:
announcing changes like in T2398

May 18 2020, 6:14 PM · SWORD deposit
zack renamed T1832: create a mailing list swh-user-announce from Create mailing list swh-maintenance for SWH clients to create a mailing list swh-user-announce.
May 18 2020, 6:13 PM · SWORD deposit

May 16 2020

zack added a comment to D3154: Add artifact metadata to the extrinsic metadata storage specification..

for context, we need to use the SWHIDs themselves, not the sha1_git that is bound to version 1 of SWHIDs

What's the difference? When we use something other than sha1_git we'll just migrate this data with the rest.

May 16 2020, 4:23 PM

May 14 2020

zack committed rDMODcce303663432: SWHID spec: fix typos ";;" which made some examples fail (authored by zack).
May 14 2020, 3:47 PM
zack added a comment to D3154: Add artifact metadata to the extrinsic metadata storage specification..

Question: can we support both context-full and context-less metadata for arbitrary artifacts?

May 14 2020, 3:15 PM

May 13 2020

zack triaged T2404: somerset: please mount /srv/softwareheritage/scratch as Wishlist priority.
May 13 2020, 10:27 AM · System administration

May 12 2020

zack committed rDGRPH17bfcd22007e: git2graph: use SWHID instead of PID in doc and coding conventions (authored by zack).
May 12 2020, 11:32 AM
zack committed rDGRPHaedf59343514: git2graph: avoid clash with recent libgit2 GIT_OBJ_ANY define (authored by zack).
May 12 2020, 11:32 AM

May 11 2020

zack triaged T2402: make status.s.o status discoverable from archive.s.o as Normal priority.
May 11 2020, 9:32 PM · System administration, Web app

May 10 2020

zack triaged T2400: ingest Ubuntu as Normal priority.
May 10 2020, 8:43 AM · Archive coverage

May 7 2020

zack added a comment to T2396: Create links from Wikidata to SWH origins.

Given that Wikidata software properties already contain "source code repository" URLs, shouldn't we work on our side to make sure those URLs resolve to corresponding origin URLs, rather than adding yet another property (very similar to "source code repository") on their side?

May 7 2020, 11:31 PM · Metadata workflow, Wikidata
zack committed rDSTO8412d4907f49: swh-schema.sql: improve comments on revision columns (authored by zack).
May 7 2020, 4:54 PM
zack committed rDSTO74cffb7c4c69: swh-schema.sql: improve comments on revision columns (authored by zack).
May 7 2020, 4:54 PM
zack closed D2825: swh-schema.sql: improve comments on revision columns.
May 7 2020, 4:54 PM

Apr 30 2020

zack created D3108: SWHID spec: full reread.
Apr 30 2020, 4:32 PM
zack accepted D3097: Partially revert "Refactor the sphinx-dev build environment to get rid of the link-stamp step".
Apr 30 2020, 10:04 AM
zack triaged T2386: docs.s.o build broken, lots of 404, no TOC as Unbreak Now! priority.
Apr 30 2020, 8:22 AM · Development documentation

Apr 29 2020

zack committed rDLDHG6f7bf7ab9edd: setup.py: make black and flake8 get along (authored by zack).
Apr 29 2020, 6:47 PM
zack committed rDOBJSRPL9b84ee332e32: setup.py: fix template in Source URL (authored by zack).
Apr 29 2020, 6:38 PM
zack committed rDLDHG834ae5dc4bf4: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:37 PM
zack committed rDJNL01fd3f90b2c4: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:34 PM
zack committed rDWAPPSc8bca2af3584: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:34 PM
zack committed rDVAU88f861c4e63c: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:34 PM
zack committed rDSTO2b95dd331bd9: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:33 PM
zack committed rDSEA1bcecfa47359: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:33 PM
zack committed rDSCH2cc8aa04b40a: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:33 PM
zack committed rDTSCN49e320b800d9: setup.py: add documentation link (authored by zack).
Apr 29 2020, 6:33 PM