
zack (Stefano Zacchiroli)
User · Administrator

User Details

User Since
Sep 7 2015, 3:43 PM (189 w, 2 d)
Roles
Administrator

Recent Activity

Yesterday

zack renamed T1691: metadata indexer: investigate metadata entries with empty mappings from metadata indexer: investigate empty mappings to metadata indexer: investigate metadata entries with empty mappings.
Wed, Apr 24, 5:21 PM · Archive content, Indexer
zack triaged T1691: metadata indexer: investigate metadata entries with empty mappings as Normal priority.
Wed, Apr 24, 5:20 PM · Archive content, Indexer
zack closed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Resolved.

This is now done, aside from a minor issue noted below:

softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count 
-------
     0
(1 row)
Wed, Apr 24, 5:18 PM · Archive content, Indexer
zack accepted D1432: Drop constraint on references of revision_metadata from origin_metadata..
Wed, Apr 24, 5:04 PM
zack added a comment to T1683: Decide where should be gathered specs.

I don't think it's a good idea to move specs from the specific components they specify to a different repository (swh-docs), because that will just increase the chances (which are already quite high) that they will get out of sync.

Wed, Apr 24, 3:14 PM · Development documentation
zack added a comment to D1430: Correct typo "scrapping" -> "scraping".

I knew about the manual landing from us but I am not a big fan of all this noise added to the commit message by arc.
Based on my understanding, if the original commit author is able to push after we accept a diff, this noise will not
be present. Anyway, that's not a blocker to integrate changes so I will also proceed like this in the future.

Wed, Apr 24, 2:26 PM
zack added a project to T1683: Decide where should be gathered specs: Development documentation.
Wed, Apr 24, 2:24 PM · Development documentation
zack triaged T1689: enable landing patches via the web UI for all repos as High priority.
Wed, Apr 24, 2:24 PM · Development environment, Phabricator
zack added a comment to D1430: Correct typo "scrapping" -> "scraping".

Third: everyone else: once a change is accepted, you can land it yourself by doing first arc diff D1430 and then arc land

I think you mean arc patch D1430 then arc land

Wed, Apr 24, 2:14 PM
zack committed rDDOC550b9d6e598f: CONTRIBUTORS: create file adding first external contributor (authored by zack).
CONTRIBUTORS: create file adding first external contributor
Wed, Apr 24, 2:08 PM
zack added a comment to D1430: Correct typo "scrapping" -> "scraping".

@mcv21: first, apologies for the conflicting communication. We don't have a lot of external contributors (yet), so our processes are not very well oiled on that front (yet).

Wed, Apr 24, 2:05 PM
zack committed rDDOCd863304089b6: Correct typo "scrapping" -> "scraping" (authored by mcv21).
Correct typo "scrapping" -> "scraping"
Wed, Apr 24, 2:03 PM
zack committed rDDOCd146bfdc4ca2: Correct typo "scrapping" -> "scraping" (authored by mcv21).
Correct typo "scrapping" -> "scraping"
Wed, Apr 24, 2:03 PM
zack closed D1430: Correct typo "scrapping" -> "scraping".
Wed, Apr 24, 2:02 PM
zack added a comment to T1683: Decide where should be gathered specs.

The problem with having it in the docs is that higher-level specs (like the metadata workflow) must live in a specific repository. Or can I have a specs folder in swh-environment?

Wed, Apr 24, 1:55 PM · Development documentation
zack added a project to T1685: move main website from www.s.o to s.o: System administration.

@zack, I tried to handle that task but I do not have access to domains configuration from the Gandi interface.

Wed, Apr 24, 11:08 AM · System administration, Website
zack added a comment to D1430: Correct typo "scrapping" -> "scraping".
In D1430#31457, @mcv21 wrote:

[I think I can't merge this myself]

Wed, Apr 24, 10:42 AM
zack committed rMSLD68a1d19e7498: reprod-bad-sota: add back original Collberg picture (authored by zack).
reprod-bad-sota: add back original Collberg picture
Wed, Apr 24, 10:25 AM
zack committed rMSLD5496f371c595: merkle tree slide: add vspace to fit into 16:9 (authored by zack).
merkle tree slide: add vspace to fit into 16:9
Wed, Apr 24, 10:25 AM

Tue, Apr 23

zack accepted D1430: Correct typo "scrapping" -> "scraping".
Tue, Apr 23, 6:20 PM
zack added a comment to T1683: Decide where should be gathered specs.

I need this decision for different specs, for example:

  • Legacy software deposit
  • Sparse / Metadata deposit (now in docs)
  • Metadata workflow: how do we deal with software metadata (T1344)
Tue, Apr 23, 5:46 PM · Development documentation
zack triaged T1685: move main website from www.s.o to s.o as Low priority.
Tue, Apr 23, 9:58 AM · System administration, Website

Sun, Apr 21

zack triaged T1682: https://softwareheritage.org does not redirect to https://www.s.o as High priority.
Sun, Apr 21, 7:56 AM · System administration

Fri, Apr 19

zack added a comment to D1415: Add support for client side rendering of Jupyter notebooks.

I think we need to have a conversation about how we decide to add support for rendering specific file formats in the webapp.
There are a million different file formats out there, why are we rendering Markdown and Jupyter notebooks and not something else?
When we have support for a million (or even just a dozen, really), are we going to test for which-is-which in a long chained series of IFs in a template? That doesn't seem really wise…
How do we decide if something is a given file format or not, especially considering that getting it wrong might even entail security vulnerabilities, injections, etc.? (because we're rendering content we do not control directly)
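As a rough illustration of one alternative to a chained series of IFs (not something proposed in this thread), a renderer registry keyed by MIME type could look like the sketch below. The names `RENDERERS`, `register`, and `render`, the MIME type strings, and the placeholder output markup are all hypothetical. The sketch deliberately leaves open how the MIME type is detected, which is the harder and security-sensitive part of the question above.

```python
import html
from typing import Callable, Dict

# Hypothetical registry: MIME type -> function turning raw text into HTML.
RENDERERS: Dict[str, Callable[[str], str]] = {}


def register(mime_type: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that records a renderer for a single MIME type."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        RENDERERS[mime_type] = func
        return func
    return decorator


@register("text/markdown")
def render_markdown(text: str) -> str:
    # Placeholder: a real renderer would convert Markdown to sanitized HTML.
    return "<markdown>" + html.escape(text) + "</markdown>"


@register("application/x-ipynb+json")
def render_notebook(text: str) -> str:
    # Placeholder: a real renderer would parse the notebook JSON.
    return "<notebook>" + html.escape(text) + "</notebook>"


def render(mime_type: str, text: str) -> str:
    """Dispatch on MIME type; unknown formats fall back to escaped plain text."""
    renderer = RENDERERS.get(mime_type)
    if renderer is None:
        # Content is untrusted, so the fallback always escapes it.
        return "<pre>" + html.escape(text) + "</pre>"
    return renderer(text)
```

With this shape, supporting a new format means adding one registered function rather than another branch in a template, and every path escapes untrusted content by default.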

Fri, Apr 19, 7:49 PM
zack retitled D1427: setup: extract db and http related parts in dedicated optional 'extras' from logger: extract the PostresHandler in a dedicated module to logger: extract the PostgresHandler to a dedicated module.
Fri, Apr 19, 12:36 PM
zack added a comment to T1667: Fix confusing duplication of "Archive" links on the main website.

@anlambert: are you proposing to remove it from everywhere or only from www.s.o ?

Fri, Apr 19, 12:01 PM · Website, Unknown Object (Project)

Thu, Apr 18

zack added a project to T1667: Fix confusing duplication of "Archive" links on the main website: Website.
Thu, Apr 18, 8:40 PM · Website, Unknown Object (Project)
zack added a subtask for T1451: ingest GNU Savannah Git repositories: T1659: rewrite the CGit lister as a proper lister.
Thu, Apr 18, 10:38 AM · Archive coverage
zack added a parent task for T1659: rewrite the CGit lister as a proper lister: T1451: ingest GNU Savannah Git repositories.
Thu, Apr 18, 10:38 AM · CGit lister
zack triaged T1659: rewrite the CGit lister as a proper lister as Low priority.
Thu, Apr 18, 10:38 AM · CGit lister
zack renamed T1451: ingest GNU Savannah Git repositories from ingest savannah git repositories to ingest GNU Savannah Git repositories.
Thu, Apr 18, 10:03 AM · Archive coverage
zack updated the task description for T1651: Create a separate project for deposit-client.
Thu, Apr 18, 9:40 AM · SWH command line interface, SWORD deposit

Wed, Apr 17

zack accepted D1421: Change `url` and `external-id` from mandatory to optional metadata.
Wed, Apr 17, 11:50 AM
zack requested changes to D1421: Change `url` and `external-id` from mandatory to optional metadata.
Wed, Apr 17, 11:32 AM
zack added inline comments to D1421: Change `url` and `external-id` from mandatory to optional metadata.
Wed, Apr 17, 10:14 AM

Tue, Apr 16

zack triaged T1655: Web API: more user friendly answer when checking state of non existent save code now requests as Low priority.
Tue, Apr 16, 7:16 PM · Web app
zack triaged T1654: expired SSL certificate for archive.internal.softwareheritage.org as Normal priority.
Tue, Apr 16, 7:05 PM · System administration

Mon, Apr 15

zack added a comment to T1650: Generate xml for SWORD protocol with deposit client.

As an addendum to this: it should still be possible to provide an XML file with metadata for all but the simplest cases. What we want to optimize for is the case in which only the mandatory metadata are available and, in that case, offer a CLI alternative instead.

Mon, Apr 15, 3:00 PM · SWORD deposit

Sat, Apr 13

zack added a comment to T1241: Persistent identifiers (PIDs): add a way to describe Merkle DAG paths.

For file paths it would be nice to also support steps that use usual file/dir names foo/bar/baz, as a more readable alternative to number-based steps.

Sat, Apr 13, 4:52 PM · Web app, General
zack renamed T1241: Persistent identifiers (PIDs): add a way to describe Merkle DAG paths from Describing paths in the Merkle DAG to Persistent identifiers (PIDs): add a way to describe Merkle DAG paths.
Sat, Apr 13, 4:47 PM · Web app, General

Wed, Apr 10

zack updated the task description for T1638: Deposit: error when submitting through cli.
Wed, Apr 10, 2:41 PM · SWORD deposit

Tue, Apr 9

zack triaged T1628: replayer: non reliability when fetching items from kafka as Normal priority.
Tue, Apr 9, 9:59 AM · Journal

Sat, Apr 6

zack added a comment to T808: phabricator lister.

Now the problem is that the base URL and the API token are different for each Phabricator instance, and I don't understand how to deal with this.
Can anyone please help me?

Sat, Apr 6, 6:03 PM · Easy hack, Phabricator forge

Thu, Apr 4

zack closed T1625: Repository with trailing slash are archived separately as Wontfix.

This is intended, because there is no guarantee that the Git repository accessible via a URL with a trailing slash will be the same as the one accessible at the same URL without the trailing slash. The same argument goes for all the other examples you mention.

Thu, Apr 4, 7:58 PM · GitHub lister

Wed, Apr 3

zack updated the name of F3490352: codeplex-archive-sitemap.xml.xz from "sitemap.xml.xz" to "codeplex-archive-sitemap.xml.xz".
Wed, Apr 3, 10:05 AM
zack changed the visibility for F3490352: codeplex-archive-sitemap.xml.xz.
Wed, Apr 3, 10:05 AM
zack triaged T1623: ingest the Codeplex archive as Normal priority.
Wed, Apr 3, 10:04 AM · Archive coverage

Tue, Apr 2

zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Tue, Apr 2, 4:41 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Tue, Apr 2, 4:40 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Tue, Apr 2, 4:37 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Tue, Apr 2, 4:37 PM · Archive content, Indexer
zack added a project to T1603: kafka storage backfiller: Journal.
Tue, Apr 2, 2:44 PM · Journal, Sprint 2019 03
zack added a project to T1620: Sprint deliverable: deploy a kafka-in-the-loop mirror instance: System administration.
Tue, Apr 2, 2:43 PM · System administration, Sprint 2019 03
zack added a project to T1604: Improve kafka deployment: System administration.
Tue, Apr 2, 2:43 PM · System administration, Sprint 2019 03
zack added projects to T1621: Add metrics to the currently deployed kafka cluster: Metrics/monitoring, System administration.
Tue, Apr 2, 2:43 PM · System administration, Metrics/monitoring, Sprint 2019 03
zack retitled D1279: Move all backends to a dedicated swh.objstorage.backends sub-package from Move all backends in a dedicated swh.objstorage.backends sub-package to Move all backends to a dedicated swh.objstorage.backends sub-package.
Tue, Apr 2, 2:06 PM
zack closed T1619: Text overlapping logo. as Wontfix.

That is by design (although, admittedly, not really nice). More than "fixing" this specific issue, we might consider revamping/unifying the general way of presenting sub-sites. But that is not going to be just a fire-and-forget logo fix.

Tue, Apr 2, 12:55 PM · Web app
zack renamed T1603: kafka storage backfiller from kafka storage backfill to kafka storage backfiller.
Tue, Apr 2, 12:37 PM · Journal, Sprint 2019 03

Mon, Apr 1

zack committed rDSNIP789a301fdd32: swh-monthly-report: filter on committer date (authored by zack).
swh-monthly-report: filter on committer date
Mon, Apr 1, 3:22 PM
zack committed rDSNIP22f4ce181919: swh-monthly-report: helper script to draft monthly activity team reports (authored by zack).
swh-monthly-report: helper script to draft monthly activity team reports
Mon, Apr 1, 3:22 PM
zack committed rDSNIPbd77aa262c73: swhphab.py: do not crash when printing summary of repo-less diffs (authored by zack).
swhphab.py: do not crash when printing summary of repo-less diffs
Mon, Apr 1, 3:21 PM
zack committed rDSNIPb5db8812c05a: swhphab.py: include status when printing task summaries (authored by zack).
swhphab.py: include status when printing task summaries
Mon, Apr 1, 3:21 PM
zack committed rDSNIPb1f0b7ba0c6f: swh-weekly-report: preserve iterators and port to current Phabricator (authored by zack).
swh-weekly-report: preserve iterators and port to current Phabricator
Mon, Apr 1, 3:21 PM
zack committed rDSNIP9e6ad5fb0e51: swh-weekly-report: filter on committer date (authored by zack).
swh-weekly-report: filter on committer date
Mon, Apr 1, 3:21 PM
zack closed D1286: swh-monthly-report: helper script to draft monthly activity team reports.
Mon, Apr 1, 3:21 PM
zack committed rDSNIPf37fb0a243ed: swh-weekly-report: further refactoring/clean-up against swhphab.py (authored by zack).
swh-weekly-report: further refactoring/clean-up against swhphab.py
Mon, Apr 1, 3:21 PM
zack committed rDSNIP98373da2cd1f: swh-weekly-report: split generic code to swhphab.py (authored by zack).
swh-weekly-report: split generic code to swhphab.py
Mon, Apr 1, 3:21 PM
zack committed rDSNIP3e04e3eb2e46: swh-weekly-report: new helper to write weekly reports (authored by zack).
swh-weekly-report: new helper to write weekly reports
Mon, Apr 1, 3:21 PM
zack closed D1283: swh-weekly-report: new helper to write weekly reports.
Mon, Apr 1, 3:21 PM

Sun, Mar 31

zack edited projects for T1615: In the feature 'search' the check-boxes and text are not aligned., added: Web app; removed Website.
Sun, Mar 31, 2:22 PM · Web app
zack placed T1615: In the feature 'search' the check-boxes and text are not aligned. up for grabs.

To clarify: "assignment" of tasks to specific people is something only we, the Software Heritage maintainers, do. Hence I'm de-assigning the issue from @01shobitha.

Sun, Mar 31, 2:20 PM · Web app

Fri, Mar 29

zack accepted D1316: Document how to write a metadata mapping..
Fri, Mar 29, 11:18 AM
zack requested changes to D1316: Document how to write a metadata mapping..

I've nitpicked only about section naming/intro and some style, the rest LGTM.

Fri, Mar 29, 10:36 AM

Thu, Mar 28

zack updated the diff for D1286: swh-monthly-report: helper script to draft monthly activity team reports.
  • swh-weekly-report: new helper to write weekly reports
  • swh-weekly-report: split generic code to swhphab.py
  • swh-weekly-report: further refactoring/clean-up against swhphab.py
  • swhphab.py: do not crash when printing summary of repo-less diffs
  • swhphab.py: include status when printing task summaries
  • swh-weekly-report: filter on committer date
  • swh-weekly-report: preserve iterators and port to current Phabricator
  • swh-monthly-report: helper script to draft monthly activity team reports
  • swh-monthly-report: filter on committer date
Thu, Mar 28, 10:04 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
Thu, Mar 28, 10:02 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
Thu, Mar 28, 10:02 PM
zack added inline comments to D1283: swh-weekly-report: new helper to write weekly reports.
Thu, Mar 28, 10:01 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: preserve iterators and port to current Phabricator
Thu, Mar 28, 9:58 PM
zack removed a reviewer for D1295: prevent high memory usage: zack.
Thu, Mar 28, 9:53 PM
zack requested changes to D1295: prevent high memory usage.

Oops, sorry, didn't mean to accept this, only to remove myself from reviewers.
I'll let @anlambert finish the actual review.

Thu, Mar 28, 9:53 PM
D1295: prevent high memory usage is now accepted and ready to land.
Thu, Mar 28, 9:52 PM

Tue, Mar 26

zack updated subscribers of D1295: prevent high memory usage.
In D1295#27649, @zack wrote:

or, actually, we can just also add a fulltext index to URLs and be done with it https://www.postgresql.org/docs/11/textsearch-intro.html#TEXTSEARCH-MATCHING

Tue, Mar 26, 9:03 PM
zack added a comment to D1295: prevent high memory usage.
In D1295#27648, @zack wrote:

@anlambert given we have a trigram index on origin URLs, have you ever tried to use the various similarity operators documented at https://www.postgresql.org/docs/11/pgtrgm.html instead of generating all possible permutations for regexes?
I'm assuming (probably too naively) that you can just do a big select on the URLs, sorting by similarity and possibly filtering on a threshold to return meaningful results. But it's not like I've actually tested it…

Tue, Mar 26, 8:44 PM
zack added a comment to D1295: prevent high memory usage.

@anlambert given we have a trigram index on origin URLs, have you ever tried to use the various similarity operators documented at https://www.postgresql.org/docs/11/pgtrgm.html instead of generating all possible permutations for regexes?
I'm assuming (probably too naively) that you can just do a big select on the URLs, sorting by similarity and possibly filtering on a threshold to return meaningful results. But it's not like I've actually tested it…
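To make the "sort by similarity, filter on a threshold" idea concrete, here is a small pure-Python sketch that approximates trigram similarity outside the database. It is an assumption-laden illustration, not the pg_trgm implementation: the helpers `trigrams`, `similarity`, and `search`, the sample URLs, and the threshold values are all hypothetical, and the word splitting only roughly follows what the pg_trgm documentation describes.

```python
import re
from typing import List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Approximate trigram extraction: lowercase, split on non-alphanumerics,
    pad each word with two leading spaces and one trailing space."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(query: str, urls: List[str], threshold: float = 0.3) -> List[Tuple[float, str]]:
    """Score every URL, keep those at or above the threshold, best match first."""
    scored = [(similarity(query, url), url) for url in urls]
    hits = [(score, url) for score, url in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

One thing the sketch makes visible: a short query scored against a whole URL yields a low similarity, because the URL contributes many trigrams the query lacks, so the threshold would need tuning for this use case.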

Tue, Mar 26, 8:38 PM
zack requested changes to D1295: prevent high memory usage.
Tue, Mar 26, 7:50 PM

Mar 25 2019

zack triaged T1602: Analyze kafka storage requirements as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1599: Analyze objstorage's Azure updateness as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1603: kafka storage backfiller as Normal priority.
Mar 25 2019, 3:07 PM · Journal, Sprint 2019 03
zack triaged T1601: Journal client of swh-storage mirrors as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1600: Write a storage backend that writes to kafka as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1604: Improve kafka deployment as Normal priority.
Mar 25 2019, 3:07 PM · System administration, Sprint 2019 03
zack added a reviewer for D1286: swh-monthly-report: helper script to draft monthly activity team reports: Reviewers.
Mar 25 2019, 10:29 AM
zack added a reviewer for D1283: swh-weekly-report: new helper to write weekly reports: douardda.
Mar 25 2019, 10:29 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-monthly-report: helper script to draft monthly activity team reports
  • swh-monthly-report: filter on committer date
Mar 25 2019, 10:28 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: filter on committer date
Mar 25 2019, 10:28 AM

Mar 24 2019

zack added a comment to T808: phabricator lister.

Sure, just go ahead: there is no need to "reserve" tasks as a prerequisite to work on them. Just submit a diff against the lister repo when you have something ready to review :-)

Mar 24 2019, 1:09 PM · Easy hack, Phabricator forge

Mar 22 2019

zack created D1286: swh-monthly-report: helper script to draft monthly activity team reports.
Mar 22 2019, 4:05 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swhphab.py: do not crash when printing summary of repo-less diffs
  • swhphab.py: include status when printing task summaries
Mar 22 2019, 3:57 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: further refactoring/clean-up against swhphab.py
Mar 22 2019, 10:33 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: split generic code to swhphab.py
Mar 22 2019, 10:23 AM