
zack (Stefano Zacchiroli)
User · Administrator

User Details

User Since
Sep 7 2015, 3:43 PM (197 w, 2 d)
Roles
Administrator

Recent Activity

Yesterday

zack resigned from D1610: swh.lister.cgit.
Wed, Jun 19, 11:44 PM
zack accepted D1590: Add comments to few columns in dbversion, task and task_run.
Wed, Jun 19, 5:56 PM
zack accepted D1582: Add comments to tables dbversion, content, skipped_content and fetch_history.
Wed, Jun 19, 5:55 PM
zack requested changes to D1590: Add comments to few columns in dbversion, task and task_run.

minor caseness issue

Wed, Jun 19, 3:53 PM
zack requested changes to D1582: Add comments to tables dbversion, content, skipped_content and fetch_history.

Almost there !
(and thanks a lot for your persistence on this one)

Wed, Jun 19, 3:52 PM
zack accepted D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 3:44 PM
zack added a comment to D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.

LGTM.

Wed, Jun 19, 3:44 PM
zack requested changes to D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 2:44 PM
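
For readers unfamiliar with the pattern named in the D1611 title, here is a minimal sketch of "flush to disk, then rename the tempfile". It is not the pathslicing code from the diff; the function name `atomic_write` and its parameters are hypothetical, and it assumes a POSIX-like filesystem where a rename within one directory is atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the tempfile in the target directory so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to push its cache to disk
        os.replace(tmp_path, path)   # rename only after the data is flushed
    except BaseException:
        # Do not leave a stray tempfile behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without the `fsync` step, a crash shortly after the rename could leave the final name pointing at a file whose contents never reached the disk.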
zack requested changes to D1610: swh.lister.cgit.
Wed, Jun 19, 2:40 PM
zack added a comment to T1738: Define and specify extrinsic origin metadata.

Thanks a lot for this recap Morane !

Wed, Jun 19, 2:33 PM · Metadata workflow
zack added a comment to T1832: Create mailing list swh-maintenance for SWH clients.

There are two separate use cases here, so I'll comment on them separately:

Wed, Jun 19, 12:23 PM · SWORD deposit

Tue, Jun 18

zack triaged T1823: make DB/FS transactions nest properly as High priority.
Tue, Jun 18, 12:38 PM · Object storage, Storage manager

Mon, Jun 17

zack added a comment to T1659: rewrite the CGit lister as a proper lister.

Thanks for your interest in working on this @nahimilega, it would be very useful to move forward on a bunch of pending ingestions, including Tor !

Mon, Jun 17, 10:01 PM · CGit lister
zack closed T239: preserve at least 2 copies of each content object as Resolved.

resolved (by T691)

Mon, Jun 17, 4:45 PM · General
zack added a comment to T691: complete object storage mirror on Azure (meta task).
In T691#33551, @olasd wrote:

After processing the logs of the backfilling process to make sure to redo all the ranges that were interrupted in various database migrations, I'm now confident that this task is complete: we have a full mirror of all contents on Azure, which is kept up to date by the main archive storage backend writing synchronously to it.

Mon, Jun 17, 4:45 PM · General
zack added a comment to T1815: Use a FOSS alternative or drop Google ReCAPTCHA use.

Getting rid of ReCaptcha for save code now LGTM too.
I just wasn't sure that rate limit applies to Web UI submissions (e.g., will API requests come from our own IP? and if so, is that whitelisted?); I'm assuming that is what @anlambert plans to check.

Mon, Jun 17, 4:34 PM · Web app

Fri, Jun 14

zack requested changes to D1582: Add comments to tables dbversion, content, skipped_content and fetch_history.
Fri, Jun 14, 3:28 PM
zack added a comment to T1789: batch API to check for the presence of content in the archive.

Can we have a feature that returns the content of File Type, Language Type, and License, rather than its URL?

Fri, Jun 14, 1:13 PM · Web app
zack closed T1804: Software Heritage api to accept batch request from FOSSology as Invalid.

Hi @sandipbhuyan, I had in fact already created a task for this: T1789

Fri, Jun 14, 12:06 PM

Thu, Jun 13

zack updated the task description for T1801: List all origins from major phabricator instances.
Thu, Jun 13, 10:09 AM · Lister
zack renamed T1801: List all origins from major phabricator instances from List major phabricator instances to list all origins from major phabricator instances.
Thu, Jun 13, 10:08 AM · Lister

Wed, Jun 12

zack added a comment to T1799: ingest Tor git repositories.

btw, the list is ~400 repos for now

Wed, Jun 12, 11:11 PM · Archive coverage
zack added a comment to T1799: ingest Tor git repositories.

@anarcat please hold off on using save code now for now. As we're planning to have a proper cgit lister, we can just add your instance to our rotation once that's done (unless this is super urgent, that is). That will have the additional advantage that we will automatically notice when new repos show up.

Wed, Jun 12, 11:10 PM · Archive coverage
zack added a comment to T1800: gitweb lister.

It's not really related, because gitweb and cgit are two different things.

Wed, Jun 12, 6:28 PM · Lister
zack triaged T1800: gitweb lister as Normal priority.
Wed, Jun 12, 5:21 PM · Lister
zack triaged T1799: ingest Tor git repositories as Normal priority.
Wed, Jun 12, 5:20 PM · Archive coverage
zack triaged T1798: ingest Tor project source code (meta task) as Normal priority.
Wed, Jun 12, 5:19 PM · Archive coverage
zack created T1798: ingest Tor project source code (meta task).
Wed, Jun 12, 5:19 PM · Archive coverage
zack added a comment to T1389: Implement a base loader for package managers.

Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot !

Wed, Jun 12, 1:58 PM · Origin-npm, Origin-Pypi, Archive coverage
zack requested changes to D1509: Write a specification of extrinsic origin metadata storage..

Thanks @vlorentz for this first draft. In spite of all the comments above, I think it's a very good start.

Wed, Jun 12, 1:31 PM
zack added a comment to T1411: reach a minimum of 80% SLOC coverage across all components.

The most recent update of the state of this task has shown a regression in the journal test coverage, which, per se, is not a big deal (just a few points). But it does raise the question of how, once we have attained whatever "minimum" coverage we are OK with, we monitor over time that there is no regression. For instance, I think that code reviews should show the reviewers how the submitted diff affects code coverage. Ideally, reviewers should be able to see if it has a net positive or negative effect on coverage, and take that into account in their review decisions. (Which is not to say we should never accept diffs that decrease code coverage; there might be reasons to do so. But it is a data point that would be useful for reviewers to see.)

Wed, Jun 12, 12:25 PM · Development environment, Sprint 2018 12
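
The regression check described in the comment above could be sketched roughly as follows. This is only an illustration: the function name, the `tolerance` parameter, and the idea of feeding it two per-component coverage reports (percentages before and after a diff) are all assumptions, not an existing Software Heritage tool.

```python
from typing import Dict, List


def coverage_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the components whose coverage dropped by more than `tolerance` points.

    `before` and `after` map a component name to its coverage percentage.
    Components missing from `after` are ignored rather than flagged.
    """
    return sorted(
        name
        for name, old in before.items()
        if name in after and old - after[name] > tolerance
    )


# Hypothetical numbers, for illustration only.
baseline = {"journal": 82.0, "core": 77.0}
with_diff = {"journal": 79.5, "core": 78.0}
flagged = coverage_regressions(baseline, with_diff)
```

A reviewer-facing tool would report the flagged components rather than reject the diff outright, in line with the point that a decrease can sometimes be acceptable.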
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Wed, Jun 12, 12:23 PM · Development environment, Sprint 2018 12

Fri, Jun 7

zack updated the task description for T735: SourceForge lister.
Fri, Jun 7, 9:16 PM · Origin-SourceForge
zack triaged T1791: Web API: do not leak internal, non-intrinsic origin identifiers as Low priority.
Fri, Jun 7, 3:38 PM · Web app
zack changed the visibility for F3533133: haltode.pub.
Fri, Jun 7, 10:50 AM
zack triaged T1789: batch API to check for the presence of content in the archive as Normal priority.
Fri, Jun 7, 10:44 AM · Web app

Wed, Jun 5

zack claimed T1742: graph dataset: uniform file names.
Wed, Jun 5, 10:07 AM · Datasets
zack closed T1742: graph dataset: uniform file names as Resolved.
Wed, Jun 5, 10:07 AM · Datasets

Tue, Jun 4

zack closed T1783: edge dataset: re-export rev→rev edges in the right order as Resolved.
Tue, Jun 4, 10:33 PM · Datasets
zack added a comment to D1534: Add blake2s256 hash in the output of directory_ls..

Do we really want to do this?

Tue, Jun 4, 7:57 PM
zack committed rMSLDeeebf8f49bc5: check-in draft slides for sharing & reuse conf (authored by zack).
Tue, Jun 4, 5:47 PM
zack committed rMSLDd73e67353b47: swh-as-infra module: fix vertical spacing to avoid overflow (authored by zack).
Tue, Jun 4, 5:47 PM
zack added a comment to T1731: Intrinsic identifiers for origins.

Just a couple of comments:

  • the current proposal is ori instead of org as 3-letter stem
  • your use cases are all valid, but would equally work with a full URL and with a hashed URL
Tue, Jun 4, 2:40 PM · Data Model
zack triaged T1783: edge dataset: re-export rev→rev edges in the right order as High priority.
Tue, Jun 4, 2:33 PM · Datasets
zack accepted D1460: Graph compression REST API proposal.
Tue, Jun 4, 2:07 PM

Thu, May 30

zack requested changes to D1460: Graph compression REST API proposal.
Thu, May 30, 10:33 AM

Wed, May 29

zack added a comment to T1731: Intrinsic identifiers for origins.

Okay then. I'll work on updating the identifier specification.

So, again, what are the remaining issues that inhibit you from just going ahead and using URI hashes as Cassandra origin IDs?

Those I listed above, which were more "philosophical" than technical. I started implementing it last Monday anyway, and it looks good.

Wed, May 29, 6:04 PM · Data Model
zack added a comment to T1774: Create a lister for x.org.

I don't like the idea of this lister.

Wed, May 29, 11:47 AM · Archive coverage
zack added a comment to T1731: Intrinsic identifiers for origins.

In this case, we'll also need to have an identifier for URL + type, if they want to cite/link to the non-default one.
We could use the "contextual information" mechanism, e.g. swh:1:ori:SHA1;type=git

Wed, May 29, 11:44 AM · Data Model
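
As a rough illustration of the identifier shape discussed in this thread, the sketch below hashes an origin URL with SHA1 and appends an optional `;type=` qualifier. The function name is hypothetical, and hashing the UTF-8 bytes of the raw URL is an assumption: the thread only proposes the `ori` stem and a hashed URL, and does not settle how the URL is normalized or encoded before hashing.

```python
import hashlib
from typing import Optional


def origin_identifier(url: str, origin_type: Optional[str] = None) -> str:
    """Hypothetical helper: build a swh:1:ori:<sha1> string for an origin URL."""
    # Assumption: the hash input is the URL's UTF-8 bytes, with no normalization.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    identifier = f"swh:1:ori:{digest}"
    if origin_type is not None:
        # Contextual information is appended after a semicolon.
        identifier += f";type={origin_type}"
    return identifier


# Example URL chosen for illustration only.
default_id = origin_identifier("https://example.org/repo.git")
git_id = origin_identifier("https://example.org/repo.git", "git")
```

Both identifiers share the same hash, so the qualified form can point at a specific visit type without changing the identity of the origin itself.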

Tue, May 28

zack added a comment to T1731: Intrinsic identifiers for origins.

This sounds like a good idea.
But it has some weird implications on components that use the concept of "origin head" (web UI and metadata indexers); because they'll use radically different content depending on which loader visited last.
But having two VCSs at the same URL is weird in itself, so 🤷

Tue, May 28, 5:09 PM · Data Model
zack added a comment to T1234: Allow simple read-only connections to db from swh nodes.

[ moving here my feedback from D1516 ]

Tue, May 28, 4:59 PM · System administration
zack added a comment to D1516: Add dbreplica CNAME.
In D1516#34010, @olasd wrote:

I'm not convinced this is such a good idea; this machine is way more than a "db replica" server (it only has one replica, most of its databases are actually primary) and I don't think DNS provides the appropriate granularity level to record this information.

Tue, May 28, 3:36 PM · Staff

Sun, May 26

zack updated subscribers of T1759: add MSR 2019 dataset paper to the publications page.
Sun, May 26, 5:36 PM · Website
zack triaged T1760: add BENEVOL 2018 paper to the publications page as Normal priority.
Sun, May 26, 5:35 PM · Website
zack triaged T1759: add MSR 2019 dataset paper to the publications page as Normal priority.
Sun, May 26, 5:33 PM · Website

Sat, May 25

zack renamed T1411: reach a minimum of 80% SLOC coverage across all components from at least 80% SLOC coverage in all components to reach a minimum of 80% SLOC coverage across all components.
Sat, May 25, 5:39 PM · Development environment, Sprint 2018 12
zack changed the status of T1704: increase object storage capacity from Open to Work in Progress.
Sat, May 25, 5:39 PM · System administration
zack renamed T1704: increase object storage capacity from Increase storage capacity of the objstorage to increase object storage capacity.
Sat, May 25, 5:38 PM · System administration
zack renamed T1686: visual corporate identity (charte graphique) from To write visual corporate identity (charte graphique) to visual corporate identity (charte graphique).
Sat, May 25, 5:38 PM · Unknown Object (Project)
zack added a comment to T1378: Ingest npm into the Software Heritage archive (meta task).

I think the only thing missing here is adding the NPM logo to the archive coverage page.

Sat, May 25, 5:37 PM · Origin-npm, Archive coverage
zack added a comment to T1629: Deploy the npm ingestion in production.

I think this is now done, right @anlambert ?

Sat, May 25, 5:37 PM · Origin-npm
zack added a comment to T1411: reach a minimum of 80% SLOC coverage across all components.

only 3% to go in -lister and -core \o/

Sat, May 25, 5:36 PM · Development environment, Sprint 2018 12
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Sat, May 25, 5:35 PM · Development environment, Sprint 2018 12
zack closed T1418: Loaders as Invalid.

these catch-all meta-tasks that will grow forever are not terribly useful, the individual tasks + their tasks should be enough

Sat, May 25, 5:32 PM · Sprint 2018 12
zack renamed T1475: Add more tests for edge cases of indexer mappings from Add more tests for edge cases of indexer mappings. to Add more tests for edge cases of indexer mappings.
Sat, May 25, 5:31 PM · Indexer
zack added a project to T1527: Have comments on all columns of all databases: Development documentation.
Sat, May 25, 5:30 PM · Easy hack, Development documentation, Storage manager, Scheduling utilities, Indexer
zack renamed T1581: Deposit: improvements from Deposit: improvments to Deposit: improvements.
Sat, May 25, 5:30 PM · SWORD deposit
zack closed T1581: Deposit: improvements as Resolved.
Sat, May 25, 5:29 PM · SWORD deposit
zack added a comment to T1581: Deposit: improvements.

closing this catch-all meta task, the individual sub tasks are clear enough

Sat, May 25, 5:29 PM · SWORD deposit
zack closed T1658: Gather CLI tools spread in SWH packages, a subtask of T1671: Implement a uniform and consistent cli to be used by all SWH packages, as Resolved.
Sat, May 25, 5:28 PM · SWH command line interface
zack closed T1658: Gather CLI tools spread in SWH packages as Resolved.

looks like this is done now, as you're deep in the implementation already! closing

Sat, May 25, 5:28 PM · SWH command line interface
zack closed T1329: Move rate limit processing to the Varnish layer ? as Wontfix.

We've discussed this back then, and decided in the end to leave it at the Django layer. Closing.

Sat, May 25, 5:27 PM · Web app
zack added a comment to T1338: Change BBUs on orsay.

has this been completed since?

Sat, May 25, 5:26 PM · System administration
zack added a comment to T1234: Allow simple read-only connections to db from swh nodes.

can we haz this, please? :)

Sat, May 25, 5:25 PM · System administration
zack closed T1180: add munin monitoring of snapshot objects count as Resolved.

snapshot count is now there, closing

Sat, May 25, 5:25 PM · Metrics/monitoring, System administration
zack added a comment to T1090: SSH timeouts on new Azure VMs.

is this still the case?

Sat, May 25, 5:23 PM · System administration
zack renamed T1002: ingest Hackage, the Haskell package repository (meta task) from ingest Hackage (Haskell package repository) into the Software Heritage archive (meta task) to ingest Hackage, the Haskell package repository (meta task).
Sat, May 25, 5:22 PM · Archive coverage
zack added a parent task for T881: PostgreSQL backups based on pg_dump: T880: Add softwareheritage-indexer backup.
Sat, May 25, 5:21 PM · System administration
zack added a subtask for T880: Add softwareheritage-indexer backup: T881: PostgreSQL backups based on pg_dump.
Sat, May 25, 5:21 PM · System administration
zack removed a subtask for T881: PostgreSQL backups based on pg_dump: T880: Add softwareheritage-indexer backup.
Sat, May 25, 5:21 PM · System administration
zack removed a parent task for T880: Add softwareheritage-indexer backup: T881: PostgreSQL backups based on pg_dump.
Sat, May 25, 5:21 PM · System administration
zack closed T752: logrotate log spam: "error: error creating output file: FILE File exists" as Resolved.

this hasn't happened for a long while

Sat, May 25, 5:18 PM · System administration
zack added a comment to T880: Add softwareheritage-indexer backup.

Just checking in on this, as we are discussing moving DBs around. Do we still not have a backup for the indexer DB?
If so, priority of this one should probably be raised.

Sat, May 25, 5:17 PM · System administration
zack closed T917: Git loader: update README for YAML-based syntax, a subtask of T1388: Document the configuration system of each component, as Resolved.
Sat, May 25, 5:15 PM · Development documentation
zack closed T917: Git loader: update README for YAML-based syntax as Resolved.

This is done, I've forked off the part about consistently documenting configuration options to T1758.

Sat, May 25, 5:15 PM · Git loader, Development documentation
zack triaged T1758: consistently document the configuration option of each module as Normal priority.
Sat, May 25, 5:14 PM · Development documentation
zack updated the task description for T917: Git loader: update README for YAML-based syntax.
Sat, May 25, 5:13 PM · Git loader, Development documentation
zack added a project to T1362: Upgrade the Prometheus setup to Thanos : System administration.
Sat, May 25, 5:10 PM · System administration, Metrics/monitoring
zack renamed T1377: in-memory storage: compute all counters from in-memory storage: compute all counters. to in-memory storage: compute all counters.
Sat, May 25, 5:10 PM · Easy hack, Storage manager
zack closed T41: add actual content to swh-storage-testdata as Invalid.

swh-storage-testdata is gone, closing

Sat, May 25, 5:08 PM · Storage manager
zack closed Unknown Object (Maniphest Task), a subtask of T691: complete object storage mirror on Azure (meta task), as Resolved.
Sat, May 25, 5:06 PM · General
zack added a parent task for T691: complete object storage mirror on Azure (meta task): T239: preserve at least 2 copies of each content object.
Sat, May 25, 5:05 PM · General
zack added a subtask for T239: preserve at least 2 copies of each content object: T691: complete object storage mirror on Azure (meta task).
Sat, May 25, 5:05 PM · General
zack closed T315: swh-scheduler: add command-line interface as Resolved.

we have had this for a while now

Sat, May 25, 5:04 PM · Scheduling utilities
zack updated subscribers of T283: investigate libvirt I/O slowdown.

@douardda: can I punt this to you to either further investigate or just close as Invalid? 3 years later it might no longer be relevant…

Sat, May 25, 5:03 PM · System administration
zack closed T328: svn / subversion loader as Resolved.

closing, we do have an SVN loader now: it still has some issues, but the bulk of the job is done

Sat, May 25, 5:01 PM · SVN Loader
zack closed T328: svn / subversion loader, a subtask of T617: ingest Google Code Subversion repositories, as Resolved.
Sat, May 25, 5:01 PM · Archive coverage, Origin-GoogleCode, SVN Loader
zack added a comment to T1246: pypi loader: Analyze existing errors.

how many are left? can we close this as well as T419 now that the PyPI listers/loaders have been in production for a while?

Sat, May 25, 5:00 PM · Archive coverage, Origin-Pypi
zack updated subscribers of T466: Test - Ingest XXL svn repository.

@anlambert what's the status of ingesting very large SVN repos, now that we have put the loader in production?

Sat, May 25, 4:59 PM · SVN Loader
zack closed T523: Figure out what to do with corrupted copies detected by the archiver, a subtask of T240: content archiver, as Invalid.
Sat, May 25, 4:58 PM · Storage manager