
Jun 5 2019

zack claimed T1742: graph dataset: uniform file names.
Jun 5 2019, 10:07 AM · Datasets
zack closed T1742: graph dataset: uniform file names as Resolved.
Jun 5 2019, 10:07 AM · Datasets

Jun 4 2019

zack closed T1783: edge dataset: re-export rev→rev edges in the right order as Resolved.
Jun 4 2019, 10:33 PM · Datasets
zack added a comment to D1534: Add blake2s256 hash to the output of directory_ls.

Do we really want to do this?

Jun 4 2019, 7:57 PM
zack committed rMSLDeeebf8f49bc5: check-in draft slides for sharing & reuse conf (authored by zack).
check-in draft slides for sharing & reuse conf
Jun 4 2019, 5:47 PM
zack committed rMSLDd73e67353b47: swh-as-infra module: fix vertical spacing to avoid overflow (authored by zack).
swh-as-infra module: fix vertical spacing to avoid overflow
Jun 4 2019, 5:47 PM
zack added a comment to T1731: Intrinsic identifiers for origins.

Just a couple of comments:

  • the current proposal is ori instead of org as 3-letter stem
  • your use cases are all valid, but would equally work with a full URL and with a hashed URL
Jun 4 2019, 2:40 PM · Storage manager, Data Model
zack triaged T1783: edge dataset: re-export rev→rev edges in the right order as High priority.
Jun 4 2019, 2:33 PM · Datasets
zack accepted D1460: Graph compression REST API proposal.
Jun 4 2019, 2:07 PM

May 30 2019

zack requested changes to D1460: Graph compression REST API proposal.
May 30 2019, 10:33 AM

May 29 2019

zack added a comment to T1731: Intrinsic identifiers for origins.

Okay then. I'll work on updating the identifier specification.

So, again, what are the remaining issues that keep you from just going ahead and using URI hashes as Cassandra origin IDs?

Those I listed above, which were more "philosophical" than technical. I started implementing it last Monday anyway, and it looks good.

May 29 2019, 6:04 PM · Storage manager, Data Model
zack added a comment to T1774: Create a lister for x.org.

I don't like the idea of this lister.

May 29 2019, 11:47 AM · Archive coverage
zack added a comment to T1731: Intrinsic identifiers for origins.

In this case, we'll also need to have an identifier for URL + type, if they want to cite/link to the non-default one.
We could use the "contextual information" mechanism, eg. swh:1:ori:SHA1;type=git
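As a rough illustration of the "contextual information" idea above, here is a minimal sketch. It assumes the identifier is the SHA1 of the UTF-8 encoded origin URL, which is only one possible reading of the proposal; the function name `origin_identifier` and the example URL are hypothetical, not part of any Software Heritage API.

```python
import hashlib
from typing import Optional


def origin_identifier(url: str, origin_type: Optional[str] = None) -> str:
    """Build an 'ori' identifier from a URL, optionally qualified by origin type.

    Assumption: the hash is SHA1 over the UTF-8 bytes of the URL. The actual
    identifier specification may define the hashed payload differently.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    identifier = f"swh:1:ori:{digest}"
    if origin_type is not None:
        # Contextual information is appended after a semicolon.
        identifier += f";type={origin_type}"
    return identifier


if __name__ == "__main__":
    # Same URL, default vs. type-qualified identifier.
    print(origin_identifier("https://example.org/repo"))
    print(origin_identifier("https://example.org/repo", "git"))
```

With this shape, the unqualified identifier keeps pointing at the default origin for a URL, while the `;type=` suffix selects a specific one when two VCSs share the same URL.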

May 29 2019, 11:44 AM · Storage manager, Data Model

May 28 2019

zack added a comment to T1731: Intrinsic identifiers for origins.

This sounds like a good idea.

But it has some weird implications on components that use the concept of "origin head" (web UI and metadata indexers); because they'll use radically different content depending on which loader visited last.
But having two VCSs at the same URL is weird in itself, so 🤷

May 28 2019, 5:09 PM · Storage manager, Data Model
zack added a comment to T1234: Allow simple read-only connections to db from swh nodes.

[ moving here my feedback from D1516 ]

May 28 2019, 4:59 PM · System administration
zack added a comment to D1516: Add dbreplica CNAME.
In D1516#34010, @olasd wrote:

I'm not convinced this is such a good idea; this machine is way more than a "db replica" server (it only has one replica, most of its databases are actually primary) and I don't think DNS provides the appropriate granularity level to record this information.

May 28 2019, 3:36 PM · Staff

May 26 2019

zack updated subscribers of T1759: add MSR 2019 dataset paper to the publications page.
May 26 2019, 5:36 PM · Website
zack triaged T1760: add BENEVOL 2018 paper to the publications page as Normal priority.
May 26 2019, 5:35 PM · Website
zack triaged T1759: add MSR 2019 dataset paper to the publications page as Normal priority.
May 26 2019, 5:33 PM · Website

May 25 2019

zack renamed T1411: reach a minimum of 80% SLOC coverage across all components from at least 80% SLOC coverage in all components to reach a minimum of 80% SLOC coverage across all components.
May 25 2019, 5:39 PM · Development environment, Sprint 2018 12
zack changed the status of T1704: increase object storage capacity from Open to Work in Progress.
May 25 2019, 5:39 PM · System administration
zack renamed T1704: increase object storage capacity from Increase storage capacity of the objstorage to increase object storage capacity.
May 25 2019, 5:38 PM · System administration
zack renamed T1686: visual corporate identity (charte graphique) from To write visual corporate identity (charte graphique) to visual corporate identity (charte graphique).
May 25 2019, 5:38 PM · Unknown Object (Project)
zack added a comment to T1378: Ingest npm into the Software Heritage archive (meta task).

I think the only thing missing here is adding the NPM logo to the archive coverage page.

May 25 2019, 5:37 PM · Origin-npm, Archive coverage
zack added a comment to T1629: Deploy the npm ingestion in production.

I think this is now done, right @anlambert ?

May 25 2019, 5:37 PM · Origin-npm
zack added a comment to T1411: reach a minimum of 80% SLOC coverage across all components.

only 3% to go in -lister and -core \o/

May 25 2019, 5:36 PM · Development environment, Sprint 2018 12
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
May 25 2019, 5:35 PM · Development environment, Sprint 2018 12
zack closed T1418: Loaders as Invalid.

these catch-all meta-tasks that will grow forever are not terribly useful, the individual tasks + their tasks should be enough

May 25 2019, 5:32 PM · Sprint 2018 12
zack renamed T1475: Test more edge cases of metadata indexer mappings from Add more tests for edge cases of indexer mappings. to Add more tests for edge cases of indexer mappings.
May 25 2019, 5:31 PM · Easy hack, Indexer
zack added a project to T1527: Have comments on all columns of all databases: Documentation.
May 25 2019, 5:30 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
zack renamed T1581: Deposit: improvements from Deposit: improvments to Deposit: improvements.
May 25 2019, 5:30 PM · SWORD deposit
zack closed T1581: Deposit: improvements as Resolved.
May 25 2019, 5:29 PM · SWORD deposit
zack added a comment to T1581: Deposit: improvements.

closing this catch-all meta task, the individual sub tasks are clear enough

May 25 2019, 5:29 PM · SWORD deposit
zack closed T1658: Gather CLI tools spread in SWH packages, a subtask of T1671: Implement a uniform and consistent cli to be used by all SWH packages, as Resolved.
May 25 2019, 5:28 PM · SWH command line interface
zack closed T1658: Gather CLI tools spread in SWH packages as Resolved.

looks like this is done now, as you're deep in the implementation already! closing

May 25 2019, 5:28 PM · SWH command line interface
zack closed T1329: Move rate limit processing to the Varnish layer ? as Wontfix.

We've discussed this back then, and decided in the end to leave it at the Django layer. Closing.

May 25 2019, 5:27 PM · Web app
zack added a comment to T1338: Change BBUs on orsay.

has this been completed since?

May 25 2019, 5:26 PM · System administration
zack added a comment to T1234: Allow simple read-only connections to db from swh nodes.

can we haz this, please? :)

May 25 2019, 5:25 PM · System administration
zack closed T1180: add munin monitoring of snapshot objects count as Resolved.

snapshot count is now there, closing

May 25 2019, 5:25 PM · Metrics/monitoring, System administration
zack added a comment to T1090: SSH timeouts on new Azure VMs.

is this still the case?

May 25 2019, 5:23 PM · System administration
zack renamed T1002: ingest Hackage, the Haskell package repository (meta task) from ingest Hackage (Haskell package repository) into the Software Heritage archive (meta task) to ingest Hackage, the Haskell package repository (meta task).
May 25 2019, 5:22 PM · Hackage loader, Hackage lister, Archive coverage
zack added a parent task for T881: PostgreSQL backups based on pg_dump: T880: Add softwareheritage-indexer backup.
May 25 2019, 5:21 PM · System administration
zack added a subtask for T880: Add softwareheritage-indexer backup: T881: PostgreSQL backups based on pg_dump.
May 25 2019, 5:21 PM · System administration
zack removed a subtask for T881: PostgreSQL backups based on pg_dump: T880: Add softwareheritage-indexer backup.
May 25 2019, 5:21 PM · System administration
zack removed a parent task for T880: Add softwareheritage-indexer backup: T881: PostgreSQL backups based on pg_dump.
May 25 2019, 5:21 PM · System administration
zack closed T752: logrotate log spam: "error: error creating output file: FILE File exists" as Resolved.

this hasn't happened for a long while

May 25 2019, 5:18 PM · System administration
zack added a comment to T880: Add softwareheritage-indexer backup.

Just checking in on this, as we are discussing moving DBs around. Do we still not have a backup for the indexer DB?
If so, priority of this one should probably be raised.

May 25 2019, 5:17 PM · System administration
zack closed T917: Git loader: update README for YAML-based syntax, a subtask of T1388: Document the configuration system of each component, as Resolved.
May 25 2019, 5:15 PM · Documentation
zack closed T917: Git loader: update README for YAML-based syntax as Resolved.

This is done, I've forked off the part about consistently documenting configuration options to T1758.

May 25 2019, 5:15 PM · Git loader, Documentation
zack triaged T1758: consistently document the configuration option of each module as Normal priority.
May 25 2019, 5:14 PM · Easy hack, Documentation
zack updated the task description for T917: Git loader: update README for YAML-based syntax.
May 25 2019, 5:13 PM · Git loader, Documentation
zack added a project to T1362: Upgrade the Prometheus setup to Thanos : System administration.
May 25 2019, 5:10 PM · System administration, Metrics/monitoring
zack renamed T1377: in-memory storage: compute all counters from in-memory storage: compute all counters. to in-memory storage: compute all counters.
May 25 2019, 5:10 PM · Easy hack, Storage manager
zack closed T41: add actual content to swh-storage-testdata as Invalid.

swh-storage-testdata is gone, closing

May 25 2019, 5:08 PM · Storage manager
zack closed Unknown Object (Maniphest Task), a subtask of T691: complete object storage mirror on Azure (meta task), as Resolved.
May 25 2019, 5:06 PM · General
zack added a parent task for T691: complete object storage mirror on Azure (meta task): T239: preserve at least 2 copies of each content object.
May 25 2019, 5:05 PM · General
zack added a subtask for T239: preserve at least 2 copies of each content object: T691: complete object storage mirror on Azure (meta task).
May 25 2019, 5:05 PM · General
zack closed T315: swh-scheduler: add command-line interface as Resolved.

we have had this for a while now

May 25 2019, 5:04 PM · Scheduling utilities
zack updated subscribers of T283: investigate libvirt I/O slowdown.

@douardda: can I punt this to you to either further investigate or just close as Invalid? 3 years later it might no longer be relevant…

May 25 2019, 5:03 PM · System administration
zack closed T328: svn / subversion loader as Resolved.

closing, we do have an SVN loader now: it still has some issues, but the bulk of the job is done

May 25 2019, 5:01 PM · SVN Loader
zack closed T328: svn / subversion loader, a subtask of T617: ingest Google Code Subversion repositories, as Resolved.
May 25 2019, 5:01 PM · Archive coverage, Origin-GoogleCode, SVN Loader
zack added a comment to T1246: pypi loader: Analyze existing errors.

how many are left? can we close this as well as T419 now that the PyPI listers/loaders have been in production for a while?

May 25 2019, 5:00 PM · Archive coverage, Origin-Pypi
zack updated subscribers of T466: Test - Ingest XXL svn repository.

@anlambert what's the status of ingesting very large SVN repos, now that we have put the loader in production?

May 25 2019, 4:59 PM · SVN Loader
zack closed T523: Figure out what to do with corrupted copies detected by the archiver, a subtask of T240: content archiver, as Invalid.
May 25 2019, 4:58 PM · Storage manager
zack closed T523: Figure out what to do with corrupted copies detected by the archiver as Invalid.

the archiver is gone, closing

May 25 2019, 4:58 PM · Storage manager
zack renamed T561: ingest bitbucket (meta task) from ingest bitbucket repositories (meta task) to ingest bitbucket (meta task).
May 25 2019, 4:58 PM · Archive coverage, Origin-Bitbucket
zack closed T678: live migration health check as Resolved.

fixed long ago, AFAIK

May 25 2019, 4:56 PM · System administration, Restricted Project
zack added a comment to T691: complete object storage mirror on Azure (meta task).

@olasd recently made a lot of progress on this one.

May 25 2019, 4:56 PM · General
zack closed T780: Improve archiver behavior on big objects as Invalid.

the archiver is gone, closing

May 25 2019, 4:55 PM · Software Heritage Archiver
zack renamed T192: analyze 4 loading failures for GNU tarballs and reimport them from Analyze 4 unexpected gnu tarball failures (+ reimport the archives) to analyze 4 loading failures for GNU tarballs and reimport them.
May 25 2019, 4:52 PM · Tarball loader
zack changed the visibility for T192: analyze 4 loading failures for GNU tarballs and reimport them.
May 25 2019, 4:52 PM · Tarball loader
zack closed T182: investigate gnu.org tarball ingestion failures as Resolved.

oh, the info is in T192 already
closing

May 25 2019, 4:52 PM · Origin-GNU, Directory loader, Tarball loader
zack added a comment to T182: investigate gnu.org tarball ingestion failures.

is this still going? can it be closed as obsolete, maybe just noting down here the 4 failed tarballs? (we're going to do a full sweep soon anyway with the new listers/loaders)

May 25 2019, 4:51 PM · Origin-GNU, Directory loader, Tarball loader
zack added a project to T1622: Merge (or so) generic phabricator related stuff in a dedicated package between snippets/swh-team and pyarcanist: Development environment.
May 25 2019, 4:50 PM · Development environment
zack added a comment to T1510: Have a look at openAPI and decide whether we want to follow these specs.

what's the scope of this? the Web API? all our APIs?
either way, please tag it appropriately

May 25 2019, 4:49 PM · Web app
zack closed T1353: Allow adjusting API rate limits for some users as Resolved.

this is now done, at least based on IP addresses; we'll need (if it doesn't exist yet) a dedicated task for how to do it differently, e.g., using API keys

May 25 2019, 4:48 PM · Web app
zack added a comment to T1757: Vault: Migrate data from old vault to new one.

my take: don't bother (see: T1716#32312)

May 25 2019, 11:17 AM · Vault
zack added a comment to T1716: Vault: Migrate vault infrastructure to azure.

Webapp/cookers migrated to use the azure vault instance.

May 25 2019, 11:12 AM · Vault
zack added a comment to T1667: Fix confusing duplication of "Archive" links on the main website.

the more I look into this, the more I get convinced that what we should actually remove is the https://www.softwareheritage.org/archive/ page; its content should just be integrated/moved into the homepage of archive.s.o.

May 25 2019, 11:10 AM · Website, Unknown Object (Project)

May 24 2019

zack requested changes to D1512: Exempt DINSIC from swh-web rate limiting.
May 24 2019, 12:26 PM
zack triaged T1746: Web API: whitelist DINSIC IP address as Normal priority.
May 24 2019, 9:07 AM · Web app

May 23 2019

zack renamed T1745: DB inventory and documentation from Easier db inventory and documentation to DB inventory and documentation.
May 23 2019, 5:55 PM · Documentation, System administration
zack added a project to T1741: graph dataset: update to use persistent identifiers everywhere: Datasets.
May 23 2019, 2:37 PM · Datasets
zack added a project to T1742: graph dataset: uniform file names: Datasets.
May 23 2019, 2:37 PM · Datasets
zack added a comment to T1743: create a nice landing web page for exported dataset.

A nice piece of related work here is the LAW datasets.

May 23 2019, 2:37 PM · Datasets
zack triaged T1743: create a nice landing web page for exported dataset as Low priority.
May 23 2019, 2:36 PM · Datasets
zack triaged T1742: graph dataset: uniform file names as Low priority.
May 23 2019, 2:35 PM · Datasets
zack triaged T1741: graph dataset: update to use persistent identifiers everywhere as Normal priority.
May 23 2019, 2:32 PM · Datasets
zack created Datasets.
May 23 2019, 2:29 PM

May 22 2019

zack added a comment to T1731: Intrinsic identifiers for origins.

Tangential, but impactful on this discussion, we have had in the past a discussion about removing origin types from our notion of origin (there might be a task about it, but I couldn't find it right now).

May 22 2019, 12:01 PM · Storage manager, Data Model
zack added a comment to D1460: Graph compression REST API proposal.
  • if edges are not specified, we should follow *all* edges during the visit

Yes, this is what I meant here: "Where by default we can explore the graph following all types of edges"

May 22 2019, 9:50 AM
zack requested changes to D1460: Graph compression REST API proposal.
May 22 2019, 9:15 AM
zack added a comment to D1460: Graph compression REST API proposal.

I think that the src_type/dst_type in both the URL and extra_edges is a bit redundant. We could refactor the visit function into the following endpoint:

GET /graph/visit/swh_id/[?allowed_edges=["src_type/dst_type",...]][?direction={forward,backward}]

Where by default we can explore the graph following all types of edges, and restrict it if necessary.
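To make the proposed endpoint shape concrete, the sketch below builds the request path and query string for it. This is only an illustration of the proposal as written in this comment: the helper name `visit_url` is hypothetical, and encoding `allowed_edges` as a JSON list in the query string is an assumption based on the bracketed notation above.

```python
import json
from typing import List, Optional
from urllib.parse import quote, urlencode


def visit_url(
    swh_id: str,
    allowed_edges: Optional[List[str]] = None,
    direction: Optional[str] = None,
) -> str:
    """Build the path and query string for the proposed visit endpoint.

    When allowed_edges is omitted, no restriction is sent, matching the
    "follow all types of edges by default" behaviour described above.
    """
    # Keep ':' unescaped so the identifier stays readable in the path.
    path = f"/graph/visit/{quote(swh_id, safe=':')}/"
    params = {}
    if allowed_edges is not None:
        # Assumption: the list is serialized as JSON, e.g. ["rev/rev"].
        params["allowed_edges"] = json.dumps(allowed_edges)
    if direction is not None:
        if direction not in ("forward", "backward"):
            raise ValueError("direction must be 'forward' or 'backward'")
        params["direction"] = direction
    return path + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    node = "swh:1:rev:" + "0" * 40  # placeholder identifier
    print(visit_url(node))
    print(visit_url(node, allowed_edges=["rev/rev"], direction="backward"))
```

Leaving both parameters optional keeps the common case (visit everything, default direction) down to a bare path.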

May 22 2019, 9:14 AM

May 20 2019

zack added a comment to T833: When listing an origin, add origin level metadata to RMD storage.
  • sending a request for each repository would need ~2 to 3 years for a full pass over github. That's with our current infrastructure, so it's not a hard limit.
May 20 2019, 3:45 PM · Extrinsic metadata, Restricted Project, GitHub lister

May 19 2019

zack added a comment to D1460: Graph compression REST API proposal.

Here is a revised version; I also added a starting point parameter for the visit:

GET /graph/visit/src_type/dst_type/src_hash/[?direction={forward,backward}][?extra_edges=["src_type/dst_type",...]]
May 19 2019, 9:17 AM

May 16 2019

zack added a comment to T1719: webapp: deposit admin view: Expose the swh-anchor-id as well.

It'd be great to also expose the link with the revision.

May 16 2019, 1:31 PM · SWORD deposit, Web app
zack added a comment to D1460: Graph compression REST API proposal.
GET /graph/visit/src_type/dst_type/[?direction={forward,backward}][?extra_edge="src_type/dst_type"]*
May 16 2019, 8:51 AM

May 15 2019

zack added a comment to T1709: implement an R-cran lister.

@eddelbuettel yeah, if there isn't a standard way to go all the way back in time, it's OK for now to only ingest what's currently returned as available. In the medium/long term it will converge to having archived everything (w.r.t. the considered time frame) anyway. And we can always retrofit later on stuff that is archived elsewhere. But I wouldn't want to make this a blocker to start archiving what's (easily) listable now.

May 15 2019, 10:17 PM · GSoC 2019, Archive coverage
zack renamed T1713: Vault 'Download' suboptimal UI/UX: no feedback that the bundle is being transferred from 'Download' behavior in vault is kind of broken to Vault 'Download' suboptimal UI/UX: no feedback that the bundle is being transferred.
May 15 2019, 11:22 AM · Web app

May 14 2019

zack added a comment to T1689: enable landing patches via the web UI for all repos.
In T1689#31540, @olasd wrote:

My suggestion was having *someone* trigger the merge with a comment on the diff, once the tests pass.

May 14 2019, 5:01 PM · Development environment, Phabricator