Sep 18 2020

zack retitled D3983: blackify: auto format python code with black from test_vault.py: make black pass again to blackify: auto format python code with black.
Sep 18 2020, 5:13 PM
zack updated the diff for D3983: blackify: auto format python code with black.

blackify the whole module, rather than just test_vault.py

Sep 18 2020, 5:12 PM
zack accepted D3994: Replace deprecated persistent_identifier method.
Sep 18 2020, 4:40 PM
zack requested changes to D3974: WIP: fuse design doc.
Sep 18 2020, 2:05 PM
zack added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

You are right, they are not stored in database but there is a storage.origin_get_by_sha1 method.

Sep 18 2020, 12:54 PM · System administration, Web app, Compressed graph service
zack edited P771 Masterwork From Distant Lands.
Sep 18 2020, 10:49 AM
zack created D3983: blackify: auto format python code with black.
Sep 18 2020, 10:44 AM
zack committed rDWCLI4749ea13cfea: CONTRIBUTORS: add haltode (authored by zack).
CONTRIBUTORS: add haltode
Sep 18 2020, 10:35 AM
zack added inline comments to D3982: Replace deprecated PersistentID class with SWHID.
Sep 18 2020, 9:51 AM
zack accepted D3982: Replace deprecated PersistentID class with SWHID.

LGTM

Sep 18 2020, 9:47 AM
zack edited reviewers for D3982: Replace deprecated PersistentID class with SWHID, added: Reviewers; removed: zack.
Sep 18 2020, 9:44 AM

Sep 17 2020

zack added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.
  • We can process swh-graph responses to enrich the data (notably get origin urls from their sha1 or turn swhids into dicts) and return them in JSON format
Sep 17 2020, 10:25 PM · System administration, Web app, Compressed graph service
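
As an illustration of the "turn swhids into dicts" step mentioned in the T2589 comment above, here is a minimal sketch. The function name and the dictionary keys are assumptions made for this example; they are not taken from the actual swh-web or swh-model code. It only relies on the core SWHID syntax (swh:1:<type>:<40 hex digits>) followed by optional ;key=value qualifiers.

```python
import re

# Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>
_CORE_RE = re.compile(
    r"^swh:(?P<version>1):(?P<type>snp|rel|rev|dir|cnt):(?P<hash>[0-9a-f]{40})$"
)


def swhid_to_dict(swhid: str) -> dict:
    """Split a SWHID string into a plain dict (hypothetical key names)."""
    core, _, rest = swhid.partition(";")
    match = _CORE_RE.match(core)
    if match is None:
        raise ValueError(f"invalid SWHID: {swhid!r}")
    # Qualifiers, if any, follow the core identifier as ;key=value pairs.
    qualifiers = {}
    for item in filter(None, rest.split(";")):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"invalid qualifier: {item!r}")
        qualifiers[key] = value
    return {
        "scheme_version": int(match["version"]),
        "object_type": match["type"],
        "object_id": match["hash"],
        "qualifiers": qualifiers,
    }
```

A dict of this shape can be passed directly to json.dumps, which is one way the JSON output mentioned in the comment could be produced.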
zack retitled D3979: Fix blackified strings with spurious concatenation and use f-strings from Fix blackified strings with spurrious concatenation and use f-strings to Fix blackified strings with spurious concatenation and use f-strings.
Sep 17 2020, 7:39 PM
zack requested changes to D3974: WIP: fuse design doc.
Sep 17 2020, 4:31 PM
zack added a comment to T2610: Add isort pre-commit hook and configuration to all repos.

(I was initially surprised by the mixing together of "import" lines with "from ... import" ones, but upon reflection it makes a lot of sense, because one might have to switch between the two forms, and it's silly to have to move the line back and forth between import blocks when that happens.)

Sep 17 2020, 3:38 PM · Development environment
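
For illustration, the mixed ordering described in the T2610 comment above resembles what isort produces when its force_sort_within_sections option is enabled. Whether the configuration discussed in T2610 uses exactly that option is an assumption here, and the module header below is hypothetical.

```python
# Hypothetical module header: within one section, lines are ordered by module
# name (json, os, re, sys) regardless of whether they use "import x" or
# "from x import y", so changing the form of an import does not move the line.
import json
from os import path
import re
from sys import argv
```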
zack updated subscribers of T1789: batch API to check for the presence of content in the archive.
Sep 17 2020, 3:00 PM · Web app
zack closed T1789: batch API to check for the presence of content in the archive as Resolved.

this has been addressed, and in a more general way that works for any SWHID, in D2582 by @DanSeraf

Sep 17 2020, 3:00 PM · Web app
zack merged task T2607: git loader OOM when loading the linux kernel repo into T2373: git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
zack merged T2607: git loader OOM when loading the linux kernel repo into T2373: git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
zack renamed T2373: git loader OOM when loading huge repository from staging: git loader: failure to ingest huge repository (e.g. nixpkgs) to git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
zack changed the status of T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, from Open to Work in Progress.
Sep 17 2020, 9:04 AM · Datasets
zack changed the status of T1847: fully automate export of the graph dataset from Open to Work in Progress.
Sep 17 2020, 9:04 AM · Compressed graph service, Datasets
zack renamed T2607: git loader OOM when loading the linux kernel repo from git loader OOM when loading the linux kernel repo (at least in the docker dev environment) to git loader OOM when loading the linux kernel repo.
Sep 17 2020, 9:03 AM · Git loader
zack raised the priority of T2607: git loader OOM when loading the linux kernel repo from Normal to High.

Very likely the same issue, thanks @ardumont !
Given what @olasd said in that issue (the ingestion logic having remained pretty much the same all along), and that I can confirm linux.git was loading just fine on my laptop no more than a year ago, the increased memory usage probably comes from elsewhere.
Anyway, it looks like a potentially important issue, so I'm raising priority and also removing the association with the docker env (as you could also reproduce this on staging).

Sep 17 2020, 9:03 AM · Git loader

Sep 16 2020

zack removed a parent task for T1848: refresh graph dataset export: T1868: refresh compressed representation of the archive.
Sep 16 2020, 8:43 PM · Datasets
zack removed a subtask for T1868: refresh compressed representation of the archive: T1848: refresh graph dataset export.
Sep 16 2020, 8:43 PM · Compressed graph service
zack added a comment to T1847: fully automate export of the graph dataset.

I think this is (reasonably) done now, please check and close it.

Sep 16 2020, 8:43 PM · Compressed graph service, Datasets
zack raised the priority of T1848: refresh graph dataset export from Normal to High.
Sep 16 2020, 8:42 PM · Datasets
zack added a comment to T1848: refresh graph dataset export.
Sep 16 2020, 8:42 PM · Datasets
zack assigned T1868: refresh compressed representation of the archive to seirl.

We now have a newer version of the compressed graph (2020-05-20), but it's not yet running on granet (I *think*, and, lacking T2579, I haven't checked).
Please make granet run that version and close this task. (Or just close this task if it's already done.)

Sep 16 2020, 8:41 PM · Compressed graph service
zack updated the task description for T2607: git loader OOM when loading the linux kernel repo.
Sep 16 2020, 8:28 PM · Git loader
zack triaged T2607: git loader OOM when loading the linux kernel repo as Normal priority.
Sep 16 2020, 8:26 PM · Git loader
zack triaged T2605: Web UI: add a way to browse origins, other than search as Low priority.
Sep 16 2020, 5:54 PM · Web app

Sep 15 2020

zack accepted D3945: docs: quickstart: add compression instructions.
Sep 15 2020, 1:22 PM
zack triaged T2601: create a scratch/temporary postgres DB to experiment with flattened directories as Normal priority.
Sep 15 2020, 12:55 PM · System administration
zack triaged T2600: SQL storage: experiment with flattened layouts for directory nodes as Normal priority.
Sep 15 2020, 12:53 PM · Storage manager
zack created P766 current size of directory-related DB entities (swh-replica cluster).
Sep 15 2020, 12:38 PM
zack resigned from D3945: docs: quickstart: add compression instructions.
Sep 15 2020, 11:03 AM
zack requested changes to D3945: docs: quickstart: add compression instructions.

looks great in general!
just a few nits here and there (and possibly a separate issue to file for the sane default part)

Sep 15 2020, 11:02 AM
zack added inline comments to D3945: docs: quickstart: add compression instructions.
Sep 15 2020, 11:01 AM

Sep 14 2020

zack renamed T2589: expose swh-graph API at archive.s.o/api/1/graph/ from expose the compressed graph API at archive.s.o/api/1/graph/ to expose swh-graph API at archive.s.o/api/1/graph/.
Sep 14 2020, 2:37 PM · System administration, Web app, Compressed graph service
zack triaged T2589: expose swh-graph API at archive.s.o/api/1/graph/ as Normal priority.
Sep 14 2020, 2:37 PM · System administration, Web app, Compressed graph service
zack updated subscribers of T2577: Test gitea lister on staging environment.

An email was sent on the swh-devel mailing list to ask for reviews.
The deployment in production will be performed in the middle of week 38 if no problems are raised.

Sep 14 2020, 10:22 AM · Lister
zack assigned T1926: FUSE filesystem to navigate the archive to haltode.
Sep 14 2020, 9:59 AM · Software Heritage filesystem

Sep 10 2020

zack accepted D3876: readme and cli description update.
Sep 10 2020, 5:17 PM
zack requested changes to D3876: readme and cli description update.
Sep 10 2020, 4:34 PM
zack added a reviewer for D3919: cli: speedup the `swh` cli command startup time: DanSeraf.
Sep 10 2020, 4:27 PM
zack added a project to T2575: Investigate if/how we could improve `swh` cli command startup time: Development environment.
Sep 10 2020, 1:18 PM · Development environment

Sep 9 2020

zack triaged T2579: swh-graph: display server and dataset versions in the live server instance as Normal priority.
Sep 9 2020, 11:35 AM · Compressed graph service

Sep 8 2020

zack committed rMSLD4f2c0522b97f: check-in slides for WOOC 2020 talk (authored by zack).
check-in slides for WOOC 2020 talk
Sep 8 2020, 4:07 PM
zack committed rMSLDd6cd4297e11d: biblio module: add Roberto's ICMS 2020 paper (authored by zack).
biblio module: add Roberto's ICMS 2020 paper
Sep 8 2020, 4:07 PM
zack committed rDDOC19963a1ac824: index: improve wording of swh-scanner (authored by zack).
index: improve wording of swh-scanner
Sep 8 2020, 1:59 PM
zack added a comment to T2571: swh-identify: add support for --type revision.

Re supported VCSs: sure, but I'd start with git, which is low-hanging fruit.

Sep 8 2020, 1:18 PM · Easy hack, Data Model
zack assigned T2572: swh-scanner: add support for authentication token to lift rate-limit to tenma.
Sep 8 2020, 10:50 AM · Code scanner
zack triaged T2572: swh-scanner: add support for authentication token to lift rate-limit as Normal priority.
Sep 8 2020, 10:25 AM · Code scanner
zack renamed T2300: swh-scanner: print a nicer error message when rate limit is hit from scanner: print a nicer error message when rate limit is hit to swh-scanner: print a nicer error message when rate limit is hit.
Sep 8 2020, 10:24 AM · Easy hack, Code scanner
zack triaged T2571: swh-identify: add support for --type revision as Normal priority.
Sep 8 2020, 9:12 AM · Easy hack, Data Model
zack triaged T2570: swh-identify: support exclusion patterns (e.g., for .git/) as swh-scanner does as Normal priority.
Sep 8 2020, 9:09 AM · Data Model
zack closed T1687: Add filename as an optional part in persistent identifiers as Resolved.

This seems to have been addressed with the path qualifier in SWHIDs.
Closing.
(Please reopen if I'm missing something.)

Sep 8 2020, 9:06 AM · Data Model
zack updated subscribers of T1136: swh-identify: support recursive checksumming of directories.

As an update: a feature equivalent to this one has been implemented in swh-scanner by @DanSeraf.
I guess it would still be useful to have it in swh-identify as well (it seems like a natural need), but of course the code should not be duplicated.

Sep 8 2020, 9:04 AM · Data Model
zack placed T922: Internal servers send mails from invalid hostnames up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T987: Add an Icinga alert for high queue levels on saatchi up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T1166: Split up pergamon to smaller VMs up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T1178: Make Azure infrastructure independent from Rocquencourt up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T1340: Automate storage BBUs monitoring up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T1526: Install a new VPN endpoint at Rocquencourt up for grabs.
Sep 8 2020, 9:00 AM · System administration
zack placed T1556: Document hardware architecture up for grabs.
Sep 8 2020, 8:59 AM · Documentation
zack placed T1697: Deploy Grafanalib-based dashboards with Puppet up for grabs.
Sep 8 2020, 8:59 AM · Sprint 2018 12, System administration

Sep 7 2020

zack raised the priority of T1926: FUSE filesystem to navigate the archive from Wishlist to Normal.
Sep 7 2020, 10:59 AM · Software Heritage filesystem

Sep 6 2020

zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

Noting down that I had a tentative, very preliminary implementation in the feature/fuse branch of swh-graph; see in particular fuse.py there.
It's probably not worth picking up and we should restart from scratch at this point, but it might still contain useful material.
(The webclient in there has since become a proper thing, see T2279. So that part is definitely obsolete.)

Sep 6 2020, 4:48 PM · Software Heritage filesystem

Sep 4 2020

zack renamed T2204: Full-text search on source code (prototype) from Full-text search (prototype) to Full-text search on source code (prototype).
Sep 4 2020, 11:38 AM · Roadmap 2021
zack created P753 git representation of symlinks.
Sep 4 2020, 10:56 AM

Sep 3 2020

zack requested changes to D3876: readme and cli description update.
Sep 3 2020, 4:39 PM
zack added a comment to T2559: Modify redirection on https://softwareheritage.org/swhid.

I've no specific advice here, as I wasn't aware that shorturl/SLUG existed, so I don't know where it might be used.
As a general comment, using such a "memorable" shorturl for a blog post doesn't seem like a good idea, as blog posts age pretty quickly.
So I'm in favor of removing it, but don't know how to evaluate the impact of doing so.
(For what it's worth: I don't think that using it as a URI in the context you need it in is incompatible with having it be a valid URL pointing to something else, but I agree it would be weird, because some people will load it in their browsers to try it out.)

Sep 3 2020, 11:23 AM · Website, SWORD deposit, Metadata workflow

Sep 1 2020

zack renamed T2312: Review metadata deposit specs for metadata-only deposit from Review metadata deposit specs of a metadata only deposit to Review metadata deposit specs for metadata-only deposit.
Sep 1 2020, 6:51 PM · Metadata workflow, Roadmap 2020
zack renamed T2540: support the loading of metadata-only deposits in the metadata storage from Implement loading process of only metadadata deposit into the metadata storage to support the loading of metadata-only deposits in the metadata storage.
Sep 1 2020, 6:50 PM · Roadmap 2020, SWORD deposit, Scientific Community Building
zack renamed T2537: Extend new deposit endpoint to support metadata-only deposits from Extend software deposit endpoint to enable only metadata deposits to Extend software deposit endpoint to support metadata-only deposits.
Sep 1 2020, 6:49 PM · Roadmap 2020, SWORD deposit, Scientific Community Building

Aug 25 2020

zack committed rMSLDc14fd7c03c07: 2020 onboarding talk: first complete draft of the dev workflow part (authored by zack).
2020 onboarding talk: first complete draft of the dev workflow part
Aug 25 2020, 10:15 AM
zack committed rMSLD9b6a657dc587: 2020 onboarding talk: some outlining, some TODOs (authored by zack).
2020 onboarding talk: some outlining, some TODOs
Aug 25 2020, 9:20 AM

Aug 24 2020

zack added a project to T2533: CRAN loader: set revision dates: Origin-CRAN.

(tag removed by mistake, adding it back)

Aug 24 2020, 5:49 PM · Origin-CRAN
zack removed a project from T2533: CRAN loader: set revision dates: Origin-CRAN.

Thanks for the pointers @ardumont !

Aug 24 2020, 5:48 PM · Origin-CRAN
zack removed projects from T2523: Archive opensource.samsung.com: Data Model, Core Loader.
Aug 24 2020, 11:38 AM · Lister, Archive coverage
zack triaged T2469: Announce SWH NL to devel and sciences mailing list as Normal priority.
Aug 24 2020, 11:36 AM · Unknown Object (Project)
zack added a project to T2530: Write a simple "quick start" for swh-graph: Documentation.
Aug 24 2020, 11:36 AM · Documentation, Compressed graph service
zack committed rMSLD50b23c5a3193: create skeleton for onboarding slide deck (authored by zack).
create skeleton for onboarding slide deck
Aug 24 2020, 8:54 AM

Aug 20 2020

zack triaged T2526: create a noreply@s.o email address for bulk mailing / notification as Low priority.
Aug 20 2020, 5:14 PM · System administration

Jul 31 2020

zack triaged T2515: grafana: server error / failed to get settings as High priority.
Jul 31 2020, 9:17 PM · System administration

Jul 30 2020

zack renamed T2003: Content replayer may try to copy objects before they are available from an objstorage from Content replayer may try to copy objects before they are available in an objstorage to Content replayer may try to copy objects before they are available from an objstorage.
Jul 30 2020, 8:18 AM · Journal
zack renamed T2512: Make all loaders write their extrinsic metadata to the appropriate storage from Make all loaders write their extrinsic metadata in the appropriate storage to Make all loaders write their extrinsic metadata to the appropriate storage.
Jul 30 2020, 8:17 AM · Metadata workflow, Roadmap 2020

Jul 27 2020

zack triaged T2511: web app: archive search broken for SWHIDs as High priority.
Jul 27 2020, 4:28 PM · Web app

Jul 23 2020

zack added inline comments to D3600: Add visit/edges endpoint.
Jul 23 2020, 1:28 PM
zack resigned from D3599: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:23 PM

Jul 16 2020

zack added a comment to D3527: Reimplement the GitHub lister using the new pattern class.

nice refactoring work !

Jul 16 2020, 1:53 PM

Jul 15 2020

zack renamed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory from swh-graph: memory mapping swhid<->node maps fail: Cannot allocate memory to swh-graph: loading maps fail when swhgraphshm is running: Cannot allocate memory.
Jul 15 2020, 5:49 PM · Compressed graph service
zack added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

I think it's related to the shm trick.

Jul 15 2020, 5:44 PM · Compressed graph service
zack added a project to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory: Compressed graph service.
Jul 15 2020, 5:22 PM · Compressed graph service
zack triaged T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory as High priority.
Jul 15 2020, 5:19 PM · Compressed graph service

Jul 13 2020

zack planned changes to D3506: rename from "PID" to "SWHID" terminology everywhere.

the java/ part needs porting too

Jul 13 2020, 2:45 PM
zack added a reviewer for D3506: rename from "PID" to "SWHID" terminology everywhere: seirl.
Jul 13 2020, 2:34 PM
zack created D3506: rename from "PID" to "SWHID" terminology everywhere.
Jul 13 2020, 2:34 PM