
Oct 15 2021

zack closed D6479: bib/install: disable pip self upgrade.
Oct 15 2021, 11:47 AM
zack committed rDENV2af28729729d: bib/install: disable pip self upgrade (authored by zack).
bib/install: disable pip self upgrade
Oct 15 2021, 11:47 AM
zack requested review of D6479: bib/install: disable pip self upgrade.
Oct 15 2021, 10:20 AM
zack updated subscribers of T3656: Survey revisions/releases with partially loaded history.
Oct 15 2021, 9:34 AM · Archive content
zack added a comment to T3656: Survey revisions/releases with partially loaded history.
In T3656#72364, @grouss wrote:

According to the list of nodes provided by seirl, there were ~21,000,000 revisions without ancestors in the swh-graph snapshot (2020-12-15).

Oct 15 2021, 9:33 AM · Archive content
zack triaged T3660: Nodes with missing ancestors in SWH DAG / SWH-graph as Low priority.
Oct 15 2021, 8:56 AM · Archive content
zack added a parent task for T3660: Nodes with missing ancestors in SWH DAG / SWH-graph: T3656: Survey revisions/releases with partially loaded history.
Oct 15 2021, 8:56 AM · Archive content
zack added a subtask for T3656: Survey revisions/releases with partially loaded history: T3660: Nodes with missing ancestors in SWH DAG / SWH-graph.
Oct 15 2021, 8:56 AM · Archive content
zack updated subscribers of T3656: Survey revisions/releases with partially loaded history.

In T3660, @grouss has found many more.
Might be for a different reason (the dataset he analyzed is not the live one), but it's worth a comparison.

Oct 15 2021, 8:55 AM · Archive content
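To compare the two counts on a small sample, one could look for nodes whose recorded parents are absent from the loaded graph. A minimal sketch, assuming a hypothetical in-memory dict from node id to parent ids (this is not the swh-graph API, which works on the compressed graph):

```python
from typing import Dict, List, Set


def nodes_with_missing_ancestors(parents: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes that reference at least one parent absent from the graph.

    `parents` is a hypothetical representation: node id -> list of parent ids.
    """
    known = set(parents)
    return {
        node
        for node, node_parents in parents.items()
        if any(parent not in known for parent in node_parents)
    }
```

Root revisions (no parents at all) are not reported here; only dangling parent references are.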

Oct 14 2021

zack committed rDDOC5efb7bfae0db: archive changelog: fix markup errors in recent links (authored by zack).
archive changelog: fix markup errors in recent links
Oct 14 2021, 3:02 PM
zack accepted D6474: changelog: Update bitbucket mercurial ingestion status.
Oct 14 2021, 2:51 PM
zack updated the task description for T3639: prepare quote for "granet2", next gen swh-graph compression server.
Oct 14 2021, 2:06 PM · System administration
zack committed rMSLD26cb7cb8692f: check in slides for talk at Telecom Paris, DIG team (authored by zack).
check in slides for talk at Telecom Paris, DIG team
Oct 14 2021, 10:04 AM
zack retitled D6470: Make it explicit that the "main" docs page is actually devel doc from Explicit the main docs page is actually the devel instance to Make it explicit that the "main" docs page is actually devel doc.
Oct 14 2021, 9:49 AM

Oct 13 2021

zack triaged T3651: doc: drop unneeded "Software Heritage" from page titles as Low priority.
Oct 13 2021, 8:14 PM · Easy hack, Documentation
zack renamed T3650: documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular) from documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singula) to documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular).
Oct 13 2021, 8:11 PM · Documentation
zack triaged T3650: documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular) as High priority.
Oct 13 2021, 8:11 PM · Documentation
zack added a comment to T3649: Add access from one docs dimension to another.

+1 on dropping the / -> /devel/ redirect and having a landing page at / that allows choosing between the 3 bodies of documentation.

Oct 13 2021, 8:09 PM · Documentation

Oct 12 2021

zack created P1199 Command-Line Input.
Oct 12 2021, 4:03 PM

Oct 11 2021

zack added a comment to T3621: Create a production read-only objstorage.

Aside from the specific needs of the mirroring stack, the question at hand is whether the read-only object storage should be by default open to the public or not.

Oct 11 2021, 12:16 PM · System administration

Oct 9 2021

zack committed rDGRPHc29ee2787e27: findEarliestRevision: avoid failing in case of unknown SWHIDs (authored by zack).
findEarliestRevision: avoid failing in case of unknown SWHIDs
Oct 9 2021, 3:18 PM
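The commit title describes skipping unknown SWHIDs instead of failing. The pattern can be sketched in Python as follows; the function name, its arguments and the timestamp dict are hypothetical illustrations, not the swh-graph implementation:

```python
from typing import Dict, Iterable, Optional, Tuple


def find_earliest_revision(
    swhids: Iterable[str], timestamps: Dict[str, int]
) -> Optional[Tuple[str, int]]:
    """Return (swhid, timestamp) of the earliest known revision, or None.

    SWHIDs missing from the (hypothetical) `timestamps` mapping are skipped
    rather than raising a KeyError.
    """
    earliest: Optional[Tuple[str, int]] = None
    for swhid in swhids:
        ts = timestamps.get(swhid)
        if ts is None:
            continue  # unknown SWHID: ignore it instead of failing
        if earliest is None or ts < earliest[1]:
            earliest = (swhid, ts)
    return earliest
```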

Oct 8 2021

zack triaged T3639: prepare quote for "granet2", next gen swh-graph compression server as High priority.
Oct 8 2021, 4:04 PM · System administration

Oct 7 2021

zack added 1 blocking reviewer(s) for D6401: Filter out pull request related branches: zack.

This should stay pending until we resolve the archiving policy discussion in T3627, so I'm marking it as such.

Oct 7 2021, 10:57 AM
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

Thanks for your feedback @olasd. I see three main arguments raised there: (1) the raciness of archiving those data via other means (= related forks), (2) the completeness of our canvassing of synthetic refs, (3) annotating rather than not archiving "synthetic" refs.

Oct 7 2021, 10:54 AM · Git loader

Oct 6 2021

zack added a reviewer for D6401: Filter out pull request related branches: zack.
Oct 6 2021, 4:45 PM

Oct 4 2021

zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

No, the snippet mentioned filters out refs whose name starts with refs/pulls and finishes with /merge.
(I realize I made a typo in the main description..., it's now fixed)

Oct 4 2021, 4:37 PM · Git loader
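The filtering rule described above (prefix and suffix both required) can be sketched as below. This assumes GitHub's refs/pull/<N>/merge naming and ref names given as bytes; the helper names are hypothetical and this is not the git loader's actual code. Under this rule, refs/pull/<N>/head refs are kept, which may be consistent with the many refs/pull/ branches seen in torvalds/linux.

```python
from typing import Dict


def is_pull_request_merge_ref(ref_name: bytes) -> bool:
    """True for refs shaped like refs/pull/<N>/merge (assumed GitHub layout)."""
    return ref_name.startswith(b"refs/pull/") and ref_name.endswith(b"/merge")


def filter_refs(refs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Drop pull request merge refs, keeping every other ref and its target."""
    return {
        name: target
        for name, target in refs.items()
        if not is_pull_request_merge_ref(name)
    }
```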
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

According to the snippet referenced by @ardumont, all branch names starting with refs/pull/ should be filtered out.
But in the recent snapshot of torvalds/linux there are a lot of branch names like that.
How come?

Oct 4 2021, 2:01 PM · Git loader

Oct 2 2021

zack raised the priority of T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Normal to High.
Oct 2 2021, 8:06 AM · Compressed graph service, System administration
zack raised the priority of T3624: Update swh-graph from 0.3.0 to 0.5.0 on granet from Normal to High.
Oct 2 2021, 8:06 AM · Compressed graph service, System administration
zack renamed T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Run swh-graph with gunicorn to Run swh-graph with gunicorn to support multiple/parallel requests.
Oct 2 2021, 8:01 AM · Compressed graph service, System administration

Oct 1 2021

zack triaged T3626: graph API: add ?limit parameter to /leaves endpoint as Low priority.
Oct 1 2021, 4:53 PM · Easy hack, Compressed graph service
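A ?limit parameter mostly amounts to stopping a lazy traversal early. A minimal sketch, assuming the leaves arrive as an iterator (the helper name is hypothetical; the real endpoint lives in swh-graph):

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def limit_leaves(leaves: Iterable[str], limit: Optional[int] = None) -> Iterator[str]:
    """Yield at most `limit` leaves, or all of them when `limit` is None."""
    if limit is None:
        return iter(leaves)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice stops pulling from the underlying iterator once `limit` items
    # have been produced, so the rest of the traversal is never computed.
    return islice(leaves, limit)
```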

Sep 30 2021

zack updated the title for P1187 import-ics from Command-Line Input to import-ics.
Sep 30 2021, 5:00 PM
zack created P1187 import-ics.
Sep 30 2021, 5:00 PM
zack created P1186 Command-Line Input.
Sep 30 2021, 4:25 PM

Sep 27 2021

zack created P1177 Command-Line Input.
Sep 27 2021, 4:50 PM

Sep 24 2021

zack accepted D6338: persistent-identifiers.rst: Update references to manifest formats.
Sep 24 2021, 1:02 PM

Sep 23 2021

zack added a project to T3581: List heptapod instance foss.heptapod.net: Archive coverage.
Sep 23 2021, 11:32 AM · Archive coverage, System administration, Origin-GitLab
zack added a project to T3568: Deploy opam lister/loader to production: Archive coverage.
Sep 23 2021, 11:32 AM · System administration, Archive coverage, Opam

Sep 22 2021

zack accepted D6320: docs: Update archive changelog about the foss.heptapod.net instance.

Approved, but please fix the minutia I'm mentioning in the above comment before landing.

Sep 22 2021, 2:20 PM
zack added a comment to T1805: Public API v2.

It's true these do not come "for free", but I still have the impression there is an "OpenAPI way" of handling them and we should stick to it.

Sep 22 2021, 1:03 PM · meta-task, Web app

Sep 18 2021

zack added a project to T3425: Opam loader: Archive coverage.
Sep 18 2021, 8:29 AM · Archive coverage, Opam
zack added a project to T3358: Opam lister: Archive coverage.
Sep 18 2021, 8:29 AM · Archive coverage, Lister
zack added a project to T3424: Opam support: Archive coverage.
Sep 18 2021, 8:28 AM · Archive coverage, Opam

Sep 1 2021

zack added a comment to T3544: Deal with GitHub removing support for git:// URLs.
In T3544#69746, @olasd wrote:

I can see a few alternatives to using git:// over tcp:

  • Give our swh bot accounts SSH keys, and use them to clone from GitHub over ssh.
Sep 1 2021, 10:06 PM · Origin-GitHub, Git loader
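For the SSH alternative, origin URLs recorded as git://github.com/... would need rewriting into SSH form before cloning. A minimal sketch of that rewrite only (the helper name is hypothetical; key provisioning for the bot accounts is out of scope here):

```python
from urllib.parse import urlsplit


def github_git_url_to_ssh(url: str) -> str:
    """Rewrite git://github.com/<owner>/<repo> into git@github.com:<owner>/<repo>.git.

    Any other URL is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "git" or parts.hostname != "github.com":
        return url
    path = parts.path.lstrip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"git@github.com:{path}"
```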

Aug 31 2021

zack added a comment to T3542: Decide what metadata we want to / can collect from GitHub.

Here's an opinionated and prioritized list.

Aug 31 2021, 3:49 PM · Origin-GitHub, Extrinsic metadata
zack added a comment to D6004: misc/coverage: Revamp and improve archive coverage widget.

quick comment on the "Miscellaneous" category:

  • it's not a great name, and it really feels like they are "less important" than the others even if we say explicitly they aren't (or maybe because we say so :-))
  • and shouldn't the two items in there (nix, guix) go under "regular crawling" anyway? (that would trivially solve the previous point)
Aug 31 2021, 3:39 PM

Aug 24 2021

zack accepted D6114: swh-scanner: retrieve additional information about software artifacts.

LGTM, thanks! But please note the remaining "-a/--add" reference in the new docstring, which should be changed to "-e/..." for consistency. Please fix that before landing this change.

Aug 24 2021, 5:58 PM

Aug 23 2021

zack requested changes to D6114: swh-scanner: retrieve additional information about software artifacts.

Nice! And I also like the refactoring out of client.py.

Aug 23 2021, 11:18 AM

Aug 19 2021

zack updated the task description for T3490: Collect metadata from ClearlyDefined.
Aug 19 2021, 10:13 AM · Extrinsic metadata

Aug 17 2021

zack added a comment to D6004: misc/coverage: Revamp and improve archive coverage widget.

Thanks for this, and for the screenshots, they look gorgeous!

Aug 17 2021, 5:13 PM

Aug 10 2021

zack added a reviewer for D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID: seirl.
Aug 10 2021, 12:26 PM
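The diff title suggests producing SWHID strings by plain formatting rather than instantiating an ExtendedSWHID object. A sketch of the idea, assuming the core "swh:1:<type>:<40 hex digits>" syntax and a raw 20-byte object id (the helper name is hypothetical, not the code from D6073):

```python
def swhid_str(object_type: str, object_id: bytes) -> str:
    """Format a SWHID string directly from an object type and a raw 20-byte id."""
    if len(object_id) != 20:
        raise ValueError("object id must be 20 bytes long")
    # bytes.hex() yields 40 lowercase hexadecimal digits for a 20-byte id
    return f"swh:1:{object_type}:{object_id.hex()}"
```

Skipping object construction presumably saves per-call validation overhead in hot loops, at the cost of doing any needed checks by hand.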
zack updated the task description for T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS.
Aug 10 2021, 12:21 PM · Archive coverage
zack updated the task description for T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS.
Aug 10 2021, 12:19 PM · Archive coverage
zack triaged T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS as Low priority.
Aug 10 2021, 12:19 PM · Archive coverage
zack added a reviewer for D6072: StreamingGraphView: Buffer lines before writing: seirl.
Aug 10 2021, 12:12 PM
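Buffering lines before writing trades a little memory for fewer write calls. A minimal sketch of the general technique (the class and its batch size are hypothetical, not StreamingGraphView's code):

```python
import io
from typing import IO, List


class BufferedLineWriter:
    """Accumulate lines in memory and write them out in batches."""

    def __init__(self, out: IO[str], batch_size: int = 1000) -> None:
        self.out = out
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def write_line(self, line: str) -> None:
        self.buffer.append(line)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One write call per batch instead of one per line
        if self.buffer:
            self.out.write("".join(line + "\n" for line in self.buffer))
            self.buffer.clear()


if __name__ == "__main__":
    sink = io.StringIO()
    writer = BufferedLineWriter(sink, batch_size=2)
    for name in ["a", "b", "c"]:
        writer.write_line(name)
    writer.flush()  # do not forget the final partial batch
    print(sink.getvalue(), end="")
```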
zack raised the priority of T3457: Some git repositories are failing to be ingested because of MemoryError from Normal to High.
Aug 10 2021, 12:10 PM · Git loader

Aug 9 2021

zack accepted D6065: swh-scanner: add policy greedybfs.

I'm approving this, but please fix the remaining occurrence of 1000 before landing, as per comment above.

Aug 9 2021, 7:04 PM
zack requested changes to D6065: swh-scanner: add policy greedybfs.

looks good!

Aug 9 2021, 1:56 PM
zack added a reviewer for D6065: swh-scanner: add policy greedybfs: zack.
Aug 9 2021, 1:44 PM

Jul 29 2021

zack accepted D6027: swh-scanner: add 'auto' option as default policy.
Jul 29 2021, 10:01 AM

Jul 28 2021

zack requested changes to D6027: swh-scanner: add 'auto' option as default policy.

I'm requesting some minor changes (+ some other changes to be submitted in a separate diff which I've noticed only now, sorry!).

Jul 28 2021, 11:08 AM

Jul 23 2021

zack committed rDDOCf9451bd1038a: changelog: fix sphinx markup error in sourceforge entry (authored by zack).
changelog: fix sphinx markup error in sourceforge entry
Jul 23 2021, 9:59 AM

Jul 22 2021

zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.

awesome \o/
(diff accepted)

Jul 22 2021, 10:33 AM
zack accepted D5952: changelog: Reference first completion of sourceforge git/svn origins.
Jul 22 2021, 10:33 AM
zack accepted D6005: docs/query-language: Describe search query language syntax.

I'm accepting this diff, but note that I've added a few suggestions for improved language above. Please integrate them before this is final.

Jul 22 2021, 9:29 AM

Jul 21 2021

zack added inline comments to D6005: docs/query-language: Describe search query language syntax.
Jul 21 2021, 11:47 AM
zack requested changes to D6005: docs/query-language: Describe search query language syntax.
Jul 21 2021, 11:42 AM
zack accepted D5996: swh-scanner: new scan policies.

Wonderful, thanks for adding the order tests! LGTM.

Jul 21 2021, 9:59 AM

Jul 19 2021

zack added inline comments to D5996: swh-scanner: new scan policies.
Jul 19 2021, 4:25 PM
zack added inline comments to D6005: docs/query-language: Describe search query language syntax.
Jul 19 2021, 1:47 PM
zack requested changes to D6005: docs/query-language: Describe search query language syntax.

thanks @KShivendu, this is a great start!

Jul 19 2021, 12:26 PM

Jul 17 2021

zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.

@zack
Is it okay with you if I add it in the next diff? This one has become extremely long because of lots of build failures.

Jul 17 2021, 12:13 PM

Jul 16 2021

zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.
In D5990#154613, @zack wrote:

Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.

Jul 16 2021, 11:35 AM
zack added a comment to D5990: query_language: Setup tree-sitter and grammar.js.

Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.

Jul 16 2021, 11:33 AM
zack added a comment to T3127: Compute and display distribution of origins by forge.

Thanks for this update, great work!

Jul 16 2021, 11:29 AM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
zack requested changes to D5996: swh-scanner: new scan policies.

Looks great! I've noted down only some nits.

Jul 16 2021, 11:02 AM

Jul 15 2021

zack renamed T3431: Implement a MongoDB backend for SWH-provenance from Implement a MonoDB backend for SWH-provenance to Implement a MongoDB backend for SWH-provenance.
Jul 15 2021, 10:52 AM · Provenance database

Jul 8 2021

zack accepted D5981: scanner: access MerkleNodeInfo with the correct key.
Jul 8 2021, 5:29 PM
zack changed the status of T2730: scanner: should output the root SWHID as well from Open to Work in Progress.
Jul 8 2021, 2:13 PM · Easy hack, Code scanner
zack changed the status of T2692: Move the output related functions to another (sub)module from Open to Work in Progress.
Jul 8 2021, 2:13 PM · Code scanner
zack moved T3318: scanner should use the known() method of web.client from In progress to Backlog on the Code scanner board.
Jul 8 2021, 2:13 PM · Code scanner
zack added a parent task for T2635: web client: add async API: T3318: scanner should use the known() method of web.client.
Jul 8 2021, 2:11 PM · Web client
zack added a subtask for T3318: scanner should use the known() method of web.client: T2635: web client: add async API.
Jul 8 2021, 2:11 PM · Code scanner
zack accepted D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.

Please note down on the side two remaining TODOs about all this:

  • adding a test case with a deduplicated source tree, to make sure nodes that are deduplicated at the Merkle DAG level are present multiple times in the output
  • adding a test case for a path that is not decodable in utf-8, to make sure it can be handled properly
Jul 8 2021, 9:43 AM
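For the second TODO, a test along these lines could check that a non-UTF-8 path survives. This assumes paths are carried as raw bytes and only decoded for display; both helpers are hypothetical, not the scanner's API:

```python
def path_for_display(path: bytes) -> str:
    """Render a raw path for output, escaping bytes that are not valid UTF-8."""
    return path.decode("utf-8", errors="backslashreplace")


def path_round_trips(path: bytes) -> bool:
    """Check that a raw path survives a bytes -> str -> bytes conversion."""
    text = path.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape") == path


def test_non_utf8_path() -> None:
    raw = b"src/caf\xe9.c"  # Latin-1 e-acute: a lone 0xE9 byte is invalid UTF-8
    try:
        raw.decode("utf-8")
        raise AssertionError("expected a UTF-8 decoding error")
    except UnicodeDecodeError:
        pass
    assert path_for_display(raw) == "src/caf\\xe9.c"
    assert path_round_trips(raw)
```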

Jul 7 2021

zack added inline comments to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.
Jul 7 2021, 11:55 AM

Jul 6 2021

zack requested changes to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.

Thanks, both the general structure and implementation look OK.
I'm requesting changes to address two main issues:

Jul 6 2021, 10:12 AM

Jul 5 2021

zack added a parent task for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T2730: scanner: should output the root SWHID as well.
Jul 5 2021, 3:21 PM · Code scanner
zack added a subtask for T2730: scanner: should output the root SWHID as well: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Jul 5 2021, 3:21 PM · Easy hack, Code scanner
zack removed a parent task for T2730: scanner: should output the root SWHID as well: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Jul 5 2021, 3:20 PM · Easy hack, Code scanner
zack removed a subtask for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T2730: scanner: should output the root SWHID as well.
Jul 5 2021, 3:20 PM · Code scanner
zack changed the status of T3420: scanner: make the various query algorithms user-selectable from Open to Work in Progress.
Jul 5 2021, 3:11 PM · Code scanner
zack assigned T3318: scanner should use the known() method of web.client to DanSeraf.
Jul 5 2021, 3:11 PM · Code scanner
zack changed the status of T3318: scanner should use the known() method of web.client from Open to Work in Progress.
Jul 5 2021, 3:11 PM · Code scanner
zack added a parent task for T3349: use swh.model.merkle/from_disk instead of swh.scanner.model: T3420: scanner: make the various query algorithms user-selectable.
Jul 5 2021, 3:10 PM · Code scanner
zack added a subtask for T3420: scanner: make the various query algorithms user-selectable: T3349: use swh.model.merkle/from_disk instead of swh.scanner.model.
Jul 5 2021, 3:10 PM · Code scanner
zack triaged T3420: scanner: make the various query algorithms user-selectable as Normal priority.
Jul 5 2021, 3:10 PM · Code scanner

Jul 2 2021

zack accepted D5951: model: make deduplication optional when iterating over the merkle tree.
Jul 2 2021, 10:16 AM
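Making deduplication optional means the iterator either skips nodes whose hash was already seen or yields every occurrence, which is what the deduplicated-tree test TODO above would exercise. A toy sketch with a hypothetical Node type and SHA1-based hashing (not swh.model.merkle's API or hashing scheme):

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Node:
    name: str
    data: bytes = b""
    children: List["Node"] = field(default_factory=list)

    def digest(self) -> str:
        # Leaves hash their content; directories also hash (name, child digest)
        h = hashlib.sha1(self.data)
        for child in self.children:
            h.update(child.name.encode() + child.digest().encode())
        return h.hexdigest()


def iter_tree(root: Node, dedup: bool = True) -> Iterator[Node]:
    """Depth-first iteration; with dedup=True, identical subtrees are yielded once."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if dedup:
            key = node.digest()
            if key in seen:
                continue
            seen.add(key)
        yield node
        stack.extend(reversed(node.children))  # keep children in original order
```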

Jul 1 2021

zack added inline comments to D5951: model: make deduplication optional when iterating over the merkle tree.
Jul 1 2021, 8:31 PM
zack requested changes to D5951: model: make deduplication optional when iterating over the merkle tree.

only minor changes requested to the docstring on my part

Jul 1 2021, 8:27 PM
zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.

I don't really have an ETA yet [1]. We are roughly 67% done for git and 84.6% for svn [2]. For Mercurial, it's not started, as other blocking points are being worked on.
Bazaar and CVS origins are listed, but we don't have any loader on that front yet.

Jul 1 2021, 11:11 AM