Software Heritage

Nov 14 2022

zack added a subtask for T4685: license dataset: add logic to convert/import dataset into a SQL database: T4683: license dataset: use a consistent file format for CSV-like files.
Nov 14 2022, 4:50 PM · Datasets
zack triaged T4685: license dataset: add logic to convert/import dataset into a SQL database as Low priority.
Nov 14 2022, 4:49 PM · Datasets
zack changed the edit policy for P1529 import the license dataset into sqlite.
Nov 14 2022, 4:47 PM · Datasets
zack created P1529 import the license dataset into sqlite.
Nov 14 2022, 4:47 PM · Datasets
zack added a project to T4683: license dataset: use a consistent file format for CSV-like files: Datasets.
Nov 14 2022, 3:09 PM · Datasets
zack triaged T4683: license dataset: use a consistent file format for CSV-like files as Low priority.
Nov 14 2022, 3:05 PM · Datasets
zack triaged T4682: license dataset: missing java stuff from the replication package as Low priority.
Nov 14 2022, 2:45 PM · Datasets
zack closed D8835: changelog: document recent git loader speed improvements.

merged in abbcf03b7bb2f1425db154dbe6e43e10c647354c

Nov 14 2022, 2:08 PM
zack committed rDDOCabbcf03b7bb2: changelog: document recent git loader speed improvements (authored by zack).
Nov 14 2022, 2:07 PM
zack requested review of D8835: changelog: document recent git loader speed improvements.
Nov 14 2022, 12:43 PM

Nov 13 2022

zack committed rMSLD1c9d37a84694: biennale talk: last touches (authored by zack).
Nov 13 2022, 2:47 PM

Nov 12 2022

zack committed rMSLD6f491d0d9a23: check-in slides for talk at Biennale Tecnologia 2022, Turin, Italy (authored by zack).
Nov 12 2022, 2:45 PM
zack committed rMSLDa239fce04a2f: move reusable ESE slides from 2022-09-28 talk to modules/ (refactoring) (authored by zack).
Nov 12 2022, 2:07 PM

Oct 24 2022

zack edited P1505 (An Untitled Masterwork).
Oct 24 2022, 9:49 PM
zack edited P1505 (An Untitled Masterwork).
Oct 24 2022, 4:48 PM
zack created P1505 (An Untitled Masterwork).
Oct 24 2022, 4:46 PM
zack created P1504 (An Untitled Masterwork).
Oct 24 2022, 4:45 PM

Oct 3 2022

zack removed a member for Staff: amadouth6.
Oct 3 2022, 6:49 PM
zack removed a member for Staff: zaboukha.
Oct 3 2022, 6:49 PM
zack removed a member for Staff: aeviso.
Oct 3 2022, 6:49 PM
zack removed a member for Staff: grouss.
Oct 3 2022, 6:49 PM

Sep 28 2022

zack committed rMSLDf5054e1f7343: ESE research: write slides on forks and software provenance (authored by zack).
Sep 28 2022, 4:40 PM
zack committed rMSLD5ff40743be0f: ESE research: add slides about DE&I research (authored by zack).
Sep 28 2022, 3:55 PM
zack committed rMSLD91c36bd84fec: ESE research slides: add swh-fuse and swh-graph (authored by zack).
Sep 28 2022, 3:00 PM
zack committed rMSLDcb348e6d2571: check in (old) slides for the Jena workshop on non tangible goods (authored by zack).
Sep 28 2022, 1:49 PM
zack committed rMSLD7ffce5ef9e2f: dataset module: add license blob dataset (authored by zack).
Sep 28 2022, 1:49 PM

Sep 23 2022

zack renamed T4551: document the license dataset on docs.s.o from document the license dataset to document the license dataset on docs.s.o.
Sep 23 2022, 4:38 PM · Documentation, Datasets
zack triaged T4551: document the license dataset on docs.s.o as Normal priority.
Sep 23 2022, 4:38 PM · Documentation, Datasets
zack triaged T4550: dataset: document the AWS S3 bucket for content objects as Normal priority.
Sep 23 2022, 4:27 PM · Documentation, Datasets

Sep 22 2022

zack added a comment to T4549: Write a script to generate qualified SWHID from swh-graph.
Sep 22 2022, 4:10 PM · Compressed graph service
zack renamed T4549: Write a script to generate qualified SWHID from swh-graph from Writa a script to generate qualified SWHID from swh-graph to Write a script to generate qualified SWHID from swh-graph.
Sep 22 2022, 4:06 PM · Compressed graph service

Aug 29 2022

zack reassigned T2579: swh-graph: display server and dataset versions in the live server instance from seirl to vlorentz.
Aug 29 2022, 11:47 AM · Compressed graph service
zack triaged T4469: update license blob dataset to match-ish latest compress graph as Normal priority.
Aug 29 2022, 11:46 AM · Datasets
zack triaged T4468: graph dataset: redirect from annex page to doc as Normal priority.
Aug 29 2022, 11:43 AM · Compressed graph service

Aug 24 2022

zack closed T4322: Set up a swh-users mailing list for user-oriented questions, a subtask of T3730: Add a "user support" mechanism to the archive, as Resolved.
Aug 24 2022, 10:32 AM · meta-task, Roadmap 2022
zack closed T4322: Set up a swh-users mailing list for user-oriented questions as Resolved.

This is now done: swh-users@inria.fr

Aug 24 2022, 10:32 AM

Aug 8 2022

zack added a comment to T4346: Create SourceHut Lister.

This has now been discussed on the sourcehut mailing list and I took part in the conversation.

Aug 8 2022, 11:14 AM · Archive coverage, Lister

Jul 17 2022

zack committed R183:d3bbeae9c9e1: reformat using bibtool (i.e., just run make; no-op change otherwise) (authored by zack).
Jul 17 2022, 8:12 PM
zack committed R183:82033eb184eb: add a bunch of recent papers of mine (authored by zack).
Jul 17 2022, 2:52 PM

Jun 23 2022

zack added a member for Speakers: bchauvet.
Jun 23 2022, 4:56 PM

Jun 10 2022

zack removed a member for Reviewers: zack.
Jun 10 2022, 9:51 AM

Jun 8 2022

zack added a comment to T4316: Push of swh-graph to pypi is broken.

For future reference, it looks like we are still "small" players as "big" packages go on PyPI: https://pypi.org/stats/ (e.g., tf-nightly is currently the largest package on PyPI and it weighs 427 GiB).
While it is still not nice to ship a big fat JAR in a PyPI package, our extension requests will likely be granted.

Jun 8 2022, 4:03 PM · System administration, Compressed graph service

May 18 2022

zack added a comment to T3560: Polish the swh-search QL.

Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now and I feel that they have worked a lot on polishing their search query language. I think we can learn from them and adapt our language. Here are a few suggestions:

May 18 2022, 1:32 PM · Archive search
zack resigned from D7839: Documentation overhaul.

Monumental documentation work, thanks!
I think this is generally great, and I've pointed out only some minor issues/suggestions here and there.

May 18 2022, 1:21 PM · Compressed graph service

May 15 2022

zack committed rMSLD8e903979a72a: latex: disable use of \rowcolors, broken with texlive >= 2022 (authored by zack).
May 15 2022, 3:56 PM

May 13 2022

zack resigned from D7814: Remove dead code from the Python interface.
May 13 2022, 4:37 PM
zack added a comment to D7814: Remove dead code from the Python interface.
In D7814#203592, @seirl wrote:
In D7814#203336, @zack wrote:

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, as a replacement for the cachemount command.

It's already in puppet (swh-site/site-modules/profile/templates/swh/deploy/graph/swhgraphshm.service.erb).

May 13 2022, 7:38 AM

May 11 2022

zack requested changes to D7814: Remove dead code from the Python interface.

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, as a replacement for the cachemount command.

May 11 2022, 10:40 PM

May 6 2022

zack created P1359 (An Untitled Masterwork).
May 6 2022, 4:50 PM

Apr 29 2022

zack renamed T3652: Cannot ingest git repositories with (too) large packfiles from Ingest git loader origins with smaller packfiles to Cannot ingest git repositories with (too) large packfiles.
Apr 29 2022, 4:00 PM · Git loader

Apr 28 2022

zack created P1354 (An Untitled Masterwork).
Apr 28 2022, 2:09 PM

Apr 26 2022

zack accepted D7686: docs: remove PostgreSQL local setup.
Apr 26 2022, 4:36 PM

Apr 22 2022

zack renamed T2833: cpan.loader - archive Perl modules from CPAN from cpan.loader - preserver Perl modules from CPAN to cpan.loader - archive Perl modules from CPAN.
Apr 22 2022, 11:26 AM · CPAN lister, Archive coverage

Apr 5 2022

zack changed the status of T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.
Apr 5 2022, 1:39 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
zack changed the status of T1743: create a nice landing web page for exported dataset from Open to Work in Progress.
Apr 5 2022, 1:39 PM · Datasets
zack changed the status of T3329: document ORC format dataset availability from Open to Work in Progress.
Apr 5 2022, 1:38 PM · Datasets

Apr 1 2022

zack accepted D7487: Docs: update dataset list with recent datasets.

Just a nitpick: either add thousands separators to the node/edge counts, or summarize them with M/B suffixes.
Rationale: those numbers are so huge that they are hard to read otherwise.

Apr 1 2022, 5:05 PM
zack committed rDDOC2d89a6533a58: roadmap 2022 intro: fix year and improve wording (authored by zack).
Apr 1 2022, 3:36 PM

Mar 30 2022

zack removed a watcher for Developers: zack.
Mar 30 2022, 3:39 PM
zack added a watcher for Team HR and management: zack.
Mar 30 2022, 1:43 PM
zack added a member for Datasets: seirl.
Mar 30 2022, 1:42 PM
zack added a watcher for Datasets: zack.
Mar 30 2022, 1:41 PM
zack added a watcher for Code scanner: zack.
Mar 30 2022, 1:41 PM
zack added a member for Software Heritage filesystem: zack.
Mar 30 2022, 1:41 PM
zack added a watcher for Compressed graph service: zack.
Mar 30 2022, 1:41 PM
zack added a watcher for Software Heritage filesystem: zack.
Mar 30 2022, 1:41 PM
zack renamed Compressed graph service from Graph service to Compressed graph service.
Mar 30 2022, 1:40 PM
zack added a member for Datasets: zack.
Mar 30 2022, 1:39 PM
zack added a member for Code scanner: zack.
Mar 30 2022, 1:39 PM

Mar 10 2022

zack triaged T4029: create vpn and unix account for Andrey to access granet as High priority.
Mar 10 2022, 11:13 AM · System administration

Mar 3 2022

zack added a member for Staff: bchauvet.
Mar 3 2022, 8:40 AM
zack removed a member for Staff: compay2k.
Mar 3 2022, 8:40 AM
zack added a member for Reviewers: bchauvet.
Mar 3 2022, 8:39 AM
zack removed a member for Reviewers: compay2k.
Mar 3 2022, 8:39 AM
zack removed a member for Developers: compay2k.
Mar 3 2022, 8:39 AM
zack added a member for Developers: bchauvet.
Mar 3 2022, 8:39 AM

Mar 2 2022

zack added a member for Team HR and management: bchauvet.
Mar 2 2022, 6:50 PM
zack accepted D7274: onboarding: Mention the creds needed for HTTP Basic auth for the intranet wiki.
Mar 2 2022, 10:45 AM

Mar 1 2022

zack added a member for Team HR and management: compay2k.
Mar 1 2022, 11:04 AM

Feb 25 2022

zack added a member for Staff: compay2k.
Feb 25 2022, 11:16 AM

Feb 22 2022

zack added a subtask for T3952: Make the search query language a first class citizen : T3560: Polish the swh-search QL.
Feb 22 2022, 6:46 PM · meta-task, Roadmap 2022, Archive search
zack added a parent task for T3560: Polish the swh-search QL: T3952: Make the search query language a first class citizen .
Feb 22 2022, 6:46 PM · Archive search

Feb 17 2022

zack added a comment to D7192: Route for fetching Git-encoded objects.

Sorry, that is a bit rambly and not very helpful. @anlambert @zack What do you think?

Feb 17 2022, 1:06 PM

Feb 10 2022

zack triaged T3923: Include submodules recursively when saving git repositories as Normal priority.
Feb 10 2022, 7:44 AM · Git loader, Save Code Now

Jan 27 2022

zack added a comment to T3887: Storing multiple authors in Revisions and Releases.

Then let's just go for it (insert here ref. to upcoming separate task :-)).

Jan 27 2022, 6:00 PM · SWORD deposit, Data Model, BZR loader
zack added a comment to T3887: Storing multiple authors in Revisions and Releases.
In T3887#77949, @olasd wrote:

Now that I've written it out loud, of course, Releases don't have extra_headers so the package loaders can't make use of this workaround/hack for now.

Jan 27 2022, 5:40 PM · SWORD deposit, Data Model, BZR loader
zack committed rMSLD4bf5d5d41819: check in recent presentations (authored by zack).
Jan 27 2022, 10:13 AM

Jan 25 2022

zack triaged T3885: Filter rows of size >32MB from dataset export as Normal priority.
Jan 25 2022, 1:32 PM · Datasets

Jan 10 2022

zack committed R183:6b876e2ac76b: add several entries about reproducibility, FOSS geography, and diversity (authored by zack).
Jan 10 2022, 7:58 PM

Jan 4 2022

zack closed T3260: publish swh.dataset to pypi as Resolved.
Jan 4 2022, 1:42 PM · Continuous Integration, Datasets
zack changed the status of T3768: Read compression input from ORC instead of the edges file from Open to Work in Progress.
Jan 4 2022, 1:35 PM · Compressed graph service

Jan 3 2022

zack added a comment to T3822: Update the fundraising banner.

@marla.dasilva @anlambert: let's go for "Until Jan 30th" then. (I'll also ping you about this in the chat, just in case.)

Jan 3 2022, 3:43 PM · Unknown Object (Project)
zack triaged T3822: Update the fundraising banner as High priority.

Thanks Marla, I also planned to raise this.

Jan 3 2022, 11:13 AM · Unknown Object (Project)

Dec 16 2021

zack triaged T3811: archive.s.o: change Debian tooltip to include derivatives as Low priority.
Dec 16 2021, 10:40 AM · Web app
zack renamed T2400: Ingest current and historical Ubuntu releases from Ingest current and history Ubuntu releases to Ingest current and historical Ubuntu releases.
Dec 16 2021, 10:36 AM · System administration, Debian loader, Package Loader, Archive coverage

Dec 14 2021

zack raised the priority of T3161: graph service: add anti-DoS limit on the number of edges traversed from Normal to High.
Dec 14 2021, 1:31 PM · Compressed graph service

Dec 6 2021

zack accepted D4821: Add LLP compression to the WebGraph pipeline.

Just to be sure: test_pipeline() from test_cli.py is now run with all the new passes as well, and as such it also tests the LLP step(s), correct?
It looks that way to me, because test_pipeline() appears to run all passes, but I'd like this to be double-checked before landing.

Dec 6 2021, 6:08 PM

Dec 4 2021

zack committed rMSLDfc9bffe30c07: check-in slides for tech presentation at #swh5years sponsors meeting (authored by zack).
Dec 4 2021, 10:22 AM

Dec 1 2021

zack moved T2595: Add a default configuration based on graph size (eg: batch_size) from Backlog to Deployed on the Compressed graph service board.
Dec 1 2021, 5:00 PM · Compressed graph service
zack changed the status of T2113: swh-graph: add support to optionally resolve ori PIDs to origin URLs from Wontfix to Resolved.
Dec 1 2021, 5:00 PM · Compressed graph service