seirl (Antoine Pietri)
User

User Details

User Since
Feb 2 2017, 11:38 AM (208 w, 5 h)

Recent Activity

Fri, Jan 8

seirl updated the diff for D4006: WIP: add permissions on edge labels.

Rebase on master, include webgraph files

Fri, Jan 8, 4:38 PM
seirl commandeered D4006: WIP: add permissions on edge labels.
Fri, Jan 8, 4:38 PM
seirl added inline comments to D4821: Add LLP compression to the WebGraph pipeline.
Fri, Jan 8, 4:19 PM
seirl planned changes to D4821: Add LLP compression to the WebGraph pipeline.

I'm realizing that this is missing the "simplified" step and needs more changes.

Fri, Jan 8, 4:17 PM
seirl closed T2595: Add a default configuration based on graph size (eg: batch_size) as Resolved by committing rDGRPH5a987aae6e93: config: sane default for batch_size using a heuristic on ram size.
Fri, Jan 8, 3:27 PM · Graph service
seirl closed D4820: config: sane default for batch_size using a heuristic on ram size.
Fri, Jan 8, 3:27 PM
seirl committed rDGRPH5a987aae6e93: config: sane default for batch_size using a heuristic on ram size (authored by seirl).
config: sane default for batch_size using a heuristic on ram size
Fri, Jan 8, 3:27 PM
seirl updated the diff for D4820: config: sane default for batch_size using a heuristic on ram size.

rebase

Fri, Jan 8, 3:24 PM
seirl updated the diff for D4820: config: sane default for batch_size using a heuristic on ram size.

Add task name to commit message

Fri, Jan 8, 3:24 PM
seirl committed rDGRPH85da2e78d681: cli: compression: fix weird bug when using ranges of steps (authored by seirl).
cli: compression: fix weird bug when using ranges of steps
Fri, Jan 8, 3:22 PM
seirl committed rDGRPH317205722b65: Compression: set custom temporary directory at the java level (authored by seirl).
Compression: set custom temporary directory at the java level
Fri, Jan 8, 3:22 PM
seirl committed rDGRPHa4b6570e16ec: Compression: read only src/dst from the labelled edge file (authored by seirl).
Compression: read only src/dst from the labelled edge file
Fri, Jan 8, 3:22 PM

Thu, Jan 7

seirl requested review of D4821: Add LLP compression to the WebGraph pipeline.
Thu, Jan 7, 5:46 PM
seirl requested review of D4820: config: sane default for batch_size using a heuristic on ram size.
Thu, Jan 7, 5:45 PM
seirl committed rDGRPH07a8f25eae5e: java: bump unimi dependencies (authored by seirl).
java: bump unimi dependencies
Thu, Jan 7, 5:41 PM

Wed, Jan 6

seirl accepted D4810: FUSE: tests: remove temporary sleep(2) hack.
Wed, Jan 6, 4:32 PM

Tue, Jan 5

seirl accepted D4805: FUSE: cache: put by-date/ entries in direntry cache.
Tue, Jan 5, 6:01 PM

Dec 17 2020

seirl retitled D4762: Add ORC exporter from "Add ORC exporterThis adds a new exporter in columnar format (Apache ORC) using the PyORClibrary. The output can be used on various clouds like AWS S3." to "Add ORC exporter".
Dec 17 2020, 7:40 PM
seirl retitled D4762: Add ORC exporter from "Add ORC exporter This adds a new exporter in columnar format (Apache ORC) using the PyORC library. The output can be used on various clouds like AWS S3." to "Add ORC exporterThis adds a new exporter in columnar format (Apache ORC) using the PyORClibrary. The output can be used on various clouds like AWS S3.".
Dec 17 2020, 7:40 PM
seirl created D4762: Add ORC exporter.
Dec 17 2020, 7:40 PM
seirl committed rDDATASETe439aa686f22: Edge exporter: use common remove_pull_requests() function (authored by seirl).
Edge exporter: use common remove_pull_requests() function
Dec 17 2020, 7:37 PM

Dec 16 2020

seirl committed rDDATASETcb71cea14def: journalprocessor: be resilient to exporter errors (authored by seirl).
journalprocessor: be resilient to exporter errors
Dec 16 2020, 5:13 PM
seirl committed rDDATASET6577f653f3c6: Export CLI: add a way to exclude specific object types (authored by seirl).
Export CLI: add a way to exclude specific object types
Dec 16 2020, 5:07 PM
seirl committed rDDATASETf3b156598000: journalprocessor: fix hashing of origin_visit_status objects (authored by seirl).
journalprocessor: fix hashing of origin_visit_status objects
Dec 16 2020, 4:41 PM
seirl closed D4750: journalprocessor: fix hashing of origin_visit_status objects.
Dec 16 2020, 4:41 PM
seirl updated the diff for D4750: journalprocessor: fix hashing of origin_visit_status objects.

Normalize .timestamp()

Dec 16 2020, 4:40 PM

Dec 15 2020

seirl added a reviewer for D4750: journalprocessor: fix hashing of origin_visit_status objects: Reviewers.
Dec 15 2020, 10:03 PM
seirl created D4750: journalprocessor: fix hashing of origin_visit_status objects.
Dec 15 2020, 10:03 PM
seirl closed D4718: Rewrite of the export pipeline using Exporters.

Landed, but Phabricator doesn't seem to see it.

Dec 15 2020, 6:49 PM
seirl updated the diff for D4718: Rewrite of the export pipeline using Exporters.
  • journalprocessor: remove comment about deserialize_message overload being a 'hack'
Dec 15 2020, 6:48 PM
seirl updated the diff for D4718: Rewrite of the export pipeline using Exporters.
  • journalprocessor: also partition sqlite files by first byte
  • SQLite on-disk set: disable journalling and synchronous mode
  • tests: fix test_export_origin
Dec 15 2020, 6:46 PM

Dec 14 2020

seirl committed rMSLD4ffe1a8d0780: Add 2020-12-07 coregraphie (authored by seirl).
Add 2020-12-07 coregraphie
Dec 14 2020, 7:58 AM

Dec 11 2020

seirl updated the diff for D4718: Rewrite of the export pipeline using Exporters.
  • Exporter documentation fixes
  • Journal processor: fetch offsets in parallel
Dec 11 2020, 6:07 PM
seirl updated the diff for D4718: Rewrite of the export pipeline using Exporters.

Fix various coding errors and minor improvements

Dec 11 2020, 5:39 PM
seirl committed rDDATASETf1952316a1ea: Graph export: add labels to the export CSV format (authored by seirl).
Graph export: add labels to the export CSV format
Dec 11 2020, 5:38 PM
seirl closed D4707: graph export: handle labels.
Dec 11 2020, 5:38 PM
seirl updated the diff for D4707: graph export: handle labels.

Better commit message:

Dec 11 2020, 5:38 PM

Dec 10 2020

seirl added a reviewer for D4718: Rewrite of the export pipeline using Exporters: Reviewers.
Dec 10 2020, 9:24 PM
seirl created D4718: Rewrite of the export pipeline using Exporters.
Dec 10 2020, 7:38 PM

Dec 9 2020

seirl added a reviewer for D4707: graph export: handle labels: Reviewers.
Dec 9 2020, 7:05 PM
seirl created D4707: graph export: handle labels.
Dec 9 2020, 7:05 PM
seirl committed rDDATASETb21d4a5ca327: graph exporter: schema upgrade for origin_visit_status (authored by seirl).
graph exporter: schema upgrade for origin_visit_status
Dec 9 2020, 7:04 PM
seirl closed D4691: graph exporter: schema upgrade for origin_visit_status.
Dec 9 2020, 7:04 PM

Dec 8 2020

seirl updated the diff for D4691: graph exporter: schema upgrade for origin_visit_status.

Subscribe to the correct objects

Dec 8 2020, 6:25 PM
seirl updated the diff for D4691: graph exporter: schema upgrade for origin_visit_status.

Fix variable name

Dec 8 2020, 5:28 PM
seirl added a reviewer for D4691: graph exporter: schema upgrade for origin_visit_status: Reviewers.
Dec 8 2020, 5:23 PM
seirl created D4691: graph exporter: schema upgrade for origin_visit_status.
Dec 8 2020, 5:21 PM
seirl accepted D4689: FUSE: fs: lookup: add optional regexp name validation.
Dec 8 2020, 4:55 PM
seirl created P896 (An Untitled Masterwork).
Dec 8 2020, 4:26 PM
seirl accepted D4682: FUSE: fix directory listing bugs.
Dec 8 2020, 4:09 PM
seirl added a comment to T2863: FUSE: lookup: add optional regex pre-condition.

My API idea was to simply have something like ENTRIES_REGEXP = r'^.*:.*$' as a class attribute of each type of directory, and a validate_entry(self, name: str) method which, by default, just checks that it matches the regexp.
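
A minimal sketch of the idea in this comment. Only `ENTRIES_REGEXP` and `validate_entry` come from the comment; the class names `FuseDirEntry` and `ArchiveDir` and the stricter pattern in the subclass are assumptions made for illustration and may not match the actual code under review.

```python
import re


class FuseDirEntry:
    """Hypothetical base class for a virtual directory (name is an assumption)."""

    # Default pattern quoted in the comment: any name containing a colon.
    ENTRIES_REGEXP = r"^.*:.*$"

    def validate_entry(self, name: str) -> bool:
        # By default, a name is valid if it matches the class-level regexp.
        return re.match(self.ENTRIES_REGEXP, name) is not None


class ArchiveDir(FuseDirEntry):
    """Hypothetical subclass that overrides only the pattern."""

    # Illustrative pattern: accept identifiers shaped like swh:1:<type>:<40 hex>.
    ENTRIES_REGEXP = r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$"
```

With this layout a lookup can reject a malformed name before any request is made, and a directory type that needs more than a regexp can override `validate_entry` itself.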

Dec 8 2020, 11:59 AM · Software Heritage filesystem

Dec 3 2020

seirl added a comment to T2771: FUSE: rethink the visibility of files under archive/ and meta/, and possibly add a new cache/ entrypoint.

We also need to discuss what exactly we put in cache/. I thought about symlinks to archive/ and meta/, what do you think? Removing the symlinks also means removing the data from the cache.

Dec 3 2020, 1:44 PM · Software Heritage filesystem

Dec 2 2020

seirl accepted D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 3:14 PM
seirl added inline comments to D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 1:44 PM

Dec 1 2020

seirl accepted D4631: fs: snapshot: nest branch names as directories instead of URL-escaping.
Dec 1 2020, 5:35 PM

Nov 27 2020

seirl accepted D4569: FUSE: cache: add 'date' column in metadata_cache for history/by-date.
Nov 27 2020, 2:16 PM

Nov 25 2020

seirl accepted D4583: fuse: add support for origin artifacts.
Nov 25 2020, 1:59 PM

Nov 20 2020

seirl triaged T2801: Wrong <title> on snapshot pages as Normal priority.
Nov 20 2020, 9:01 PM · Web app, Easy hack

Nov 19 2020

seirl added inline comments to D4509: fs: history: clean sharded dir implementation.
Nov 19 2020, 2:39 PM

Nov 18 2020

seirl accepted D4509: fs: history: clean sharded dir implementation.
Nov 18 2020, 5:57 PM
seirl accepted D4489: fs: history: add by-date/ sharded directory.
Nov 18 2020, 12:49 PM

Nov 16 2020

seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:44 PM
seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:27 PM
seirl accepted D4478: fuse: use logging.exception() instead of .debug().
Nov 16 2020, 1:27 PM

Nov 13 2020

seirl accepted D4416: fs: history: add by-hash/ sharded directory.
Nov 13 2020, 4:03 PM

Nov 12 2020

seirl added a comment to D4416: fs: history: add by-hash/ sharded directory.

I think I understand what your fill_direntry_cache function is trying to do: you want to avoid fetching the history multiple times by doing the request only once and writing the direntry cache of all the children recursively?
Would it perhaps be better to have a small LRU cache for the API queries instead, and keep the direntry code simple and fully lazy?
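
The alternative suggested here could look roughly like the sketch below. It is synchronous and uses `functools.lru_cache` for brevity; `fetch_history_from_api`, `get_history` and `list_history_dir` are hypothetical names, and the code in the diff is not shown in this feed, so this only illustrates the general shape.

```python
from functools import lru_cache


def fetch_history_from_api(swhid: str) -> tuple:
    """Hypothetical placeholder for a costly request to the archive API."""
    return (f"{swhid}/parent-1", f"{swhid}/parent-2")


@lru_cache(maxsize=128)
def get_history(swhid: str) -> tuple:
    # Repeated queries for the same identifier are served from the LRU cache,
    # so the underlying request is made at most once while the entry is cached.
    return fetch_history_from_api(swhid)


def list_history_dir(swhid: str) -> list:
    # The directory code stays lazy: it asks for the history only when listed,
    # and does not pre-fill the cache of its children.
    return list(get_history(swhid))
```

Caching at the query layer keeps each directory class unaware of its siblings, at the cost of a bounded amount of memory set by `maxsize`.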

Nov 12 2020, 9:17 PM

Nov 5 2020

seirl added inline comments to D4416: fs: history: add by-hash/ sharded directory.
Nov 5 2020, 3:01 PM

Nov 4 2020

seirl accepted D4345: fuse: add cache on directories entries.
Nov 4 2020, 5:00 PM
seirl added inline comments to D4345: fuse: add cache on directories entries.
Nov 4 2020, 2:42 PM
seirl requested changes to D4345: fuse: add cache on directories entries.

Looks good apart from two small things.

Nov 4 2020, 2:41 PM

Nov 3 2020

seirl requested changes to D4345: fuse: add cache on directories entries.

One thing I don't really like here is that FuseEntries cannot easily list their own entries using the cache when available. I would much rather have the cache logic moved inside FuseEntry, as we discussed.

Nov 3 2020, 7:47 PM

Oct 22 2020

seirl accepted D4309: Add flat commit view in a history/ virtual dir.
Oct 22 2020, 5:32 PM

Oct 21 2020

seirl accepted D4316: cache: add missing aiosqlite commit call.
Oct 21 2020, 12:50 PM

Oct 16 2020

seirl accepted D4289: cli: fix daemon working directory.
Oct 16 2020, 3:28 PM
seirl created P825 (An Untitled Masterwork).
Oct 16 2020, 2:37 PM

Oct 14 2020

seirl accepted D4254: fs: add FuseEntry sub-classes for file, dir, symlink.
Oct 14 2020, 2:57 PM

Oct 13 2020

seirl accepted D4246: fuse: add support for release artifacts.
Oct 13 2020, 4:34 PM
seirl requested changes to D4246: fuse: add support for release artifacts.
Oct 13 2020, 3:31 PM
seirl added a comment to T2695: Cache directory entries to make readdir/lookup more efficient.

lookup() should ideally be O(1).
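
One way to get constant-time lookups is to keep a per-directory mapping from entry name to inode, as in the sketch below. `DirEntryCache` and its methods are hypothetical names chosen for illustration, not the design adopted in T2695.

```python
from typing import Dict, List, Optional


class DirEntryCache:
    """Hypothetical per-directory cache mapping entry names to inode numbers."""

    def __init__(self) -> None:
        self._entries: Dict[str, int] = {}

    def fill(self, entries: Dict[str, int]) -> None:
        # Populated once, for example after the first readdir() on the directory.
        self._entries.update(entries)

    def lookup(self, name: str) -> Optional[int]:
        # A single hash lookup (average-case O(1)) instead of scanning a listing.
        return self._entries.get(name)

    def readdir(self) -> List[str]:
        # Names are returned in insertion order.
        return list(self._entries)
```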

Oct 13 2020, 12:13 PM · Software Heritage filesystem
seirl accepted D4240: fuse: allow mounting artifacts on the fly.
Oct 13 2020, 12:11 PM

Oct 12 2020

seirl accepted D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 6:30 PM
seirl requested changes to D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 4:14 PM

Oct 9 2020

seirl accepted D4200: Add support for revision artifacts.
Oct 9 2020, 2:54 PM
seirl requested changes to D4200: Add support for revision artifacts.
Oct 9 2020, 2:36 PM

Oct 8 2020

seirl accepted D4201: Fix pytest warnings - tests: add missing join() after subprocess.run().
Oct 8 2020, 2:44 PM

Oct 7 2020

seirl accepted D4064: Early FUSE implementation, with support for blob and directory objects.
Oct 7 2020, 3:06 PM
seirl accepted D4028: Add Spotless formatting tool.
Oct 7 2020, 1:11 PM

Oct 6 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.

This is looking pretty great. I see three more good refactoring possibilities:

Oct 6 2020, 6:34 PM

Oct 5 2020

seirl committed rDGRPH29ae6bf46d22: java: migrate to Junit 5 (authored by seirl).
java: migrate to Junit 5
Oct 5 2020, 11:31 PM
seirl closed D4146: java: migrate to Junit 5.
Oct 5 2020, 11:31 PM
seirl added inline comments to D4146: java: migrate to Junit 5.
Oct 5 2020, 7:44 PM
seirl created D4146: java: migrate to Junit 5.
Oct 5 2020, 7:43 PM
seirl committed rDGRPH8b18ec1cb26b: java: refactor AllowedEdges to remove its Graph attribute (authored by seirl).
java: refactor AllowedEdges to remove its Graph attribute
Oct 5 2020, 6:22 PM
seirl closed D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 6:22 PM
seirl updated the diff for D4145: java: refactor AllowedEdges to remove its Graph attribute.

Fix BVGraph/ImmutableGraph implicit naming

Oct 5 2020, 6:22 PM
seirl created D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 5:49 PM

Oct 3 2020

seirl added a comment to D4064: Early FUSE implementation, with support for blob and directory objects.

Yes, the correct method name is enter_context
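
Assuming this refers to `contextlib.ExitStack.enter_context` from the Python standard library (the feed does not show the surrounding review thread), typical usage looks like the sketch below. The file name and the temporary directory are illustrative only.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path

with ExitStack() as stack:
    # enter_context() enters a context manager and registers its exit callback.
    tmpdir = Path(stack.enter_context(tempfile.TemporaryDirectory()))
    handle = stack.enter_context(open(tmpdir / "example.txt", "w"))
    handle.write("hello")
# On leaving the block, exits run in reverse order: the file is closed first,
# then the temporary directory is removed.
```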

Oct 3 2020, 2:47 PM

Sep 30 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.
Sep 30 2020, 7:36 PM

Sep 29 2020

seirl accepted D4006: WIP: add permissions on edge labels.
Sep 29 2020, 3:40 PM

Sep 24 2020

seirl committed rDGRPH2be4852de7e4: ConnectedComponents: only compute the size distribution, not the rest (authored by seirl).
ConnectedComponents: only compute the size distribution, not the rest
Sep 24 2020, 6:10 PM