Dec 8 2020

seirl accepted D4689: FUSE: fs: lookup: add optional regexp name validation.
Dec 8 2020, 4:55 PM
seirl created P896 (An Untitled Masterwork).
Dec 8 2020, 4:26 PM
seirl accepted D4682: FUSE: fix directory listing bugs.
Dec 8 2020, 4:09 PM
seirl added a comment to T2863: FUSE: lookup: add optional regex pre-condition.

My API idea was to simply have something like ENTRIES_REGEXP = r'^.*:.*$' as a class attribute of each type of directory, and a validate_entry(self, name: str) method which, by default, just checks that it matches the regexp.

Dec 8 2020, 11:59 AM · Software Heritage filesystem
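
A minimal sketch of the idea in the comment above, not the code that landed in D4689. Only `ENTRIES_REGEXP`, `validate_entry` and the colon pattern come from the comment; the class names, the permissive default pattern and the `lookup` helper are assumptions made for illustration.

```python
import re


class DirEntry:
    """Hypothetical base class for a virtual directory."""

    # Assumed default: accept any non-empty name. Subclasses override it.
    ENTRIES_REGEXP = r"^.+$"

    def validate_entry(self, name: str) -> bool:
        """Return True if `name` matches this directory's pattern."""
        return re.match(self.ENTRIES_REGEXP, name) is not None


class ColonNamedDir(DirEntry):
    """Hypothetical directory whose entry names must contain a colon."""

    # Pattern quoted in the comment.
    ENTRIES_REGEXP = r"^.*:.*$"


def lookup(directory: DirEntry, name: str, fetch):
    """Reject invalid names before calling the (possibly costly) `fetch`."""
    if not directory.validate_entry(name):
        return None
    return fetch(name)
```

With this shape, a name that cannot exist in the directory (for example `.git` in `ColonNamedDir`) is rejected locally, and `fetch` is never called for it.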

Dec 3 2020

seirl added a comment to T2771: FUSE: rethink the visibility of files under archive/ and meta/, and possibly add a new cache/ entrypoint.

We also need to discuss what exactly we put in cache/. I thought about symlinks to archive/ and meta/; what do you think? Removing a symlink would then also mean removing the corresponding data from the cache.

Dec 3 2020, 1:44 PM · Software Heritage filesystem

Dec 2 2020

seirl accepted D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 3:14 PM
seirl added inline comments to D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 1:44 PM

Dec 1 2020

seirl accepted D4631: fs: snapshot: nest branch names as directories instead of URL-escaping.
Dec 1 2020, 5:35 PM

Nov 27 2020

seirl accepted D4569: FUSE: cache: add 'date' column in metadata_cache for history/by-date.
Nov 27 2020, 2:16 PM

Nov 25 2020

seirl accepted D4583: fuse: add support for origin artifacts.
Nov 25 2020, 1:59 PM

Nov 20 2020

seirl triaged T2801: Wrong <title> on snapshot pages as Normal priority.
Nov 20 2020, 9:01 PM · Web app, Easy hack

Nov 19 2020

seirl added inline comments to D4509: fs: history: clean sharded dir implementation.
Nov 19 2020, 2:39 PM

Nov 18 2020

seirl accepted D4509: fs: history: clean sharded dir implementation.
Nov 18 2020, 5:57 PM
seirl accepted D4489: fs: history: add by-date/ sharded directory.
Nov 18 2020, 12:49 PM

Nov 16 2020

seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:44 PM
seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:27 PM
seirl accepted D4478: fuse: use logging.exception() instead of .debug().
Nov 16 2020, 1:27 PM

Nov 13 2020

seirl accepted D4416: fs: history: add by-hash/ sharded directory.
Nov 13 2020, 4:03 PM

Nov 12 2020

seirl added a comment to D4416: fs: history: add by-hash/ sharded directory.

I think I understand what your fill_direntry_cache function is trying to do: you want to avoid fetching the history multiple times by making the request only once and recursively writing the direntry cache of all the children?
Would it maybe be better to have a small LRU cache for the API queries instead, and keep the direntry code simple and fully lazy?

Nov 12 2020, 9:17 PM
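
The alternative suggested in this comment could look roughly like the sketch below. It is synchronous and uses `functools.lru_cache` purely for illustration; `fetch_history`, `list_shard`, the fake hashes and the cache size are all hypothetical. The project uses aiosqlite elsewhere, so the real code is probably asynchronous, and `lru_cache` would not apply to coroutines as-is.

```python
from functools import lru_cache

api_calls = []  # records each simulated request, for illustration only


@lru_cache(maxsize=128)
def fetch_history(swhid: str) -> tuple:
    """Hypothetical stand-in for the remote history query."""
    api_calls.append(swhid)
    # Fake data; a tuple is returned so callers cannot mutate the cached value.
    return ("aa11", "aa22", "bb33")


def list_shard(swhid: str, prefix: str) -> list:
    """Lazy listing of one shard: filter the cached history on demand."""
    return [h for h in fetch_history(swhid) if h.startswith(prefix)]
```

Each shard computes its own entries when asked, and only the first call for a given identifier reaches the API; no directory has to pre-populate the cache of its children.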

Nov 5 2020

seirl added inline comments to D4416: fs: history: add by-hash/ sharded directory.
Nov 5 2020, 3:01 PM

Nov 4 2020

seirl accepted D4345: fuse: add cache on directories entries.
Nov 4 2020, 5:00 PM
seirl added inline comments to D4345: fuse: add cache on directories entries.
Nov 4 2020, 2:42 PM
seirl requested changes to D4345: fuse: add cache on directories entries.

Looks good apart from two small things.

Nov 4 2020, 2:41 PM

Nov 3 2020

seirl requested changes to D4345: fuse: add cache on directories entries.

One thing I don't really like here is that FuseEntries cannot easily list their own entries using the cache when it is available. I would much rather have the cache logic moved inside FuseEntry, as we discussed.

Nov 3 2020, 7:47 PM

Oct 22 2020

seirl accepted D4309: Add flat commit view in a history/ virtual dir.
Oct 22 2020, 5:32 PM

Oct 21 2020

seirl accepted D4316: cache: add missing aiosqlite commit call.
Oct 21 2020, 12:50 PM

Oct 16 2020

seirl accepted D4289: cli: fix daemon working directory.
Oct 16 2020, 3:28 PM
seirl created P825 (An Untitled Masterwork).
Oct 16 2020, 2:37 PM

Oct 14 2020

seirl accepted D4254: fs: add FuseEntry sub-classes for file, dir, symlink.
Oct 14 2020, 2:57 PM

Oct 13 2020

seirl accepted D4246: fuse: add support for release artifacts.
Oct 13 2020, 4:34 PM
seirl requested changes to D4246: fuse: add support for release artifacts.
Oct 13 2020, 3:31 PM
seirl added a comment to T2695: Cache directory entries to make readdir/lookup more efficient.

lookup() should ideally be O(1).

Oct 13 2020, 12:13 PM · Software Heritage filesystem
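
One way to get constant-time lookups is to keep a directory's entries in a dictionary keyed by name. The sketch below is an assumption about how that could be structured, not the design adopted in T2695; `CachedDir` and `load_entries` are hypothetical names.

```python
class CachedDir:
    """Hypothetical directory that caches its entries by name."""

    def __init__(self, load_entries):
        # `load_entries` is a callable returning (name, entry) pairs.
        self._load_entries = load_entries
        self._entries = None  # filled on first access

    def _ensure_loaded(self) -> dict:
        """Load the entries once and keep them in a dict."""
        if self._entries is None:
            self._entries = dict(self._load_entries())
        return self._entries

    def readdir(self) -> list:
        """Return the entry names in insertion order."""
        return list(self._ensure_loaded())

    def lookup(self, name: str):
        # Average-case O(1) hash lookup instead of scanning the listing.
        return self._ensure_loaded().get(name)
```

Both `readdir` and `lookup` share the same cached dictionary, so the entries are loaded at most once per directory object.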
seirl accepted D4240: fuse: allow mounting artifacts on the fly.
Oct 13 2020, 12:11 PM

Oct 12 2020

seirl accepted D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 6:30 PM
seirl requested changes to D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 4:14 PM

Oct 9 2020

seirl accepted D4200: Add support for revision artifacts.
Oct 9 2020, 2:54 PM
seirl requested changes to D4200: Add support for revision artifacts.
Oct 9 2020, 2:36 PM

Oct 8 2020

seirl accepted D4201: Fix pytest warnings - tests: add missing join() after subprocess.run().
Oct 8 2020, 2:44 PM

Oct 7 2020

seirl accepted D4064: Early FUSE implementation, with support for blob and directory objects.
Oct 7 2020, 3:06 PM
seirl accepted D4028: Add Spotless formatting tool.
Oct 7 2020, 1:11 PM

Oct 6 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.

This is looking pretty great. I see three more good refactoring possibilities:

Oct 6 2020, 6:34 PM

Oct 5 2020

seirl committed rDGRPH29ae6bf46d22: java: migrate to Junit 5 (authored by seirl).
Oct 5 2020, 11:31 PM
seirl closed D4146: java: migrate to Junit 5.
Oct 5 2020, 11:31 PM
seirl added inline comments to D4146: java: migrate to Junit 5.
Oct 5 2020, 7:44 PM
seirl created D4146: java: migrate to Junit 5.
Oct 5 2020, 7:43 PM
seirl committed rDGRPH8b18ec1cb26b: java: refactor AllowedEdges to remove its Graph attribute (authored by seirl).
Oct 5 2020, 6:22 PM
seirl closed D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 6:22 PM
seirl updated the diff for D4145: java: refactor AllowedEdges to remove its Graph attribute.

Fix BVGraph/ImmutableGraph implicit naming

Oct 5 2020, 6:22 PM
seirl created D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 5:49 PM

Oct 3 2020

seirl added a comment to D4064: Early FUSE implementation, with support for blob and directory objects.

Yes, the correct method name is enter_context

Oct 3 2020, 2:47 PM

Sep 30 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.
Sep 30 2020, 7:36 PM

Sep 29 2020

seirl accepted D4006: WIP: add permissions on edge labels.
Sep 29 2020, 3:40 PM

Sep 24 2020

seirl committed rDGRPH2be4852de7e4: ConnectedComponents: only compute the size distribution, not the rest (authored by seirl).
Sep 24 2020, 6:10 PM
seirl committed rDGRPH752154f3d3c1: ConnectedComponents: flush distribution printing (authored by seirl).
Sep 24 2020, 3:25 PM
seirl committed rDGRPHafd7eac11431: experiments: add SubdatasetSizeFunction (authored by seirl).
Sep 24 2020, 3:25 PM
seirl committed rDGRPH9687de2de5ed: experiments: add InOutDegree experiment (authored by seirl).
Sep 24 2020, 3:25 PM
seirl committed rDGRPH1dd5ef972954: Add TopologicalTraversal.java (authored by seirl).
Sep 24 2020, 3:25 PM

Sep 23 2020

seirl edited P774 Masterwork From Distant Lands.
Sep 23 2020, 3:03 PM
seirl updated the task description for T2633: Tighten restrictions on directory entry names.
Sep 23 2020, 2:22 PM · Data Model
seirl triaged T2633: Tighten restrictions on directory entry names as Normal priority.
Sep 23 2020, 2:21 PM · Data Model
seirl requested changes to D4006: WIP: add permissions on edge labels.
Sep 23 2020, 2:17 PM

Sep 22 2020

seirl added inline comments to D4006: WIP: add permissions on edge labels.
Sep 22 2020, 11:41 PM
seirl requested changes to D4006: WIP: add permissions on edge labels.

I would do a bit of refactoring:

Sep 22 2020, 5:37 PM

Sep 18 2020

seirl accepted D3990: Replace deprecated "SWH PID" naming with "SWHID".
Sep 18 2020, 3:52 PM
seirl created P772 (An Untitled Masterwork).
Sep 18 2020, 12:07 PM
seirl accepted D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:54 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:50 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:40 AM

Sep 16 2020

seirl added a comment to T1847: fully automate export of the graph dataset.

No, only the edge part is done; we still need a Parquet exporter and a CSV exporter :/

Sep 16 2020, 10:59 PM · Compressed graph service, Datasets
seirl closed T1868: refresh compressed representation of the archive as Resolved.

It is already running on granet :-)

Sep 16 2020, 9:47 PM · Compressed graph service

Sep 15 2020

seirl created P768 For haltode :-).
Sep 15 2020, 2:28 PM
seirl added a comment to T2600: SQL storage: experiment with flattened layouts for directory nodes.

We considered three possibilities for the schema (assuming that we want to get rid of the three separate tables for dir_entries, rev_entries and file_entries; otherwise, there are six possibilities).

Sep 15 2020, 1:40 PM · Storage manager

Sep 14 2020

seirl closed D3906: Add edge labelling prototype.
Sep 14 2020, 2:43 PM
seirl updated the diff for D3906: Add edge labelling prototype.

rebase

Sep 14 2020, 2:21 PM

Sep 10 2020

seirl created D3906: Add edge labelling prototype.
Sep 10 2020, 3:28 PM

Sep 4 2020

seirl committed rDGRPH0295a3be9458: java: rename MapBuilder to NodeMapBuilder (authored by seirl).
Sep 4 2020, 2:17 PM
seirl committed rDGRPHc5034f8f433d: java: clustering coefficient: fix Graph api use (authored by seirl).
Sep 4 2020, 2:17 PM
seirl committed rDGRPH2563b17e7d7b: style: format java code (authored by seirl).
Sep 4 2020, 12:19 PM
seirl committed rDGRPH8fe92cb8d208: java: large refactor, move classes around and remove Neighbors (authored by seirl).
Sep 4 2020, 11:32 AM
seirl committed rDGRPHdf3ee47706e9: java: refactor Graph to extend ImmutableGraph (authored by seirl).
Sep 4 2020, 11:32 AM
seirl closed D3878: Large unreviewable swh-graph refactor.
Sep 4 2020, 11:32 AM

Sep 3 2020

seirl added inline comments to D3878: Large unreviewable swh-graph refactor.
Sep 3 2020, 10:36 PM
seirl updated the diff for D3878: Large unreviewable swh-graph refactor.

Fix useless dependency

Sep 3 2020, 10:35 PM
seirl created D3878: Large unreviewable swh-graph refactor.
Sep 3 2020, 10:18 PM
seirl accepted D3871: Add a short `quickstart` guide.

Thanks, this looks good!

Sep 3 2020, 3:32 PM

Sep 1 2020

seirl committed rDGRPH2bbfbd850067: java: update unimi dependencies (authored by seirl).
Sep 1 2020, 5:54 PM

Aug 13 2020

seirl created P742 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM
seirl created P741 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM
seirl created P740 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM

Aug 12 2020

seirl created P739 (An Untitled Masterwork).
Aug 12 2020, 2:42 PM

Aug 3 2020

seirl closed D3600: Add visit/edges endpoint.

Already landed

Aug 3 2020, 10:45 PM
seirl committed rDGRPH1d0106763699: java: bump maven-assembly-plugin version to 3.3.0 (authored by seirl).
Aug 3 2020, 10:41 PM
seirl committed rDGRPH606a9a481f41: server: rewrite using class-based views (authored by seirl).
Aug 3 2020, 10:40 PM
seirl closed D3604: server: rewrite using class-based views.
Aug 3 2020, 10:40 PM
seirl updated the diff for D3604: server: rewrite using class-based views.

Rebase + add commit message

Aug 3 2020, 10:40 PM

Jul 27 2020

seirl updated the diff for D3600: Add visit/edges endpoint.
  • Remove debug prints
Jul 27 2020, 6:14 PM

Jul 23 2020

seirl abandoned D2629: dataset: add graph export based on kafka.

Obsoleted by D3011

Jul 23 2020, 4:58 PM
seirl renamed T2492 from "swh-graph: loading maps fail when swhgraphshm is running: Cannot allocate memory" to "swh-graph: loading maps fail when available memory is too low: Cannot allocate memory".
Jul 23 2020, 4:06 PM · Compressed graph service
seirl added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

Just to be clear, the problem here wasn't directly linked to swhgraphshm but simply to the amount of available memory: mapping with the MAP_PRIVATE flag tried to reserve all that memory to be able to perform copy-on-write. Using MAP_SHARED + PROT_READ avoids this memory reservation and fixes the issue. swhgraphshm was just an unrelated process taking up a lot of the available RAM, not specifically the reason why it failed.

Jul 23 2020, 4:06 PM · Compressed graph service
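
For reference, a read-only shared mapping of the kind described in this comment can be requested from Python as sketched below. This is an illustration under assumptions, not the actual swh-graph fix: `map_read_only` is a hypothetical helper, and the `flags` and `prot` arguments of `mmap.mmap` are only available on Unix.

```python
import mmap


def map_read_only(path: str) -> mmap.mmap:
    """Map the whole file at `path` with MAP_SHARED + PROT_READ (Unix only)."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. mmap duplicates the file
        # descriptor, so the mapping stays valid after `f` is closed.
        return mmap.mmap(
            f.fileno(),
            0,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ,
        )
```

The returned object supports slicing like `bytes`, so existing read-only code should be able to use it without changes. Mapping an empty file raises `ValueError`.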
seirl added a comment to D3600: Add visit/edges endpoint.

Fixed and added a test for that problem.

Jul 23 2020, 3:45 PM