Dec 11 2020

seirl committed rDDATASETf1952316a1ea: Graph export: add labels to the export CSV format (authored by seirl).
Graph export: add labels to the export CSV format
Dec 11 2020, 5:38 PM
seirl closed D4707: graph export: handle labels.
Dec 11 2020, 5:38 PM
seirl updated the diff for D4707: graph export: handle labels.

Better commit message:

Dec 11 2020, 5:38 PM

Dec 10 2020

seirl added a reviewer for D4718: Rewrite of the export pipeline using Exporters: Reviewers.
Dec 10 2020, 9:24 PM
seirl created D4718: Rewrite of the export pipeline using Exporters.
Dec 10 2020, 7:38 PM

Dec 9 2020

seirl added a reviewer for D4707: graph export: handle labels: Reviewers.
Dec 9 2020, 7:05 PM
seirl created D4707: graph export: handle labels.
Dec 9 2020, 7:05 PM
seirl committed rDDATASETb21d4a5ca327: graph exporter: schema upgrade for origin_visit_status (authored by seirl).
graph exporter: schema upgrade for origin_visit_status
Dec 9 2020, 7:04 PM
seirl closed D4691: graph exporter: schema upgrade for origin_visit_status.
Dec 9 2020, 7:04 PM

Dec 8 2020

seirl updated the diff for D4691: graph exporter: schema upgrade for origin_visit_status.

Subscribe to the correct objects

Dec 8 2020, 6:25 PM
seirl updated the diff for D4691: graph exporter: schema upgrade for origin_visit_status.

Fix variable name

Dec 8 2020, 5:28 PM
seirl added a reviewer for D4691: graph exporter: schema upgrade for origin_visit_status: Reviewers.
Dec 8 2020, 5:23 PM
seirl created D4691: graph exporter: schema upgrade for origin_visit_status.
Dec 8 2020, 5:21 PM
seirl accepted D4689: FUSE: fs: lookup: add optional regexp name validation.
Dec 8 2020, 4:55 PM
seirl created P896 (An Untitled Masterwork).
Dec 8 2020, 4:26 PM
seirl accepted D4682: FUSE: fix directory listing bugs.
Dec 8 2020, 4:09 PM
seirl added a comment to T2863: FUSE: lookup: add optional regex pre-condition.

My API idea was to simply have something like ENTRIES_REGEXP = r'^.*:.*$' as a class attribute of each type of directory, and a validate_entry(self, name: str) method which, by default, just checks that it matches the regexp.
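A minimal sketch of the idea described in this comment, assuming a hypothetical base class `FuseDirEntry` and subclass `ArchiveDir` (neither name comes from the actual code base). Only `ENTRIES_REGEXP` and `validate_entry(self, name: str)` are taken from the comment itself.

```python
import re


class FuseDirEntry:
    """Hypothetical base class for a virtual directory (name is an assumption)."""

    # Default pattern accepts any single-line name; subclasses override it.
    ENTRIES_REGEXP = r"^.*$"

    def validate_entry(self, name: str) -> bool:
        # By default, a name is valid if it matches the class-level regexp.
        return re.match(self.ENTRIES_REGEXP, name) is not None


class ArchiveDir(FuseDirEntry):
    """Hypothetical directory type whose entries must contain a colon."""

    ENTRIES_REGEXP = r"^.*:.*$"
```

With this layout, a directory type that needs stricter checks could override `validate_entry` itself, while most types would only change the class attribute.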

Dec 8 2020, 11:59 AM · Software Heritage filesystem

Dec 3 2020

seirl added a comment to T2771: FUSE: rethink the visibility of files under archive/ and meta/, and possibly add a new cache/ entrypoint.

We also need to discuss what exactly we put in cache/. I thought about symlinks to archive/ and meta/, what do you think? Removing the symlinks also means removing the data from the cache.

Dec 3 2020, 1:44 PM · Software Heritage filesystem

Dec 2 2020

seirl accepted D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 3:14 PM
seirl added inline comments to D4632: FUSE: tests: various code cleanup.
Dec 2 2020, 1:44 PM

Dec 1 2020

seirl accepted D4631: fs: snapshot: nest branch names as directories instead of URL-escaping.
Dec 1 2020, 5:35 PM

Nov 27 2020

seirl accepted D4569: FUSE: cache: add 'date' column in metadata_cache for history/by-date.
Nov 27 2020, 2:16 PM

Nov 25 2020

seirl accepted D4583: fuse: add support for origin artifacts.
Nov 25 2020, 1:59 PM

Nov 20 2020

seirl triaged T2801: Wrong <title> on snapshot pages as Normal priority.
Nov 20 2020, 9:01 PM · Web app, Easy hack

Nov 19 2020

seirl added inline comments to D4509: fs: history: clean sharded dir implementation.
Nov 19 2020, 2:39 PM

Nov 18 2020

seirl accepted D4509: fs: history: clean sharded dir implementation.
Nov 18 2020, 5:57 PM
seirl accepted D4489: fs: history: add by-date/ sharded directory.
Nov 18 2020, 12:49 PM

Nov 16 2020

seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:44 PM
seirl accepted D4476: fs: history: add by-page/ sharded directory.
Nov 16 2020, 2:27 PM
seirl accepted D4478: fuse: use logging.exception() instead of .debug().
Nov 16 2020, 1:27 PM

Nov 13 2020

seirl accepted D4416: fs: history: add by-hash/ sharded directory.
Nov 13 2020, 4:03 PM

Nov 12 2020

seirl added a comment to D4416: fs: history: add by-hash/ sharded directory.

I think I understand what your fill_direntry_cache function is trying to do: you want to avoid fetching the history multiple times by doing the request only once and writing the direntry cache of all the children recursively?
Would it perhaps be better to have a small LRU cache for the API queries instead, and keep the direntry code simple and fully lazy?
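A rough illustration of the suggested alternative, written synchronously for brevity (the real code may be asynchronous, in which case `functools.lru_cache` would not be the right tool as-is). The names `_query_api`, `fetch_history`, `list_history_entries` and the `CALLS` list are all hypothetical and only stand in for a real web API request.

```python
from functools import lru_cache

CALLS = []  # records each simulated API request, for demonstration only


def _query_api(swhid: str) -> tuple:
    """Stand-in for an expensive API query (hypothetical)."""
    CALLS.append(swhid)
    return (f"{swhid}/parent-1", f"{swhid}/parent-2")


@lru_cache(maxsize=128)
def fetch_history(swhid: str) -> tuple:
    # Repeated calls with the same argument are served from the LRU cache.
    return _query_api(swhid)


def list_history_entries(swhid: str) -> list:
    # The directory code stays lazy: it asks for the history whenever it
    # needs it and relies on the cache to avoid duplicate requests.
    return list(fetch_history(swhid))
```

Here the caching lives entirely at the query layer, so the directory-entry code does not need to pre-fill anything recursively.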

Nov 12 2020, 9:17 PM

Nov 5 2020

seirl added inline comments to D4416: fs: history: add by-hash/ sharded directory.
Nov 5 2020, 3:01 PM

Nov 4 2020

seirl accepted D4345: fuse: add cache on directories entries.
Nov 4 2020, 5:00 PM
seirl added inline comments to D4345: fuse: add cache on directories entries.
Nov 4 2020, 2:42 PM
seirl requested changes to D4345: fuse: add cache on directories entries.

Looks good apart from two small things.

Nov 4 2020, 2:41 PM

Nov 3 2020

seirl requested changes to D4345: fuse: add cache on directories entries.

One thing I don't really like here is that FuseEntries cannot easily list their own entries using the cache when it is available. I would much rather have the cache logic moved inside FuseEntry, as we discussed.

Nov 3 2020, 7:47 PM

Oct 22 2020

seirl accepted D4309: Add flat commit view in a history/ virtual dir.
Oct 22 2020, 5:32 PM

Oct 21 2020

seirl accepted D4316: cache: add missing aiosqlite commit call.
Oct 21 2020, 12:50 PM

Oct 16 2020

seirl accepted D4289: cli: fix daemon working directory.
Oct 16 2020, 3:28 PM
seirl created P825 (An Untitled Masterwork).
Oct 16 2020, 2:37 PM

Oct 14 2020

seirl accepted D4254: fs: add FuseEntry sub-classes for file, dir, symlink.
Oct 14 2020, 2:57 PM

Oct 13 2020

seirl accepted D4246: fuse: add support for release artifacts.
Oct 13 2020, 4:34 PM
seirl requested changes to D4246: fuse: add support for release artifacts.
Oct 13 2020, 3:31 PM
seirl added a comment to T2695: Cache directory entries to make readdir/lookup more efficient.

lookup() should ideally be O(1).
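One way to get constant-time lookups on average is to keep the cached entries of a directory in a dictionary keyed by name. The sketch below is only an illustration: the class `CachedDir` and the integer values it stores are hypothetical and not taken from the actual implementation.

```python
from typing import Dict, List, Optional


class CachedDir:
    """Hypothetical directory whose entries are cached in a dict."""

    def __init__(self, names: List[str]) -> None:
        # Map each entry name to a placeholder integer id.
        self._entries: Dict[str, int] = {
            name: index for index, name in enumerate(names)
        }

    def lookup(self, name: str) -> Optional[int]:
        # Dict access is O(1) on average; no scan over all entries.
        return self._entries.get(name)

    def readdir(self) -> List[str]:
        # Listing is still linear, in insertion order.
        return list(self._entries)
```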

Oct 13 2020, 12:13 PM · Software Heritage filesystem
seirl accepted D4240: fuse: allow mounting artifacts on the fly.
Oct 13 2020, 12:11 PM

Oct 12 2020

seirl accepted D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 6:30 PM
seirl requested changes to D4235: Rework unit testing framework and add more tests.
Oct 12 2020, 4:14 PM

Oct 9 2020

seirl accepted D4200: Add support for revision artifacts.
Oct 9 2020, 2:54 PM
seirl requested changes to D4200: Add support for revision artifacts.
Oct 9 2020, 2:36 PM

Oct 8 2020

seirl accepted D4201: Fix pytest warnings - tests: add missing join() after subprocess.run().
Oct 8 2020, 2:44 PM

Oct 7 2020

seirl accepted D4064: Early FUSE implementation, with support for blob and directory objects.
Oct 7 2020, 3:06 PM
seirl accepted D4028: Add Spotless formatting tool.
Oct 7 2020, 1:11 PM

Oct 6 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.

This is looking pretty great. I see three more good refactoring possibilities:

Oct 6 2020, 6:34 PM

Oct 5 2020

seirl committed rDGRPH29ae6bf46d22: java: migrate to Junit 5 (authored by seirl).
java: migrate to Junit 5
Oct 5 2020, 11:31 PM
seirl closed D4146: java: migrate to Junit 5.
Oct 5 2020, 11:31 PM
seirl added inline comments to D4146: java: migrate to Junit 5.
Oct 5 2020, 7:44 PM
seirl created D4146: java: migrate to Junit 5.
Oct 5 2020, 7:43 PM
seirl committed rDGRPH8b18ec1cb26b: java: refactor AllowedEdges to remove its Graph attribute (authored by seirl).
java: refactor AllowedEdges to remove its Graph attribute
Oct 5 2020, 6:22 PM
seirl closed D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 6:22 PM
seirl updated the diff for D4145: java: refactor AllowedEdges to remove its Graph attribute.

Fix BVGraph/ImmutableGraph implicit naming

Oct 5 2020, 6:22 PM
seirl created D4145: java: refactor AllowedEdges to remove its Graph attribute.
Oct 5 2020, 5:49 PM

Oct 3 2020

seirl added a comment to D4064: Early FUSE implementation, with support for blob and directory objects.

Yes, the correct method name is enter_context

Oct 3 2020, 2:47 PM

Sep 30 2020

seirl requested changes to D4064: Early FUSE implementation, with support for blob and directory objects.
Sep 30 2020, 7:36 PM

Sep 29 2020

seirl accepted D4006: WIP: add permissions on edge labels.
Sep 29 2020, 3:40 PM

Sep 24 2020

seirl committed rDGRPH2be4852de7e4: ConnectedComponents: only compute the size distribution, not the rest (authored by seirl).
ConnectedComponents: only compute the size distribution, not the rest
Sep 24 2020, 6:10 PM
seirl committed rDGRPH752154f3d3c1: ConnectedComponents: flush distribution printing (authored by seirl).
ConnectedComponents: flush distribution printing
Sep 24 2020, 3:25 PM
seirl committed rDGRPHafd7eac11431: experiments: add SubdatasetSizeFunction (authored by seirl).
experiments: add SubdatasetSizeFunction
Sep 24 2020, 3:25 PM
seirl committed rDGRPH9687de2de5ed: experiments: add InOutDegree experiment (authored by seirl).
experiments: add InOutDegree experiment
Sep 24 2020, 3:25 PM
seirl committed rDGRPH1dd5ef972954: Add TopologicalTraversal.java (authored by seirl).
Add TopologicalTraversal.java
Sep 24 2020, 3:25 PM

Sep 23 2020

seirl edited P774 Masterwork From Distant Lands.
Sep 23 2020, 3:03 PM
seirl updated the task description for T2633: Tighten restrictions on directory entry names.
Sep 23 2020, 2:22 PM · Data Model
seirl triaged T2633: Tighten restrictions on directory entry names as Normal priority.
Sep 23 2020, 2:21 PM · Data Model
seirl requested changes to D4006: WIP: add permissions on edge labels.
Sep 23 2020, 2:17 PM

Sep 22 2020

seirl added inline comments to D4006: WIP: add permissions on edge labels.
Sep 22 2020, 11:41 PM
seirl requested changes to D4006: WIP: add permissions on edge labels.

I would do a bit of refactoring:

Sep 22 2020, 5:37 PM

Sep 18 2020

seirl accepted D3990: Replace deprecated "SWH PID" naming with "SWHID".
Sep 18 2020, 3:52 PM
seirl created P772 (An Untitled Masterwork).
Sep 18 2020, 12:07 PM
seirl accepted D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:54 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:50 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Sep 18 2020, 11:40 AM

Sep 16 2020

seirl added a comment to T1847: fully automate export of the graph dataset.

No, only the edge part is done, we still need a parquet and a CSV exporter :/

Sep 16 2020, 10:59 PM · Compressed graph service, Datasets
seirl closed T1868: refresh compressed representation of the archive as Resolved.

It is already running on granet :-)

Sep 16 2020, 9:47 PM · Compressed graph service

Sep 15 2020

seirl created P768 For haltode :-).
Sep 15 2020, 2:28 PM
seirl added a comment to T2600: SQL storage: experiment with flattened layouts for directory nodes.

We considered three possibilities for the schema (assuming that we want to get rid of the three separate tables for dir_entries, rev_entries and file_entries -- otherwise, there are 6 possibilities).

Sep 15 2020, 1:40 PM · Storage manager

Sep 14 2020

seirl closed D3906: Add edge labelling prototype.
Sep 14 2020, 2:43 PM
seirl updated the diff for D3906: Add edge labelling prototype.

rebase

Sep 14 2020, 2:21 PM

Sep 10 2020

seirl created D3906: Add edge labelling prototype.
Sep 10 2020, 3:28 PM

Sep 4 2020

seirl committed rDGRPH0295a3be9458: java: rename MapBuilder to NodeMapBuilder (authored by seirl).
java: rename MapBuilder to NodeMapBuilder
Sep 4 2020, 2:17 PM
seirl committed rDGRPHc5034f8f433d: java: clustering coefficient: fix Graph api use (authored by seirl).
java: clustering coefficient: fix Graph api use
Sep 4 2020, 2:17 PM
seirl committed rDGRPH2563b17e7d7b: style: format java code (authored by seirl).
style: format java code
Sep 4 2020, 12:19 PM
seirl committed rDGRPH8fe92cb8d208: java: large refactor, move classes around and remove Neighbors (authored by seirl).
java: large refactor, move classes around and remove Neighbors
Sep 4 2020, 11:32 AM
seirl committed rDGRPHdf3ee47706e9: java: refactor Graph to extend ImmutableGraph (authored by seirl).
java: refactor Graph to extend ImmutableGraph
Sep 4 2020, 11:32 AM
seirl closed D3878: Large unreviewable swh-graph refactor.
Sep 4 2020, 11:32 AM

Sep 3 2020

seirl added inline comments to D3878: Large unreviewable swh-graph refactor.
Sep 3 2020, 10:36 PM
seirl updated the diff for D3878: Large unreviewable swh-graph refactor.

Fix useless dependency

Sep 3 2020, 10:35 PM
seirl created D3878: Large unreviewable swh-graph refactor.
Sep 3 2020, 10:18 PM
seirl accepted D3871: Add a short `quickstart` guide.

Thanks, this looks good!

Sep 3 2020, 3:32 PM

Sep 1 2020

seirl committed rDGRPH2bbfbd850067: java: update unimi dependencies (authored by seirl).
java: update unimi dependencies
Sep 1 2020, 5:54 PM

Aug 13 2020

seirl created P742 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM