seirl (Antoine Pietri)
User

User Details

User Since
Feb 2 2017, 11:38 AM (190 w, 5 d)

Recent Activity

Tue, Sep 29

seirl accepted D4006: WIP: add permissions on edge labels.
Tue, Sep 29, 3:40 PM

Thu, Sep 24

seirl committed rDGRPH2be4852de7e4: ConnectedComponents: only compute the size distribution, not the rest (authored by seirl).
Thu, Sep 24, 6:10 PM
seirl committed rDGRPH752154f3d3c1: ConnectedComponents: flush distribution printing (authored by seirl).
Thu, Sep 24, 3:25 PM
seirl committed rDGRPHafd7eac11431: experiments: add SubdatasetSizeFunction (authored by seirl).
Thu, Sep 24, 3:25 PM
seirl committed rDGRPH9687de2de5ed: experiments: add InOutDegree experiment (authored by seirl).
Thu, Sep 24, 3:25 PM
seirl committed rDGRPH1dd5ef972954: Add TopologicalTraversal.java (authored by seirl).
Thu, Sep 24, 3:25 PM

Wed, Sep 23

seirl edited P774 Masterwork From Distant Lands.
Wed, Sep 23, 3:03 PM
seirl updated the task description for T2633: Tighten restrictions on directory entry names.
Wed, Sep 23, 2:22 PM · Data Model
seirl triaged T2633: Tighten restrictions on directory entry names as Normal priority.
Wed, Sep 23, 2:21 PM · Data Model
seirl requested changes to D4006: WIP: add permissions on edge labels.
Wed, Sep 23, 2:17 PM

Tue, Sep 22

seirl added inline comments to D4006: WIP: add permissions on edge labels.
Tue, Sep 22, 11:41 PM
seirl requested changes to D4006: WIP: add permissions on edge labels.

I would do a bit of refactoring:

Tue, Sep 22, 5:37 PM

Fri, Sep 18

seirl accepted D3990: Replace deprecated "SWH PID" naming with "SWHID".
Fri, Sep 18, 3:52 PM
seirl created P772 (An Untitled Masterwork).
Fri, Sep 18, 12:07 PM
seirl accepted D3944: test_cli.py: fix passing custom config to CLI.
Fri, Sep 18, 11:54 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Fri, Sep 18, 11:50 AM
seirl requested changes to D3944: test_cli.py: fix passing custom config to CLI.
Fri, Sep 18, 11:40 AM

Wed, Sep 16

seirl added a comment to T1847: fully automate export of the graph dataset.

No, only the edge part is done; we still need a Parquet and a CSV exporter :/

Wed, Sep 16, 10:59 PM · Graph service, Datasets
seirl closed T1868: refresh compressed representation of the archive as Resolved.

It is already running on granet :-)

Wed, Sep 16, 9:47 PM · Graph service

Tue, Sep 15

seirl created P768 For haltode :-).
Tue, Sep 15, 2:28 PM
seirl added a comment to T2600: SQL storage: experiment with flattened layouts for directory nodes.

We considered three possibilities for the schema (assuming that we want to get rid of the three separate tables for dir_entries, rev_entries and file_entries; otherwise, there are six possibilities).

Tue, Sep 15, 1:40 PM · Storage manager

Mon, Sep 14

seirl closed D3906: Add edge labelling prototype.
Mon, Sep 14, 2:43 PM
seirl updated the diff for D3906: Add edge labelling prototype.

rebase

Mon, Sep 14, 2:21 PM

Thu, Sep 10

seirl created D3906: Add edge labelling prototype.
Thu, Sep 10, 3:28 PM

Fri, Sep 4

seirl committed rDGRPH0295a3be9458: java: rename MapBuilder to NodeMapBuilder (authored by seirl).
Fri, Sep 4, 2:17 PM
seirl committed rDGRPHc5034f8f433d: java: clustering coefficient: fix Graph api use (authored by seirl).
Fri, Sep 4, 2:17 PM
seirl committed rDGRPH2563b17e7d7b: style: format java code (authored by seirl).
Fri, Sep 4, 12:19 PM
seirl committed rDGRPH8fe92cb8d208: java: large refactor, move classes around and remove Neighbors (authored by seirl).
Fri, Sep 4, 11:32 AM
seirl committed rDGRPHdf3ee47706e9: java: refactor Graph to extend ImmutableGraph (authored by seirl).
Fri, Sep 4, 11:32 AM
seirl closed D3878: Large unreviewable swh-graph refactor.
Fri, Sep 4, 11:32 AM

Thu, Sep 3

seirl added inline comments to D3878: Large unreviewable swh-graph refactor.
Thu, Sep 3, 10:36 PM
seirl updated the diff for D3878: Large unreviewable swh-graph refactor.

Fix useless dependency

Thu, Sep 3, 10:35 PM
seirl created D3878: Large unreviewable swh-graph refactor.
Thu, Sep 3, 10:18 PM
seirl accepted D3871: Add a short `quickstart` guide.

Thanks, this looks good!

Thu, Sep 3, 3:32 PM

Tue, Sep 1

seirl committed rDGRPH2bbfbd850067: java: update unimi dependencies (authored by seirl).
Tue, Sep 1, 5:54 PM

Aug 13 2020

seirl created P742 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM
seirl created P741 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM
seirl created P740 (An Untitled Masterwork).
Aug 13 2020, 12:34 AM

Aug 12 2020

seirl created P739 (An Untitled Masterwork).
Aug 12 2020, 2:42 PM

Aug 3 2020

seirl closed D3600: Add visit/edges endpoint.

Already landed

Aug 3 2020, 10:45 PM
seirl committed rDGRPH1d0106763699: java: bump maven-assembly-plugin version to 3.3.0 (authored by seirl).
Aug 3 2020, 10:41 PM
seirl committed rDGRPH606a9a481f41: server: rewrite using class-based views (authored by seirl).
Aug 3 2020, 10:40 PM
seirl closed D3604: server: rewrite using class-based views.
Aug 3 2020, 10:40 PM
seirl updated the diff for D3604: server: rewrite using class-based views.

Rebase + add commit message

Aug 3 2020, 10:40 PM

Jul 27 2020

seirl updated the diff for D3600: Add visit/edges endpoint.
  • Remove debug prints
Jul 27 2020, 6:14 PM

Jul 23 2020

seirl abandoned D2629: dataset: add graph export based on kafka.

Obsoleted by D3011

Jul 23 2020, 4:58 PM
seirl renamed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory from swh-graph: loading maps fail when swhgraphshm is running: Cannot allocate memory to swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.
Jul 23 2020, 4:06 PM · Graph service
seirl added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

Just to be clear, the problem here wasn't directly linked to swhgraphshm but simply to the amount of available memory: the MAP_PRIVATE flag tried to reserve all that memory to be able to perform copy-on-write. Using MAP_SHARED + PROT_READ avoids this memory reservation and fixes the issue. swhgraphshm was just a random process taking a lot of the available RAM, not specifically the reason why it failed.
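
A minimal sketch of the read-only shared mapping described above, using Python's standard `mmap` module on Unix. The helper name `open_readonly_map` is hypothetical and is not taken from swh-graph; it only illustrates the flag combination named in the comment.

```python
import mmap


def open_readonly_map(path):
    """Map a file read-only with MAP_SHARED (illustrative helper, Unix only).

    With PROT_READ and MAP_SHARED the pages stay backed by the file itself,
    so the kernel has no private copy-on-write copies to account for.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after the
        # file object is closed, because mmap keeps its own descriptor.
        return mmap.mmap(
            f.fileno(),
            0,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ,
        )
```

The returned object supports slicing like a `bytes` value, and attempts to write through it fail because the mapping is read-only.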

Jul 23 2020, 4:06 PM · Graph service
seirl added a comment to D3600: Add visit/edges endpoint.

Fixed and added a test for that problem.

Jul 23 2020, 3:45 PM
seirl updated the diff for D3600: Add visit/edges endpoint.
  • visit/edges: fix incorrect handling of diamond pattern
Jul 23 2020, 3:45 PM
seirl planned changes to D3600: Add visit/edges endpoint.
Jul 23 2020, 3:21 PM
seirl added a comment to D3604: server: rewrite using class-based views.

So it turns out that using multiprocessing to spawn a web app that we want to test doesn't exactly give great coverage... I'll check what I can do about that.

Jul 23 2020, 3:19 PM
seirl created D3604: server: rewrite using class-based views.
Jul 23 2020, 3:11 PM
seirl created D3600: Add visit/edges endpoint.
Jul 23 2020, 1:18 PM
seirl committed rDGRPH39430074227c: pid: use PROT_READ/MAP_SHARED for readonly maps (authored by seirl).
Jul 23 2020, 12:32 PM
seirl closed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory as Resolved by committing rDGRPH39430074227c: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:32 PM · Graph service
seirl closed D3599: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:32 PM
seirl added a revision to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory: D3599: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:22 PM · Graph service
seirl created D3599: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:22 PM

Jul 6 2020

seirl committed rDGRPH481fcc9fc026: generate_graph.sh: fix mix up between nodes and edges (authored by seirl).
Jul 6 2020, 11:35 AM

Jun 9 2020

seirl created P693 (An Untitled Masterwork).
Jun 9 2020, 2:55 PM
seirl created P692 graph export 2020-05-20 - statistics.
Jun 9 2020, 2:22 PM

Jun 8 2020

seirl committed rDDATASET8d79b8475d50: graph export: compute node/edge type stats (authored by seirl).
Jun 8 2020, 4:46 PM
seirl closed D3242: graph export: compute node/edge type stats.
Jun 8 2020, 4:46 PM
seirl added a reviewer for D3242: graph export: compute node/edge type stats: Reviewers.
Jun 8 2020, 4:04 PM
seirl created P689 (An Untitled Masterwork).
Jun 8 2020, 3:04 PM
seirl created D3242: graph export: compute node/edge type stats.
Jun 8 2020, 2:43 PM

Jun 3 2020

seirl triaged T2431: Document how to export the graph edge dataset as Normal priority.
Jun 3 2020, 4:34 PM · Documentation, Graph service, Datasets
seirl closed T1796: Datasets exported from Spark are missing some rows as Resolved.

We no longer export edges from Spark

Jun 3 2020, 4:14 PM · Datasets
seirl closed T1741: graph dataset: update to use persistent identifiers everywhere, a subtask of T1848: refresh graph dataset export, as Resolved.
Jun 3 2020, 4:08 PM · Datasets
seirl closed T1741: graph dataset: update to use persistent identifiers everywhere as Resolved.

We no longer export edges per file type.

Jun 3 2020, 4:08 PM · Datasets
seirl closed T1956: Integrate usage docs of the graph dataset in swh-docs as Resolved.
Jun 3 2020, 4:07 PM · Datasets

May 29 2020

seirl committed rDGRPHc5b0a152b78a: java: GenDistribution: shutdown service pool when all the threads are done (authored by seirl).
May 29 2020, 2:29 PM

May 28 2020

seirl created P683 (An Untitled Masterwork).
May 28 2020, 4:12 PM

May 15 2020

seirl committed rDDATASETd26a4247094d: exporter: increase message.max.bytes value to allow large messages (authored by seirl).
May 15 2020, 2:34 PM
seirl created P673 (An Untitled Masterwork).
May 15 2020, 12:13 PM

May 14 2020

seirl committed rDDATASET312474e905b6: graph exporter: fix process arguments, run process on content (authored by seirl).
May 14 2020, 11:06 PM
seirl committed rDDATASET518973396296: exporter: send EOF token to progress queue when process() finishes normally (authored by seirl).
May 14 2020, 11:06 PM
seirl committed rDDATASETb608f41411ee: graph exporter: more exhaustive heuristic to remove pull requests (authored by seirl).
May 14 2020, 10:29 PM

May 5 2020

seirl committed rDDATASET56fee87710ce: graph: do not deduplicate different visits from the same origin (authored by seirl).
May 5 2020, 6:50 PM
seirl committed rDDATASET7a34fb115d38: graph: use an sqlite3 on-disk set to avoid processing nodes twice (authored by seirl).
May 5 2020, 6:50 PM
seirl closed D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 5 2020, 6:50 PM
seirl updated the diff for D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
  • rebase
  • graph: do not deduplicate different visits from the same origin
May 5 2020, 6:43 PM
seirl added inline comments to D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 5 2020, 5:04 PM
seirl closed D3011: dataset: add graph export based on kafka.

Don't know why this hasn't been autoclosed, but it's merged in master.

May 5 2020, 11:24 AM

May 4 2020

seirl created D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 4 2020, 9:48 PM
seirl committed rDDATASET1796e402370d: Add Black pre-commit (authored by seirl).
May 4 2020, 7:23 PM
seirl updated the diff for D3011: dataset: add graph export based on kafka.

Rebase and update against swh-storage/master

May 4 2020, 7:18 PM
seirl updated the diff for D3011: dataset: add graph export based on kafka.
  • tests/graph: check the presence of duplicate nodes
  • exporter: minor style changes
May 4 2020, 7:09 PM
seirl added inline comments to D3011: dataset: add graph export based on kafka.
May 4 2020, 7:07 PM
seirl added inline comments to D3011: dataset: add graph export based on kafka.
May 4 2020, 7:05 PM

Apr 21 2020

seirl updated the diff for D3011: dataset: add graph export based on kafka.
  • Rework graph export pipeline
  • Graph export: add unit tests
Apr 21 2020, 7:41 PM
seirl requested review of D3011: dataset: add graph export based on kafka.
Apr 21 2020, 7:40 PM
seirl committed rDGRPH387ce5e2ff0c: cachemount: only cache .graph, not .obl/.offset files (authored by seirl).
cachemount: only cache .graph, not .obl/.offset files
Apr 21 2020, 4:51 PM

Apr 17 2020

seirl created P649 (An Untitled Masterwork).
Apr 17 2020, 4:34 PM
seirl accepted D3026: Support serialization and deserialization of ints of arbitrary length.
Apr 17 2020, 10:52 AM

Apr 16 2020

seirl accepted D3023: sphinx: add support to generate click CLI doc.
Apr 16 2020, 4:21 PM

Apr 15 2020

seirl edited P648 Masterwork From Distant Lands.
Apr 15 2020, 3:57 PM
seirl closed T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli' as Resolved.
Apr 15 2020, 3:36 PM · Datasets
seirl added a comment to T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli'.

Temporary fix here until the branch that implements this entrypoint is merged: https://forge.softwareheritage.org/rDDATASETbe9e71ba1f858bbb8f44649306b919a1fa965ea2

Apr 15 2020, 3:36 PM · Datasets