seirl (Antoine Pietri)
User

User Details

User Since
Feb 2 2017, 11:38 AM (179 w, 1 d)

Recent Activity

Jul 6 2020

seirl committed rDGRPH481fcc9fc026: generate_graph.sh: fix mix up between nodes and edges (authored by seirl).
generate_graph.sh: fix mix up between nodes and edges
Jul 6 2020, 11:35 AM

Jun 9 2020

seirl created P693 (An Untitled Masterwork).
Jun 9 2020, 2:55 PM
seirl created P692 graph export 2020-05-20 - statistics.
Jun 9 2020, 2:22 PM

Jun 8 2020

seirl committed rDDATASET8d79b8475d50: graph export: compute node/edge type stats (authored by seirl).
graph export: compute node/edge type stats
Jun 8 2020, 4:46 PM
seirl closed D3242: graph export: compute node/edge type stats.
Jun 8 2020, 4:46 PM
seirl added a reviewer for D3242: graph export: compute node/edge type stats: Reviewers.
Jun 8 2020, 4:04 PM
seirl created P689 (An Untitled Masterwork).
Jun 8 2020, 3:04 PM
seirl created D3242: graph export: compute node/edge type stats.
Jun 8 2020, 2:43 PM

Jun 3 2020

seirl triaged T2431: Document how to export the graph edge dataset as Normal priority.
Jun 3 2020, 4:34 PM · Development documentation, Graph service, Datasets
seirl closed T1796: Datasets exported from Spark are missing some rows as Resolved.

We no longer export edges from Spark

Jun 3 2020, 4:14 PM · Datasets
seirl closed T1741: graph dataset: update to use persistent identifiers everywhere, a subtask of T1848: refresh graph dataset export, as Resolved.
Jun 3 2020, 4:08 PM · Datasets
seirl closed T1741: graph dataset: update to use persistent identifiers everywhere as Resolved.

We no longer export edges per file type.

Jun 3 2020, 4:08 PM · Datasets
seirl closed T1956: Integrate usage docs of the graph dataset in swh-docs as Resolved.
Jun 3 2020, 4:07 PM · Datasets

May 29 2020

seirl committed rDGRPHc5b0a152b78a: java: GenDistribution: shutdown service pool when all the threads are done (authored by seirl).
java: GenDistribution: shutdown service pool when all the threads are done
May 29 2020, 2:29 PM

May 28 2020

seirl created P683 (An Untitled Masterwork).
May 28 2020, 4:12 PM

May 15 2020

seirl committed rDDATASETd26a4247094d: exporter: increase message.max.bytes value to allow large messages (authored by seirl).
exporter: increase message.max.bytes value to allow large messages
May 15 2020, 2:34 PM
seirl created P673 (An Untitled Masterwork).
May 15 2020, 12:13 PM

May 14 2020

seirl committed rDDATASET312474e905b6: graph exporter: fix process arguments, run process on content (authored by seirl).
graph exporter: fix process arguments, run process on content
May 14 2020, 11:06 PM
seirl committed rDDATASET518973396296: exporter: send EOF token to progress queue when process() finishes normally (authored by seirl).
exporter: send EOF token to progress queue when process() finishes normally
May 14 2020, 11:06 PM
seirl committed rDDATASETb608f41411ee: graph exporter: more exhaustive heuristic to remove pull requests (authored by seirl).
graph exporter: more exhaustive heuristic to remove pull requests
May 14 2020, 10:29 PM

May 5 2020

seirl committed rDDATASET56fee87710ce: graph: do not deduplicate different visits from the same origin (authored by seirl).
graph: do not deduplicate different visits from the same origin
May 5 2020, 6:50 PM
seirl committed rDDATASET7a34fb115d38: graph: use an sqlite3 on-disk set to avoid processing nodes twice (authored by seirl).
graph: use an sqlite3 on-disk set to avoid processing nodes twice
May 5 2020, 6:50 PM
seirl closed D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 5 2020, 6:50 PM
seirl updated the diff for D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
  • rebase
  • graph: do not deduplicate different visits from the same origin
May 5 2020, 6:43 PM
seirl added inline comments to D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 5 2020, 5:04 PM
seirl closed D3011: dataset: add graph export based on kafka.

Don't know why this hasn't been autoclosed, but it's merged in master.

May 5 2020, 11:24 AM

May 4 2020

seirl created D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
May 4 2020, 9:48 PM
seirl committed rDDATASET1796e402370d: Add Black pre-commit (authored by seirl).
Add Black pre-commit
May 4 2020, 7:23 PM
seirl updated the diff for D3011: dataset: add graph export based on kafka.

Rebase and update against swh-storage/master

May 4 2020, 7:18 PM
seirl updated the diff for D3011: dataset: add graph export based on kafka.
  • tests/graph: check the presence of duplicate nodes
  • exporter: minor style changes
May 4 2020, 7:09 PM
seirl added inline comments to D3011: dataset: add graph export based on kafka.
May 4 2020, 7:07 PM
seirl added inline comments to D3011: dataset: add graph export based on kafka.
May 4 2020, 7:05 PM

Apr 21 2020

seirl updated the diff for D3011: dataset: add graph export based on kafka.
  • Rework graph export pipeline
  • Graph export: add unit tests
Apr 21 2020, 7:41 PM
seirl requested review of D3011: dataset: add graph export based on kafka.
Apr 21 2020, 7:40 PM
seirl committed rDGRPH387ce5e2ff0c: cachemount: only cache .graph, not .obl/.offset files (authored by seirl).
cachemount: only cache .graph, not .obl/.offset files
Apr 21 2020, 4:51 PM

Apr 17 2020

seirl created P649 (An Untitled Masterwork).
Apr 17 2020, 4:34 PM
seirl accepted D3026: Support serialization and deserialization of ints of arbitrary length.
Apr 17 2020, 10:52 AM

Apr 16 2020

seirl accepted D3023: sphinx: add support to generate click CLI doc.
Apr 16 2020, 4:21 PM

Apr 15 2020

seirl edited P648 Masterwork From Distant Lands.
Apr 15 2020, 3:57 PM
seirl closed T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli' as Resolved.
Apr 15 2020, 3:36 PM · Datasets
seirl added a comment to T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli'.

Temporary fix here until the branch that implements this entrypoint is merged: https://forge.softwareheritage.org/rDDATASETbe9e71ba1f858bbb8f44649306b919a1fa965ea2

Apr 15 2020, 3:36 PM · Datasets
seirl committed rDDATASETbe9e71ba1f85: setup.py: temporarily remove entrypoint until diff is merged (authored by seirl).
setup.py: temporarily remove entrypoint until diff is merged
Apr 15 2020, 3:34 PM

Apr 14 2020

seirl committed rDJNLb53d7d7c37ce: client: move subscription to a separate function (authored by seirl).
client: move subscription to a separate function
Apr 14 2020, 3:29 PM
seirl closed D3013: client: move subscription to a separate function.
Apr 14 2020, 3:29 PM
seirl updated the diff for D3013: client: move subscription to a separate function.

rebase

Apr 14 2020, 3:28 PM
seirl committed rDGRPH5d6b06e03979: cli: add a memcache command (authored by seirl).
cli: add a memcache command
Apr 14 2020, 11:41 AM
seirl closed D3002: cli: add a cachemount command.
Apr 14 2020, 11:41 AM

Apr 10 2020

seirl updated the diff for D3002: cli: add a cachemount command.

Rename command to cachemount, mention default cache path

Apr 10 2020, 7:19 PM
seirl planned changes to D3011: dataset: add graph export based on kafka.
Apr 10 2020, 7:08 PM
seirl updated the diff for D3011: dataset: add graph export based on kafka.
  • Add docstrings and types
Apr 10 2020, 7:07 PM
seirl created D3013: client: move subscription to a separate function.
Apr 10 2020, 5:22 PM
seirl added inline comments to D3011: dataset: add graph export based on kafka.
Apr 10 2020, 4:21 PM
seirl created D3011: dataset: add graph export based on kafka.
Apr 10 2020, 3:45 PM
seirl updated the diff for D3002: cli: add a cachemount command.

Add default cache path

Apr 10 2020, 2:17 PM
seirl created D3002: cli: add a cachemount command.
Apr 10 2020, 12:19 PM
seirl created P647 swh graph memcache.
Apr 10 2020, 11:49 AM
seirl committed rDGRPH5cebcba4b583: java: Graph: load the graph with mmap() (authored by seirl).
java: Graph: load the graph with mmap()
Apr 10 2020, 10:53 AM

Apr 5 2020

seirl created P639 (An Untitled Masterwork).
Apr 5 2020, 2:12 PM

Apr 4 2020

seirl created P638 (An Untitled Masterwork).
Apr 4 2020, 12:20 PM

Mar 23 2020

seirl created P625 (An Untitled Masterwork).
Mar 23 2020, 9:45 PM
seirl committed rDMOD4a2233c5f732: identifiers: encode origin URLs in utf-8 (authored by seirl).
identifiers: encode origin URLs in utf-8
Mar 23 2020, 7:10 PM
seirl committed rDJNL099d8190ab24: replayer: factor out legacy objects fixers (authored by seirl).
replayer: factor out legacy objects fixers
Mar 23 2020, 6:52 PM
seirl closed D2868: replayer: factor out legacy objects fixers.
Mar 23 2020, 6:52 PM
seirl updated the diff for D2868: replayer: factor out legacy objects fixers.

use fallback for revision

Mar 23 2020, 6:45 PM
seirl updated the diff for D2868: replayer: factor out legacy objects fixers.

Fix tests

Mar 23 2020, 6:39 PM
seirl created D2868: replayer: factor out legacy objects fixers.
Mar 23 2020, 6:00 PM

Mar 17 2020

seirl created P617 (An Untitled Masterwork).
Mar 17 2020, 6:01 PM

Mar 10 2020

seirl created P610 (An Untitled Masterwork).
Mar 10 2020, 7:33 PM

Feb 26 2020

seirl committed rDJNL6ca43d5c59e6: JournalClient: add a stop_at_eof boolean to read the log only once (authored by seirl).
JournalClient: add a stop_at_eof boolean to read the log only once
Feb 26 2020, 4:26 PM
seirl closed D2718: JournalClient: add a stop_at_eof boolean to read the log only once.
Feb 26 2020, 4:26 PM
seirl updated the diff for D2718: JournalClient: add a stop_at_eof boolean to read the log only once.

fix reviews

Feb 26 2020, 4:09 PM
seirl added inline comments to D2718: JournalClient: add a stop_at_eof boolean to read the log only once.
Feb 26 2020, 12:31 PM
seirl created D2718: JournalClient: add a stop_at_eof boolean to read the log only once.
Feb 26 2020, 12:29 PM
seirl committed rDJNLeea69820792f: JournalClient: split main loop in three functions (authored by seirl).
JournalClient: split main loop in three functions
Feb 26 2020, 12:28 PM
seirl closed D2651: JournalClient: split main loop in three functions.
Feb 26 2020, 12:28 PM

Feb 25 2020

seirl updated the diff for D2651: JournalClient: split main loop in three functions.

Fix reviews

Feb 25 2020, 4:39 PM

Feb 24 2020

seirl committed rMSLD7ea1791249ec: gsoc-epita: add 2020 slides (authored by haltode).
gsoc-epita: add 2020 slides
Feb 24 2020, 5:59 PM
seirl closed D2711: gsoc-epita: add 2020 slides.
Feb 24 2020, 5:59 PM
seirl accepted D2711: gsoc-epita: add 2020 slides.
Feb 24 2020, 3:09 PM

Feb 14 2020

seirl added inline comments to D2669: Add ?limit=N method variants to return first N results.
Feb 14 2020, 5:19 PM

Feb 11 2020

seirl planned changes to D2629: dataset: add graph export based on kafka.
Feb 11 2020, 5:40 PM
seirl created D2651: JournalClient: split main loop in three functions.
Feb 11 2020, 5:26 PM
seirl created P595 (An Untitled Masterwork).
Feb 11 2020, 4:44 PM

Feb 10 2020

seirl created P594 (An Untitled Masterwork).
Feb 10 2020, 6:03 PM
seirl created P593 (An Untitled Masterwork).
Feb 10 2020, 5:39 PM
seirl created P592 (An Untitled Masterwork).
Feb 10 2020, 4:55 PM

Feb 5 2020

seirl updated the diff for D2629: dataset: add graph export based on kafka.

Remove useless while true

Feb 5 2020, 7:08 PM
seirl updated the diff for D2629: dataset: add graph export based on kafka.

Remove commented out part

Feb 5 2020, 7:04 PM
seirl created D2629: dataset: add graph export based on kafka.
Feb 5 2020, 7:02 PM
seirl committed rDDATASET889d6636379e: Bootstrap Python project files (authored by seirl).
Bootstrap Python project files
Feb 5 2020, 7:02 PM

Dec 23 2019

seirl committed rDGRPHfc241e29ff65: ForkCC: add whitelist/rootdir options (authored by seirl).
ForkCC: add whitelist/rootdir options
Dec 23 2019, 1:14 AM
seirl committed rDGRPH1c461b64be84: java/Traversal: add findCommonDescendant (authored by seirl).
java/Traversal: add findCommonDescendant
Dec 23 2019, 1:14 AM
seirl committed rDGRPHe41ec924ea28: java: add ListEmptyOrigins tool (authored by seirl).
java: add ListEmptyOrigins tool
Dec 23 2019, 1:14 AM

Dec 17 2019

seirl committed R183:c036b803c9fd: Add zhou2019fork (authored by seirl).
Add zhou2019fork
Dec 17 2019, 1:39 AM

Dec 16 2019

seirl committed R183:461db17e680e: add software fork references (authored by seirl).
add software fork references
Dec 16 2019, 10:34 PM

Dec 14 2019

seirl triaged T2153: SWH PIDs as an "Alternate Identifier" in Zenodo as Normal priority.
Dec 14 2019, 9:05 PM · Scientific Community Building
seirl updated the task description for T2153: SWH PIDs as an "Alternate Identifier" in Zenodo.
Dec 14 2019, 4:50 PM · Scientific Community Building
seirl created T2153: SWH PIDs as an "Alternate Identifier" in Zenodo.
Dec 14 2019, 4:50 PM · Scientific Community Building

Dec 6 2019

seirl accepted D2379: CLI: generalize 'map lookup' to lookup many identifiers at once.
Dec 6 2019, 3:44 PM

Dec 4 2019

seirl requested changes to D2379: CLI: generalize 'map lookup' to lookup many identifiers at once.
Dec 4 2019, 2:32 PM