seirl (Antoine Pietri)
User

User Details

User Since
Feb 2 2017, 11:38 AM (219 w, 2 d)

Recent Activity

Fri, Apr 16

seirl committed rDDATASETdffb127c5a3b: athena: pass database name as an attribute (authored by seirl).
athena: pass database name as an attribute
Fri, Apr 16, 7:43 PM
seirl closed D5540: docs: Update for new schema.
Fri, Apr 16, 7:39 PM
seirl committed rDDATASET01ba5d14ef21: docs: Update for new schema (authored by seirl).
docs: Update for new schema
Fri, Apr 16, 7:39 PM
seirl added a comment to D5501: add an anti-Dos limit for edges traversed as a query parameter.

It looks to me like this would be simpler if max_edges was given as a parameter to Traversal, since it's common to most methods. Would that work?

Fri, Apr 16, 1:09 PM
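
A minimal sketch of the suggestion in the D5501 comment above, assuming a toy in-memory graph. The `Traversal` class, its methods, and `EdgeBudgetExceeded` are hypothetical stand-ins written for illustration, not the swh-graph implementation. The point is only that an edge limit set once on the object is shared by every traversal method:

```python
from collections import deque


class EdgeBudgetExceeded(Exception):
    """Raised when a traversal walks more edges than allowed."""


class Traversal:
    # Hypothetical stand-in, not the swh-graph class: the edge budget is set
    # once here instead of being passed to every method.
    def __init__(self, graph, max_edges=None):
        self.graph = graph          # dict: node -> list of successors
        self.max_edges = max_edges  # None means "no limit"
        self.edges_seen = 0

    def _count_edge(self):
        self.edges_seen += 1
        if self.max_edges is not None and self.edges_seen > self.max_edges:
            raise EdgeBudgetExceeded(self.max_edges)

    def visit_nodes(self, src):
        """Breadth-first visit; every traversed edge counts against the budget."""
        seen, queue, order = {src}, deque([src]), []
        while queue:
            node = queue.popleft()
            order.append(node)
            for succ in self.graph.get(node, []):
                self._count_edge()
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
        return order

    def leaves(self, src):
        """Reuses visit_nodes, so it inherits the same limit for free."""
        return [n for n in self.visit_nodes(src) if not self.graph.get(n)]
```

With this shape, `Traversal(graph, max_edges=1000).leaves(src)` and `.visit_nodes(src)` enforce the same limit without either method taking an extra argument.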

Thu, Apr 15

seirl added a reviewer for D5540: docs: Update for new schema: zack.
Thu, Apr 15, 5:55 PM
seirl requested review of D5540: docs: Update for new schema.
Thu, Apr 15, 5:55 PM
seirl closed D5527: swh_model_data: add parents to test revision.
Thu, Apr 15, 4:37 PM
seirl committed rDMOD1f6b3b9d5b41: swh_model_data: add parents to test revision (authored by seirl).
swh_model_data: add parents to test revision
Thu, Apr 15, 4:37 PM
seirl updated the diff for D5527: swh_model_data: add parents to test revision.

rebase

Thu, Apr 15, 4:37 PM

Wed, Apr 14

seirl committed rDDATASET4636d1c146aa: Add two ORC tools (orc-merge, orc-print-contents) (authored by seirl).
Add two ORC tools (orc-merge, orc-print-contents)
Wed, Apr 14, 7:07 PM
seirl committed rDDATASET33a50eac62f3: journalprocessor: disable in-partition sharding for LevelDB tests (authored by seirl).
journalprocessor: disable in-partition sharding for LevelDB tests
Wed, Apr 14, 6:56 PM
seirl committed rDDATASETab6191bc712f: journalprocessor: only reassign partitions when needed (authored by seirl).
journalprocessor: only reassign partitions when needed
Wed, Apr 14, 6:56 PM
seirl committed rDDATASETf5526f05d314: athena: add documentation and licensing info (authored by seirl).
athena: add documentation and licensing info
Wed, Apr 14, 6:48 PM
seirl committed rDDATASETcfb3bc5510d4: ORC: export missing revision_history table (authored by seirl).
ORC: export missing revision_history table
Wed, Apr 14, 6:48 PM
seirl closed D5522: Add athena subcommand to create/query AWS Athena database.
Wed, Apr 14, 6:48 PM
seirl committed rDDATASETb1d76ed7a763: Add athena subcommand to create/query AWS Athena database (authored by seirl).
Add athena subcommand to create/query AWS Athena database
Wed, Apr 14, 6:48 PM
seirl committed rDDATASET5459673218d1: Move ORC table schema in relational.py (authored by seirl).
Move ORC table schema in relational.py
Wed, Apr 14, 6:48 PM
seirl requested review of D5527: swh_model_data: add parents to test revision.
Wed, Apr 14, 6:40 PM
seirl updated the diff for D5522: Add athena subcommand to create/query AWS Athena database.

Add documentation and licensing info

Wed, Apr 14, 4:48 PM
seirl added a comment to D5522: Add athena subcommand to create/query AWS Athena database.

Thanks for the review!

Wed, Apr 14, 4:38 PM
seirl added a comment to T2981: Graph API: add a (node type) result filters.

I just want to note something here that maybe isn't clear from the initial task description. This filtering must happen *after* the visit, not before. We can already change *how* the graph is visited using the edges parameter; the goal of this task is to filter the result post-visit.

Wed, Apr 14, 4:28 PM · Graph service
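
The distinction drawn in the T2981 comment above can be sketched as follows. This is an illustration on a made-up toy graph. The names `visit`, `visit_filtered`, `allowed_edges`, and `return_types` are assumptions chosen for the example and are not the Graph API's actual parameters:

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge_type, successor).
# The node type is encoded as a prefix, loosely mimicking SWHID type tags.
GRAPH = {
    "rev:1": [("rev:dir", "dir:1"), ("rev:rev", "rev:0")],
    "rev:0": [("rev:dir", "dir:0")],
    "dir:1": [("dir:cnt", "cnt:a")],
    "dir:0": [("dir:cnt", "cnt:b")],
}


def node_type(node):
    return node.split(":", 1)[0]


def visit(src, allowed_edges=None):
    """BFS from src; allowed_edges restricts *how* the graph is walked."""
    seen, queue, order = {src}, deque([src]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for edge_type, succ in GRAPH.get(node, []):
            if allowed_edges is not None and edge_type not in allowed_edges:
                continue
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def visit_filtered(src, return_types, allowed_edges=None):
    """Walk first, then keep only the nodes whose type is in return_types."""
    return [n for n in visit(src, allowed_edges) if node_type(n) in return_types]
```

Here `visit_filtered("rev:1", {"cnt"})` still walks through revisions and directories and only then drops them from the result, whereas `visit("rev:1", allowed_edges={"dir:cnt"})` never leaves the starting revision because no permitted edge starts there.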
seirl added a comment to T1968: existing graph endpoints should not return 404 upon missing arguments.

Right, I suppose we can close the task then?

Wed, Apr 14, 4:25 PM · Easy hack, Graph service
seirl updated the diff for D5522: Add athena subcommand to create/query AWS Athena database.

Remove debug print

Wed, Apr 14, 2:01 PM
seirl added a reviewer for D5522: Add athena subcommand to create/query AWS Athena database: Reviewers.
Wed, Apr 14, 1:58 PM
seirl updated the diff for D5522: Add athena subcommand to create/query AWS Athena database.

Rebase

Wed, Apr 14, 1:57 PM
seirl retitled D5522: Add athena subcommand to create/query AWS Athena database from Move ORC table schema in relational.py to Add athena subcommand to create/query AWS Athena database.
Wed, Apr 14, 1:57 PM
seirl requested review of D5522: Add athena subcommand to create/query AWS Athena database.
Wed, Apr 14, 1:55 PM
seirl committed rDDATASET11b2436563e8: test_edges: fix mypy error while mocking a method (authored by seirl).
test_edges: fix mypy error while mocking a method
Wed, Apr 14, 1:54 PM

Tue, Apr 13

seirl updated subscribers of T1968: existing graph endpoints should not return 404 upon missing arguments.

@zack We talked about this on IRC with @vlorentz, and I think this issue is invalid. We chose to have the source and destination nodes as part of the URI in the API. Semantically, it makes sense that accessing the path without these path fragments returns a 404: it's not a missing argument but an invalid path. If we had ?src= and &dst= arguments instead, then a 400 error would make sense, but in our case returning a 400 would be semantically odd.

Tue, Apr 13, 7:05 PM · Easy hack, Graph service
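
To make the 404-versus-400 argument in the T1968 comment above concrete, here is a small sketch comparing the two URL designs. The `/path/...` route and both function names are hypothetical and do not reproduce the actual Graph service routing:

```python
from urllib.parse import parse_qs, urlsplit


def status_for_path_style(url):
    """src and dst are path fragments: /path/<src>/<dst> (hypothetical route)."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) == 3 and parts[0] == "path":
        return 200
    return 404  # no route matches, so the path itself is invalid


def status_for_query_style(url):
    """src and dst are query arguments: /path?src=...&dst=... (hypothetical)."""
    split = urlsplit(url)
    if split.path.rstrip("/") != "/path":
        return 404
    args = parse_qs(split.query)
    if "src" not in args or "dst" not in args:
        return 400  # the route exists, but a required argument is missing
    return 200
```

In the path-style design a request for `/path/a` simply matches no route, so 404 is the natural answer; only the query-style design has a matched route with a missing argument to report as 400.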

Fri, Apr 9

seirl committed rDGRPH4f751998c69c: NodeIdMap: add backward compatibility for loading MPH on strings (authored by seirl).
NodeIdMap: add backward compatibility for loading MPH on strings
Fri, Apr 9, 4:19 PM
seirl closed D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.
Fri, Apr 9, 4:19 PM
seirl committed rDGRPH53bbd5c65cbe: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID (authored by seirl).
NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID
Fri, Apr 9, 4:19 PM
seirl updated the diff for D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.
  • Fix reviews
  • Add backward compatibility for loading MPH on strings
Fri, Apr 9, 3:49 PM

Wed, Apr 7

seirl closed T3178: document how to export the graph dataset automatically, a subtask of T1847: fully automate export of the graph dataset, as Invalid.
Wed, Apr 7, 3:03 PM · Graph service, Datasets
seirl closed T3178: document how to export the graph dataset automatically as Invalid.

Duplicate of T2431

Wed, Apr 7, 3:03 PM · Documentation, Datasets
seirl added a subtask for T1847: fully automate export of the graph dataset: T2431: Document how to export the graph edge dataset.
Wed, Apr 7, 3:03 PM · Graph service, Datasets
seirl added a parent task for T2431: Document how to export the graph edge dataset: T1847: fully automate export of the graph dataset.
Wed, Apr 7, 3:03 PM · Documentation, Graph service, Datasets
seirl added a comment to D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.

The new way of doing things is a lot more natural, since we already have the MPH and the .order file

Newcomers aren't aware of this. I had no idea we had those before reading this diff.

Wed, Apr 7, 3:02 PM
seirl added a comment to D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.

I'm not saying the current state of the docs is good enough; I'm saying this commit message doesn't explain the design, only why we're moving away from the old binary search solution. The new way of doing things is a lot more natural, since we already have the MPH and the .order file, so there's no need to document why the old solution was bad in the main docs.

Wed, Apr 7, 12:04 PM
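
For readers unfamiliar with the approach discussed in D5427, the lookup can be sketched roughly as below. This is a toy model under stated assumptions: a plain dict stands in for the minimal perfect hash function (MPH), a list stands in for the memory-mapped .order file, and the SWHIDs are made up and shortened. The reverse-map check for unknown keys is also an assumption added for the example, not something the comments above confirm:

```python
# Stand-in for the MPH: any bijection from the known keys onto 0..n-1.
_MPH_TABLE = {"swh:1:rev:cc": 0, "swh:1:cnt:aa": 1, "swh:1:dir:bb": 2}


def mph(swhid):
    # A minimal perfect hash generally returns some in-range value even for
    # keys it was not built on; this toy imitates that instead of raising.
    return _MPH_TABLE.get(swhid, 0)


# Stand-in for the .order file: permutation from hash value to node ID.
ORDER = [2, 0, 1]

# Reverse map, node ID -> SWHID, used here to reject unknown keys.
NODE_TO_SWHID = ["swh:1:cnt:aa", "swh:1:dir:bb", "swh:1:rev:cc"]


def get_node_id(swhid):
    """Translate a SWHID to a node ID with one hash and one array lookup."""
    node_id = ORDER[mph(swhid)]
    if NODE_TO_SWHID[node_id] != swhid:
        raise KeyError(swhid)
    return node_id
```

Compared with a binary search over a sorted SWHID-to-node file, this takes a constant number of lookups per query and reuses two structures that already exist.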
seirl added a comment to D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.

Where are the .order and MPH computed?

Wed, Apr 7, 10:39 AM
seirl added a comment to D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.

Thanks for the review. I don't think this needs to be documented elsewhere; it just describes why we're making the change. What should be documented instead is why we're using these data structures in the first place. Right now this is done sparsely across the different source files, and this commit updates the existing documentation.

Wed, Apr 7, 10:38 AM

Tue, Apr 6

seirl committed rDGRPH15c2da0f084f: java: fix formatting (authored by seirl).
java: fix formatting
Tue, Apr 6, 9:09 PM
seirl requested review of D5427: NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID.
Tue, Apr 6, 3:46 PM
seirl added a comment to D5411: return a 400 error when accessing endpoints without the arguments.

There's a problem with this diff: it targets an old Java-only backend that isn't the one we use when we run swh graph rpc-serve. The one currently in use is written in Python, at swh/graph/server/app.py

Tue, Apr 6, 12:35 PM

Fri, Apr 2

seirl committed rDGRPHf055c4eaf016: Recompress test graph with byte array MPH (authored by seirl).
Recompress test graph with byte array MPH
Fri, Apr 2, 3:59 PM
seirl committed rDGRPH7eef7cb3f94b: Compress graph with byte arrays instead of strings (authored by seirl).
Compress graph with byte arrays instead of strings
Fri, Apr 2, 3:59 PM

Fri, Mar 26

seirl closed D5315: Add LevelDB backend for exporter node sets.
Fri, Mar 26, 2:29 PM
seirl committed rDDATASETe16e9c5bb271: Add LevelDB backend for exporter node sets (authored by seirl).
Add LevelDB backend for exporter node sets
Fri, Mar 26, 2:29 PM
seirl committed rDDATASETf43ce97371ba: ORC exporter: handle releases with empty authors/dates (authored by seirl).
ORC exporter: handle releases with empty authors/dates
Fri, Mar 26, 2:29 PM
seirl updated the diff for D5315: Add LevelDB backend for exporter node sets.

Rebase + fix phabricator incorrect ID

Fri, Mar 26, 2:27 PM
seirl reopened D5315: Add LevelDB backend for exporter node sets.
Fri, Mar 26, 2:27 PM
seirl closed D5316: Model test data: add Release with no author/date.

Merged in https://forge.softwareheritage.org/rDMOD9523be0552d822be617da77bf0d2ca2f479da572

Fri, Mar 26, 12:10 PM
seirl updated the diff for D5316: Model test data: add Release with no author/date.

kSJFGSLHDFSKJGHDKFJHGDKFJHG

Fri, Mar 26, 12:09 PM
seirl closed D5315: Add LevelDB backend for exporter node sets.
Fri, Mar 26, 12:07 PM
seirl committed rDMOD9523be0552d8: Model test data: add Release with no author/date (authored by seirl).
Model test data: add Release with no author/date
Fri, Mar 26, 12:07 PM
seirl updated the diff for D5315: Add LevelDB backend for exporter node sets.

Remove phabricator garbage

Fri, Mar 26, 12:07 PM
seirl closed T1847: fully automate export of the graph dataset as Resolved.

The ORC exporter is done, and it's likely that we won't provide CSV exports in the future, or we'll generate them from the ORC format.

Fri, Mar 26, 12:04 PM · Graph service, Datasets
seirl closed T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, as Resolved.
Fri, Mar 26, 12:04 PM · Datasets

Thu, Mar 25

seirl placed T3167: Add a --version option to all the CLI commands up for grabs.
Thu, Mar 25, 2:05 PM · Easy hack

Wed, Mar 24

seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Wed, Mar 24, 6:56 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Wed, Mar 24, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Wed, Mar 24, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Wed, Mar 24, 4:10 PM · Data Model, Journal
seirl triaged T3170: Revisions in the journal with out of range dates as Normal priority.
Wed, Mar 24, 1:13 PM · Data Model, Journal
seirl created P984 (An Untitled Masterwork).
Wed, Mar 24, 11:04 AM
seirl requested review of D5316: Model test data: add Release with no author/date.
Wed, Mar 24, 12:46 AM

Tue, Mar 23

seirl updated the summary of D5315: Add LevelDB backend for exporter node sets.
Tue, Mar 23, 10:13 PM
seirl requested review of D5315: Add LevelDB backend for exporter node sets.
Tue, Mar 23, 10:13 PM
seirl committed rDGRPH92f810a36bc7: Add permissions on edge labels (authored by haltode).
Add permissions on edge labels
Tue, Mar 23, 6:15 PM
seirl closed D4006: WIP: add permissions on edge labels.
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH6592ab3fb067: DirEntry: allow for empty permission field (authored by seirl).
DirEntry: allow for empty permission field
Tue, Mar 23, 6:15 PM
seirl committed rDGRPHe0be35f0f59e: labels: use -label prefix for all edge labels, instead of -filename-labels (authored by seirl).
labels: use -label prefix for all edge labels, instead of -filename-labels
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH5a3d60748fd1: ReadLabelledGraph: use FCL instead of PFCL (authored by seirl).
ReadLabelledGraph: use FCL instead of PFCL
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH188608b87753: java: add subdataset exporting functions (authored by seirl).
java: add subdataset exporting functions
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH9a20f2e9bc2c: LabelMapBuilder: use low-level scanning of the input file (authored by seirl).
LabelMapBuilder: use low-level scanning of the input file
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH278517865425: LabelMapBuilder: restructure in functions (authored by seirl).
LabelMapBuilder: restructure in functions
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH7b31937a4715: LabelMapBuilder: non-static builder function (authored by seirl).
LabelMapBuilder: non-static builder function
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH2fcd96d7bb21: LabelMapBuilder: remove need for hashtable, sync streams (authored by seirl).
LabelMapBuilder: remove need for hashtable, sync streams
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH19f7da78aa54: Use MPH functions operating on byte arrays (authored by seirl).
Use MPH functions operating on byte arrays
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH4e2fedc3bce8: LabelMapBuilder: refactor logic in separate line iterators (authored by seirl).
LabelMapBuilder: refactor logic in separate line iterators
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH0aa061682e95: LabelMapBuilder: support both sorting methods (authored by seirl).
LabelMapBuilder: support both sorting methods
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH968f9c6c2d0e: LabelMapBuilder: add TextualEdgeLabelLineIterator, fix BSort (authored by seirl).
LabelMapBuilder: add TextualEdgeLabelLineIterator, fix BSort
Tue, Mar 23, 6:15 PM
seirl committed rDGRPH469d75616934: Merge branch 'label_permissions' (authored by seirl).
Merge branch 'label_permissions'
Tue, Mar 23, 6:15 PM
seirl assigned T3168: Proper deployment of swh-graph with debian package to olasd.
Tue, Mar 23, 12:24 PM · Graph service, Puppet recipes
seirl placed T3167: Add a --version option to all the CLI commands up for grabs.
Tue, Mar 23, 12:24 PM · Easy hack
seirl assigned T3167: Add a --version option to all the CLI commands to olasd.
Tue, Mar 23, 12:23 PM · Easy hack
seirl triaged T3168: Proper deployment of swh-graph with debian package as High priority.
Tue, Mar 23, 12:19 PM · Graph service, Puppet recipes
seirl updated the task description for T3167: Add a --version option to all the CLI commands.
Tue, Mar 23, 12:18 PM · Easy hack
seirl triaged T3167: Add a --version option to all the CLI commands as Low priority.
Tue, Mar 23, 12:16 PM · Easy hack
seirl created T3167: Add a --version option to all the CLI commands.
Tue, Mar 23, 12:16 PM · Easy hack

Mar 4 2021

seirl created P968 swhgraph.sh.
Mar 4 2021, 12:36 PM

Feb 24 2021

seirl committed rDGRPH9f8c6de06556: Add FindEarliestRevision tool (authored by seirl).
Add FindEarliestRevision tool
Feb 24 2021, 3:29 PM
seirl created P963 FindEarliestRevision.
Feb 24 2021, 3:03 PM

Feb 15 2021

seirl committed rDDATASETcf125983309e: Add ORC exporter (authored by seirl).
Add ORC exporter
Feb 15 2021, 5:45 PM
seirl committed rDDATASET35253c89a722: ORC exporter: Add unit tests (authored by seirl).
ORC exporter: Add unit tests
Feb 15 2021, 5:45 PM
seirl committed rDDATASETbf8d2625d3b3: Refactor export paths in the base Exporter class (authored by seirl).
Refactor export paths in the base Exporter class
Feb 15 2021, 5:45 PM
seirl closed D4762: Add ORC exporter.
Feb 15 2021, 5:45 PM
seirl committed rDDATASET40f068d648d2: ORC exporter: avoid fromtimestamp(), use datetime() from epoch instead (authored by seirl).
ORC exporter: avoid fromtimestamp(), use datetime() from epoch instead
Feb 15 2021, 5:45 PM
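
The commit title above does not state its motivation. One plausible reason (an assumption) is that Python's `datetime.fromtimestamp()` depends on the platform's C time functions and may raise `OverflowError` or `OSError` for timestamps outside the range they support, while plain arithmetic from the epoch does not involve them. A minimal sketch of the epoch-based form, where `EPOCH` and `from_epoch` are names chosen for this example:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch(seconds, microseconds=0):
    """Build an aware UTC datetime by adding an offset to the epoch."""
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)
```

This form handles negative timestamps the same way on every platform and only fails when the result falls outside the range `datetime` itself can represent.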

Feb 12 2021

seirl requested review of D4762: Add ORC exporter.

I added unit tests and reworked the logic, and also addressed @olasd's comment. Could you please re-review? :-)

Feb 12 2021, 10:05 PM
seirl updated the diff for D4762: Add ORC exporter.

typo

Feb 12 2021, 10:04 PM