
douardda (David Douard)
User

User Details

User Since
Jul 10 2018, 12:38 PM (159 w, 3 d)

Recent Activity

Yesterday

douardda added a comment to P1110 bad stream_results_optional.

ok then

return itertools.chain([res], stream_results(f, page_token=res.page_token, **kwargs))
Fri, Jul 30, 3:44 PM
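
The one-liner in the P1110 comment above reads like the tail of a recursive pagination helper. A minimal sketch of that idea follows, assuming a hypothetical `Page` type with `results` and `page_token` fields and a hypothetical `fetch_page` function; none of these names come from P1110, and the actual `stream_results` in the codebase may well differ.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical page type: a batch of results plus the next page token."""

    results: List[int]
    page_token: Optional[str]


def stream_pages(
    f: Callable[..., Page], page_token: Optional[str] = None, **kwargs
) -> Iterator[Page]:
    """Return an iterator over all pages produced by f, following page tokens."""
    res = f(page_token=page_token, **kwargs)
    if res.page_token is None:
        # Last page: nothing left to chain.
        return iter([res])
    # The recursive call is an argument, so it is evaluated eagerly:
    # every page is fetched before this function returns.
    return itertools.chain(
        [res], stream_pages(f, page_token=res.page_token, **kwargs)
    )


# Hypothetical in-memory "backend" with three pages.
_DATA = {
    None: Page([1, 2], "a"),
    "a": Page([3, 4], "b"),
    "b": Page([5], None),
}


def fetch_page(page_token: Optional[str] = None) -> Page:
    return _DATA[page_token]


all_results = [x for page in stream_pages(fetch_page) for x in page.results]
```

Because the recursion is eager, this form is not lazy and its depth grows with the number of pages; a generator with a `while` loop avoids both issues.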
douardda added a comment to P1110 bad stream_results_optional.

why not something like:

Fri, Jul 30, 3:36 PM
douardda triaged T3453: Refactor the backend to make it scale better as High priority.
Fri, Jul 30, 2:21 PM · Provenance database

Wed, Jul 28

douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

rebase

Wed, Jul 28, 2:44 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

rebase

Wed, Jul 28, 2:43 PM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase

Wed, Jul 28, 2:43 PM
douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

rebase

Wed, Jul 28, 2:41 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

move _relation_uses_location_table at the end of the class

Wed, Jul 28, 2:40 PM
douardda added inline comments to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Wed, Jul 28, 2:28 PM
douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

fix typos reported by ardumont and vlorentz (thx)

Wed, Jul 28, 2:20 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

It's something I am still trying to figure out as well (whether this code performs as expected under heavy concurrent workload). I want to run more tests (by hand, since this is hard to implement as a "unit" test) ASAP.

Wed, Jul 28, 1:57 PM

Tue, Jul 27

douardda accepted D5985: Simplify history graph creation and origin-revision algorithm.
Tue, Jul 27, 6:15 PM
douardda requested changes to D6026: Add test for origin-revision layer.

I am not fond at all of the code duplication (between the R-C and O-R synth file parsers); it looks to me like at least parts of it could be factored out into a dedicated module (I agree it should not live in conftest any more: too much code and logic now). It would then be best to have these test-helper functions tested themselves (as unitary as possible).

Tue, Jul 27, 6:11 PM
douardda requested review of D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.
Tue, Jul 27, 6:05 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

Tue, Jul 27, 4:39 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

Tue, Jul 27, 4:36 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

rebase and capitalize sql queries

Tue, Jul 27, 4:27 PM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

capitalize sql queries

Tue, Jul 27, 4:26 PM
douardda added inline comments to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Tue, Jul 27, 4:02 PM
douardda accepted D6002: git_bare: Add support for swh-graph when loading a snapshot.

LGTM but see my questions (not sure they really make sense, but who knows)

Tue, Jul 27, 11:28 AM
douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

ceph is not properly monitored (ENOSPC should not go unnoticed on these machines),

P1099 and earlier logs from that moment do not seem to warn about this... T3945 got created for this.

Tue, Jul 27, 9:52 AM · System administration

Mon, Jul 26

douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

Potential issues/weakness of our current infra:

Mon, Jul 26, 5:15 PM · System administration

Thu, Jul 22

douardda added a comment to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

I would have loved to also replace the logic in relation_add() and _relation_get() by stored SQL functions, but it's above my poor SQL skills...

Thu, Jul 22, 5:47 PM
douardda requested review of D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Thu, Jul 22, 3:05 PM

Wed, Jul 21

douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase

Wed, Jul 21, 3:09 PM

Mon, Jul 19

douardda added a comment to T3104: Persistent readonly perfect hash table.

sorry I don't understand everything here:

Mon, Jul 19, 5:20 PM · Object storage

Fri, Jul 2

douardda accepted D5943: Fix database queries related to the origin-revision layer.

I still disagree with the implementation of get_dates() but meh

Fri, Jul 2, 4:38 PM
douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Fri, Jul 2, 4:36 PM
douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Fri, Jul 2, 4:35 PM
douardda accepted D5947: Add `ProvenanceStorageInterface` as discussed during backend design.

I've made several small comments / nitpicks, feel free to address them or not.

Fri, Jul 2, 4:32 PM
douardda added inline comments to D5947: Add `ProvenanceStorageInterface` as discussed during backend design.
Fri, Jul 2, 4:30 PM
douardda accepted D5946: Rework `ProvenanceInterface` as discussed during backend design.

okay but as stated, I am not too fond of the general usage of the RealDictCursor; sometimes it helps, but sometimes it does not. Ideally both should be available (depending on the query).

Fri, Jul 2, 3:48 PM
douardda requested changes to D5943: Fix database queries related to the origin-revision layer.
Fri, Jul 2, 3:40 PM
douardda accepted D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Fri, Jul 2, 3:33 PM

Thu, Jul 1

douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Thu, Jul 1, 3:28 PM
douardda added inline comments to D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Thu, Jul 1, 3:25 PM
douardda accepted D5944: Add tests for history graph topology.

ok but please remove print statements before

Thu, Jul 1, 12:37 PM
douardda added inline comments to D5944: Add tests for history graph topology.
Thu, Jul 1, 12:33 PM
douardda updated subscribers of D5943: Fix database queries related to the origin-revision layer.
Thu, Jul 1, 12:29 PM
douardda requested changes to D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Thu, Jul 1, 12:07 PM

Jul 1 2021

douardda added a comment to D5943: Fix database queries related to the origin-revision layer.

Why do all these queries use LOCK TABLE?

Jul 1 2021, 10:53 AM
douardda accepted D5948: Force `snapshot_get_heads` to return revisions in chronological order.

ok but the SQL query could be improved to not return unwanted dates

Jul 1 2021, 10:49 AM

Jun 29 2021

douardda triaged T3416: Implement the replayer service for Vitam as High priority.
Jun 29 2021, 9:33 AM
douardda added a comment to T3415: Specify the Vitam archiving format.

This initial proposal from CINES has not been selected because it de facto normalizes a number of relations of the SWH graph, making it unfit for storage in a solution like Vitam (too many objects, hard to manage incremental updates).

Jun 29 2021, 9:30 AM
douardda added a comment to T3415: Specify the Vitam archiving format.
  1. Proposal from CINES
Jun 29 2021, 9:27 AM
douardda triaged T3415: Specify the Vitam archiving format as High priority.
Jun 29 2021, 9:27 AM
douardda triaged T3414: Save the Archive in CINES' Vitam platform as High priority.
Jun 29 2021, 9:22 AM

Jun 28 2021

douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

Should this be documented somewhere? (How to use it / why)

Jun 28 2021, 3:35 PM

Jun 25 2021

douardda created P1078 (An Untitled Masterwork).
Jun 25 2021, 3:33 PM
douardda accepted D5893: hypothesis_strategies: Add raw_extrinsic_metadata() strategy.
Jun 25 2021, 11:27 AM
douardda accepted D5914: backend: Auto-generate origin visit stats upsert query.
Jun 25 2021, 11:25 AM
douardda accepted D5916: cli/task: Ensure cli output is always in the same order.
Jun 25 2021, 11:23 AM
douardda requested changes to D5917: journal_client: Only check last_* fields for some permutation tests.
Jun 25 2021, 11:22 AM
douardda added a comment to D5917: journal_client: Only check last_* fields for some permutation tests.

I think I'd rather have an explicit list of excluded fields (when these extra fields are added). So I'd prefer this diff to compare dicts (as a result of BaseObject.to_dict()), possibly filtered to exclude some fields.

Jun 25 2021, 11:21 AM
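
The suggestion in the D5917 comment above, comparing `to_dict()` outputs after dropping an explicit list of fields, could look like the following minimal sketch. The `Stats` class, its field names, and `EXCLUDED_FIELDS` are hypothetical stand-ins for illustration, not the actual model objects touched by the diff.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable


@dataclass
class Stats:
    """Hypothetical stand-in for a model object exposing to_dict()."""

    url: str
    last_visit: str
    extra_counter: int  # hypothetical "extra" field, ignored when comparing

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


# Explicit list of fields left out of the comparison (assumed names).
EXCLUDED_FIELDS = ("extra_counter",)


def filtered_dict(
    obj: Stats, excluded: Iterable[str] = EXCLUDED_FIELDS
) -> Dict[str, Any]:
    """Return obj.to_dict() without the excluded keys."""
    excluded = set(excluded)
    return {k: v for k, v in obj.to_dict().items() if k not in excluded}


a = Stats("https://example.org/repo", "2021-06-25", 1)
b = Stats("https://example.org/repo", "2021-06-25", 7)

# The objects differ only in the excluded field, so the filtered dicts match.
same = filtered_dict(a) == filtered_dict(b)
```

Keeping the exclusion list explicit means a newly added field is compared by default and has to be opted out deliberately.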

Jun 23 2021

douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

Also, at some point we might want to use better templating to write these SQL queries, or use stored procedures (with the proper "variation" being chosen at db creation time based on the selected flavor); that would simplify the python code a lot.

Jun 23 2021, 11:22 AM
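
As a rough illustration of the templating idea in the comment above, the sketch below picks a query variation from a flavor name. The flavor names, table names, and column names are all assumptions made up for the example; they are not taken from the provenance DB schema.

```python
from string import Template

# Hypothetical per-flavor fragments: only the normalized flavor needs a join
# against a separate location table.
_LOCATION_JOIN = {
    "normalized": "JOIN location AS l ON (l.id = r.location)",
    "denormalized": "",
}

# Template uses $name placeholders; the %s is left for the DB driver.
_QUERY = Template(
    "SELECT r.content, r.directory "
    "FROM content_in_directory AS r $location_join "
    "WHERE r.content = %s"
)


def build_query(flavor: str) -> str:
    """Render the query for the given flavor, collapsing extra whitespace."""
    try:
        join = _LOCATION_JOIN[flavor]
    except KeyError:
        raise ValueError(f"unknown flavor: {flavor!r}") from None
    return " ".join(_QUERY.substitute(location_join=join).split())


normalized_sql = build_query("normalized")
denormalized_sql = build_query("denormalized")
```

With stored procedures the same choice would instead be made once, at database creation time, and the Python side would call a single function name regardless of flavor.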
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

reword a bit the ci message and kill a few tabs in 30-schema.sql

Jun 23 2021, 11:12 AM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

yes I know, names for the subqueries are horrible...

Jun 23 2021, 11:07 AM
douardda retitled D5843: Add support for a denormalized version of the provenance DB from [WIP] Add support for a denormalized version of the provenance DB to Add support for a denormalized version of the provenance DB.
Jun 23 2021, 11:04 AM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase, adapt and implement denormalization for content_in_dir and dir_in_rev

Jun 23 2021, 11:03 AM

Jun 22 2021

douardda abandoned D5841: Remove the without-path flavor of ProvenanceDB.

we keep it for now

Jun 22 2021, 5:12 PM
douardda abandoned D5885: Add support for (topological) branches and merges in generate_repo.py.

I believe this diff is duplicated and the other one was already landed.

Jun 22 2021, 5:11 PM
douardda accepted D5902: Remove origin_get_id method from ProvenanceInterface.

overall ok but see the comment

Jun 22 2021, 11:05 AM

Jun 21 2021

douardda closed D5894: Allow to add extra origins and snapshots in generated test storages.
Jun 21 2021, 4:48 PM
douardda closed D5892: Add support for (topological) branches and merges in generate_repo.py.
Jun 21 2021, 4:48 PM
douardda committed rDPROV011645221cf6: Allow to add extra origins and snapshots in generated test storages (authored by douardda).
Allow to add extra origins and snapshots in generated test storages
Jun 21 2021, 4:48 PM
douardda committed rDPROV6734fd36b872: Add support for (topological) branches and merges in generate_repo.py (authored by douardda).
Add support for (topological) branches and merges in generate_repo.py
Jun 21 2021, 4:48 PM
douardda closed D5891: Refactor the generate_storage_from_git dataset creation tool.
Jun 21 2021, 4:48 PM
douardda committed rDPROV7886bf494ab8: Refactor the generate_storage_from_git dataset creation tool (authored by douardda).
Refactor the generate_storage_from_git dataset creation tool
Jun 21 2021, 4:48 PM
douardda updated the diff for D5891: Refactor the generate_storage_from_git dataset creation tool.

rebase

Jun 21 2021, 4:46 PM
douardda updated the diff for D5892: Add support for (topological) branches and merges in generate_repo.py.

rebase

Jun 21 2021, 4:45 PM
douardda updated the diff for D5894: Allow to add extra origins and snapshots in generated test storages.

typos

Jun 21 2021, 4:43 PM
douardda added inline comments to D5894: Allow to add extra origins and snapshots in generated test storages.
Jun 21 2021, 4:39 PM
douardda accepted D5886: Refactor origin-revision layer.

ok but some questions/remarks have not been addressed...

Jun 21 2021, 4:37 PM
douardda accepted D5880: Update methods associated to the origin-revision layer.

Thanks for the DatetimeCache & co.

Jun 21 2021, 4:32 PM
douardda added a comment to T3382: Save process seems to be stuck.

I agree having access to the logs of the task (more or less) in real-time would be very handy (as one can expect on any CI-like tool nowadays).

Jun 21 2021, 10:50 AM · Save Code Now
douardda added inline comments to D5886: Refactor origin-revision layer.
Jun 21 2021, 10:20 AM

Jun 18 2021

douardda added inline comments to D5880: Update methods associated to the origin-revision layer.
Jun 18 2021, 5:36 PM
douardda added inline comments to D5880: Update methods associated to the origin-revision layer.
Jun 18 2021, 2:49 PM
douardda accepted D5862: Rework ArchiveInterface.

but please add a comment in the ArchivePostgreSQL's version of snapshot_get_heads explaining why it's (for now) a duplication of the other implementation, thanks

Jun 18 2021, 2:34 PM
douardda added a comment to D5862: Rework ArchiveInterface.

OK for the first two items, but I don't agree on the third one. The idea is to replace one of them by a direct SQL query in the near future so reworking this will be useless. I just didn't implement the query because I needed to move forward with the other stuff

Jun 18 2021, 2:31 PM
douardda accepted D5884: Fix bugs when retrieving parents in RevisionEntry.

thanks, the diff looks much simpler now :-)

Jun 18 2021, 2:26 PM
douardda added inline comments to D5884: Fix bugs when retrieving parents in RevisionEntry.
Jun 18 2021, 2:22 PM
douardda requested changes to D5886: Refactor origin-revision layer.

I know I am rambling, but could it come with some testing?

Jun 18 2021, 2:20 PM
douardda requested review of D5892: Add support for (topological) branches and merges in generate_repo.py.
Jun 18 2021, 12:35 PM
douardda updated the diff for D5894: Allow to add extra origins and snapshots in generated test storages.

use 'branches' instead of "revisions" as section in the yaml file

Jun 18 2021, 12:35 PM
douardda requested review of D5891: Refactor the generate_storage_from_git dataset creation tool.
Jun 18 2021, 12:32 PM
douardda requested review of D5894: Allow to add extra origins and snapshots in generated test storages.
Jun 18 2021, 12:29 PM
douardda committed rDJNLa06bab98b115: Add a StreamJournalWriter backend (authored by douardda).
Add a StreamJournalWriter backend
Jun 18 2021, 11:21 AM
douardda closed D5890: Add a StreamJournalWriter backend.
Jun 18 2021, 11:21 AM
douardda committed rDJNLa4ae96d12d2c: Better annotation for InMemoryJournalWriter's value_sanitizer (authored by douardda).
Better annotation for InMemoryJournalWriter's value_sanitizer
Jun 18 2021, 11:21 AM
douardda requested changes to D5880: Update methods associated to the origin-revision layer.

Mostly nitpicking comments, but I'd really prefer that:

  • the cache is kept properly typed
  • the cache clearing thing gets its own git revision
Jun 18 2021, 11:06 AM
douardda added inline comments to D5890: Add a StreamJournalWriter backend.
Jun 18 2021, 10:49 AM
douardda updated the diff for D5890: Add a StreamJournalWriter backend.

Add a revision to fix the annotation of InMemory's value_sanitizer

Jun 18 2021, 10:48 AM
douardda added inline comments to D5890: Add a StreamJournalWriter backend.
Jun 18 2021, 10:37 AM
douardda updated the diff for D5890: Add a StreamJournalWriter backend.

small simplification in the docstring

Jun 18 2021, 10:34 AM
douardda updated the diff for D5890: Add a StreamJournalWriter backend.

use get_journal_writer in test_stream

Jun 18 2021, 10:31 AM
douardda updated the diff for D5890: Add a StreamJournalWriter backend.

fix docstring, thx ardumont

Jun 18 2021, 10:25 AM
douardda added inline comments to D5890: Add a StreamJournalWriter backend.
Jun 18 2021, 10:23 AM
douardda added inline comments to D5890: Add a StreamJournalWriter backend.
Jun 18 2021, 10:17 AM