Aug 11 2021

douardda added a comment to D6071: Revisited history graph implementation.

A few remarks:

Aug 11 2021, 12:37 PM

Aug 10 2021

douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
Aug 10 2021, 4:00 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
douardda added a parent task for T1954: Up-to-date objstorage mirror on S3: T3477: Add alerting when the copy to S3 starts lagging.
Aug 10 2021, 3:59 PM · System administration, Object storage
douardda added a subtask for T3477: Add alerting when the copy to S3 starts lagging: T1954: Up-to-date objstorage mirror on S3.
Aug 10 2021, 3:59 PM · Roadmap 2021, System administration
douardda triaged T3477: Add alerting when the copy to S3 starts lagging as High priority.
Aug 10 2021, 3:58 PM · Roadmap 2021, System administration
douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
Aug 10 2021, 3:56 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
douardda added a comment to T1954: Up-to-date objstorage mirror on S3.

well this task should be closed, and a new subtask could be added for the alerting

Aug 10 2021, 3:55 PM · System administration, Object storage
douardda added a comment to T1954: Up-to-date objstorage mirror on S3.

unless I'm mistaken, this task can be closed now; it seems to have reached a steady state where the lag is near 0

Aug 10 2021, 2:18 PM · System administration, Object storage

Aug 9 2021

douardda accepted D6067: cassandra: Fix crash when using _missing() functions with more than 100 ids with ScyllaDB..
Aug 9 2021, 11:33 AM
douardda accepted D6069: from_disk: Do not drop tags with missing tagger or date.
Aug 9 2021, 11:32 AM

Aug 6 2021

douardda created P1116 (An Untitled Masterwork).
Aug 6 2021, 3:21 PM
douardda added a comment to T3453: Refactor the backend to make it scale better.

I've been thinking a bit about the refactoring of the ProvenanceStorageServer as described in the doc, with a series of queues between the public API and the backend database.

Aug 6 2021, 11:08 AM · Provenance database
douardda updated subscribers of T3453: Refactor the backend to make it scale better.
Aug 6 2021, 11:04 AM · Provenance database
douardda accepted D6054: Add test for the different `ProvenanceStorageInterface` implementations.
Aug 6 2021, 10:59 AM
douardda closed D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.
Aug 6 2021, 10:58 AM
douardda committed rDPROV058ed19b0100: Simplify the configuration file loading mechanism in the cli (authored by douardda).
Simplify the configuration file loading mechanism in the cli
Aug 6 2021, 10:58 AM
douardda committed rDPROV3b145f15c2db: Add a quick start section in the documentation (authored by douardda).
Add a quick start section in the documentation
Aug 6 2021, 10:58 AM
douardda closed D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Aug 6 2021, 10:58 AM
douardda committed rDPROVfbc5499eb0e2: Use stored SQL functions for content_find_{all,one}() (authored by douardda).
Use stored SQL functions for content_find_{all,one}()
Aug 6 2021, 10:58 AM
douardda committed rDPROVf5e6c283b08e: Merge Provenance*DB classes in a single ProvenanceDB (authored by douardda).
Merge Provenance*DB classes in a single ProvenanceDB
Aug 6 2021, 10:58 AM
douardda closed D5843: Add support for a denormalized version of the provenance DB.
Aug 6 2021, 10:58 AM
douardda committed rDPROV1c3d6426ebd2: Add support for a denormalized version of the provenance DB (authored by douardda).
Add support for a denormalized version of the provenance DB
Aug 6 2021, 10:58 AM
douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

typos

Aug 6 2021, 10:55 AM

Aug 5 2021

douardda added a comment to D6046: elasticsearch.py: Integrate query langauge translator.

there is a typo in the commit message

Aug 5 2021, 1:34 PM
douardda accepted D6051: changelog: Reference first completion of sourceforge hg origins.
Aug 5 2021, 1:33 PM
douardda requested changes to D6054: Add test for the different `ProvenanceStorageInterface` implementations.

overall ok, but I'd like to see the comments about fixtures addressed first.

Aug 5 2021, 12:27 PM
douardda accepted D6053: Refactor the use of archive `Storage` object for testing.

nice job, thx

Aug 5 2021, 12:11 PM
douardda accepted D6026: Add test for origin-revision layer.
Aug 5 2021, 12:08 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

I agree some more tests and validations need to be done on this storage schema, but can we please land it for now as is? I've put a warning in the documentation (in D6031) to point out that this flavor is not "production ready". cc @aeviso

Aug 5 2021, 12:06 PM

Aug 2 2021

douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

FTR I've tried to investigate a bit to find clues of what the origin of the outage was, but I did not find any obvious culprit.

Aug 2 2021, 10:06 AM · System administration

Jul 30 2021

douardda added a comment to P1110 bad stream_results_optional.

ok then

return itertools.chain([res], stream_results(f, page_token=res.page_token, **kwargs))
Jul 30 2021, 3:44 PM
douardda added a comment to P1110 bad stream_results_optional.

why not something like:

Jul 30 2021, 3:36 PM
douardda triaged T3453: Refactor the backend to make it scale better as High priority.
Jul 30 2021, 2:21 PM · Provenance database

Jul 28 2021

douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

rebase

Jul 28 2021, 2:44 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

rebase

Jul 28 2021, 2:43 PM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase

Jul 28 2021, 2:43 PM
douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

rebase

Jul 28 2021, 2:41 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

move _relation_uses_location_table at the end of the class

Jul 28 2021, 2:40 PM
douardda added inline comments to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Jul 28 2021, 2:28 PM
douardda updated the diff for D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.

fix typos reported by ardumont and vlorentz (thx)

Jul 28 2021, 2:20 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

It's something I am still trying to figure out as well (whether this code performs as expected under heavy concurrent workload). I want to run more tests (by hand, since this is hard to implement as a "unit" test) ASAP.

Jul 28 2021, 1:57 PM

Jul 27 2021

douardda accepted D5985: Simplify history graph creation and origin-revision algorithm.
Jul 27 2021, 6:15 PM
douardda requested changes to D6026: Add test for origin-revision layer.

I am not fond at all of the code duplication (between the R-C and O-R synth file parsers); it looks to me like at least parts of it could be factored out into a dedicated module (I agree it should not live in conftest any more: too much code and logic now). It would then be best to have these test-helper functions tested themselves (as close to unit tests as possible).

Jul 27 2021, 6:11 PM
douardda requested review of D6031: Add a quick start section in the documentation and simplify the configuration file loading mechanism in the cli.
Jul 27 2021, 6:05 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

Jul 27 2021, 4:39 PM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

It's not clear to me how the denormalized version handles the insertion of duplicated entries.

Jul 27 2021, 4:36 PM
douardda updated the diff for D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

rebase and capitalize sql queries

Jul 27 2021, 4:27 PM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

capitalize sql queries

Jul 27 2021, 4:26 PM
douardda added inline comments to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Jul 27 2021, 4:02 PM
douardda accepted D6002: git_bare: Add support for swh-graph when loading a snapshot.

LGTM but see my questions (not sure they really make sense, but who knows)

Jul 27 2021, 11:28 AM
douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

ceph is not properly monitored (ENOSPC should not go unnoticed on these machines),

P1099 and further earlier logs from that moment do not seem to warn about this... T3945 got created for this.

Jul 27 2021, 9:52 AM · System administration

Jul 26 2021

douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

Potential issues/weakness of our current infra:

Jul 26 2021, 5:15 PM · System administration

Jul 22 2021

douardda added a comment to D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.

I would have loved to also replace the logic in relation_add() and _relation_get() by stored SQL functions, but it's above my poor SQL skills...

Jul 22 2021, 5:47 PM
douardda requested review of D6015: Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB.
Jul 22 2021, 3:05 PM

Jul 21 2021

douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase

Jul 21 2021, 3:09 PM

Jul 19 2021

douardda added a comment to T3104: Persistent readonly perfect hash table.

sorry I don't understand everything here:

Jul 19 2021, 5:20 PM · Object storage (RedHat collaboration)

Jul 2 2021

douardda accepted D5943: Fix database queries related to the origin-revision layer.

I still disagree with the implementation of get_dates() but meh

Jul 2 2021, 4:38 PM
douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Jul 2 2021, 4:36 PM
douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Jul 2 2021, 4:35 PM
douardda accepted D5947: Add `ProvenanceStorageInterface` as discussed during backend design.

I've made several small comments / nitpicks, feel free to address them or not.

Jul 2 2021, 4:32 PM
douardda added inline comments to D5947: Add `ProvenanceStorageInterface` as discussed during backend design.
Jul 2 2021, 4:30 PM
douardda accepted D5946: Rework `ProvenanceInterface` as discussed during backend design.

okay but as stated, I don't much like the general usage of the RealDictCursor; sometimes it helps, but sometimes it does not. Ideally both should be available (depending on the query).

Jul 2 2021, 3:48 PM
douardda requested changes to D5943: Fix database queries related to the origin-revision layer.
Jul 2 2021, 3:40 PM
douardda accepted D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Jul 2 2021, 3:33 PM

Jul 1 2021

douardda added inline comments to D5943: Fix database queries related to the origin-revision layer.
Jul 1 2021, 3:28 PM
douardda added inline comments to D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Jul 1 2021, 3:25 PM
douardda accepted D5944: Add tests for history graph topology.

ok, but please remove the print statements first

Jul 1 2021, 12:37 PM
douardda added inline comments to D5944: Add tests for history graph topology.
Jul 1 2021, 12:33 PM
douardda updated subscribers of D5943: Fix database queries related to the origin-revision layer.
Jul 1 2021, 12:29 PM
douardda requested changes to D5925: Refactor ArchiveInterface to fit origin-revision layer needs.
Jul 1 2021, 12:07 PM
douardda added a comment to D5943: Fix database queries related to the origin-revision layer.

Why do all these queries use LOCK TABLE?

Jul 1 2021, 10:53 AM
douardda accepted D5948: Force `snapshot_get_heads` to return revisions in chronological order.

ok but the SQL query could be improved to not return unwanted dates

Jul 1 2021, 10:49 AM

Jun 29 2021

douardda triaged T3416: Implement the replayer service for Vitam as High priority.
Jun 29 2021, 9:33 AM
douardda added a comment to T3415: Specify the Vitam archiving format.

This initial proposal from CINES has not been selected because it de facto normalizes a number of relations of the SWH graph, making it unfit for storage in a solution like Vitam (too many objects, hard to manage incremental updates).

Jun 29 2021, 9:30 AM
douardda added a comment to T3415: Specify the Vitam archiving format.
  1. Proposal from CINES
Jun 29 2021, 9:27 AM
douardda triaged T3415: Specify the Vitam archiving format as High priority.
Jun 29 2021, 9:27 AM
douardda triaged T3414: Save the Archive in CINES' Vitam platform as High priority.
Jun 29 2021, 9:22 AM · meta-task, Roadmap 2022

Jun 28 2021

douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

Should this be documented somewhere? (How to use it / why)

Jun 28 2021, 3:35 PM

Jun 25 2021

douardda created P1078 (An Untitled Masterwork).
Jun 25 2021, 3:33 PM
douardda accepted D5893: hypothesis_strategies: Add raw_extrinsic_metadata() strategy.
Jun 25 2021, 11:27 AM
douardda accepted D5914: backend: Auto-generate origin visit stats upsert query.
Jun 25 2021, 11:25 AM
douardda accepted D5916: cli/task: Ensure cli output is always in the same order.
Jun 25 2021, 11:23 AM
douardda requested changes to D5917: journal_client: Only check last_* fields for some permutation tests.
Jun 25 2021, 11:22 AM
douardda added a comment to D5917: journal_client: Only check last_* fields for some permutation tests.

I think I'd rather have an explicit list of excluded fields (when these extra fields are added). So I'd prefer to see this diff be something that compares dicts (as a result of BaseObject.to_dict()), possibly filtered to exclude some fields.

Jun 25 2021, 11:21 AM
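
The suggestion in the comment above (compare dicts after dropping an explicit list of excluded fields) could be sketched roughly as follows. This is only an illustration, assuming the objects have already been turned into plain dicts, for instance via to_dict(); the helper names `without_fields` and `dicts_match` and the field names in the usage note are hypothetical and do not come from the diff.

```python
from typing import Any, Dict, Iterable


def without_fields(obj_dict: Dict[str, Any], excluded: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of obj_dict without the keys listed in excluded."""
    excluded_keys = set(excluded)
    return {key: value for key, value in obj_dict.items() if key not in excluded_keys}


def dicts_match(
    actual: Dict[str, Any], expected: Dict[str, Any], excluded: Iterable[str] = ()
) -> bool:
    """Compare two dicts while ignoring an explicit list of excluded fields."""
    excluded_keys = set(excluded)
    return without_fields(actual, excluded_keys) == without_fields(expected, excluded_keys)
```

In a test this might be used as `assert dicts_match(actual.to_dict(), expected.to_dict(), excluded=["some_extra_field"])`, which keeps the list of ignored fields visible at the call site.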

Jun 23 2021

douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

Also, at some point we might want to use better templating to write these SQL queries, or use stored procedures (with the proper "variation" being chosen at db creation time based on the selected flavor); this would simplify the python code a lot.

Jun 23 2021, 11:22 AM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

reword a bit the ci message and kill a few tabs in 30-schema.sql

Jun 23 2021, 11:12 AM
douardda added a comment to D5843: Add support for a denormalized version of the provenance DB.

yes I know, names for the subqueries are horrible...

Jun 23 2021, 11:07 AM
douardda retitled D5843: Add support for a denormalized version of the provenance DB from [WIP] Add support for a denormalized version of the provenance DB to Add support for a denormalized version of the provenance DB.
Jun 23 2021, 11:04 AM
douardda updated the diff for D5843: Add support for a denormalized version of the provenance DB.

rebase, adapt and implement denormalization for content_in_dir and dir_in_rev

Jun 23 2021, 11:03 AM

Jun 22 2021

douardda abandoned D5841: Remove the without-path flavor of ProvenanceDB.

we keep it for now

Jun 22 2021, 5:12 PM
douardda abandoned D5885: Add support for (topological) branches and merges in generate_repo.py.

I believe this diff is duplicated and the other one was already landed.

Jun 22 2021, 5:11 PM
douardda accepted D5902: Remove origin_get_id method from ProvenanceInterface.

overall ok but see the comment

Jun 22 2021, 11:05 AM

Jun 21 2021

douardda closed D5894: Allow to add extra origins and snapshots in generated test storages.
Jun 21 2021, 4:48 PM
douardda closed D5892: Add support for (topological) branches and merges in generate_repo.py.
Jun 21 2021, 4:48 PM
douardda committed rDPROV011645221cf6: Allow to add extra origins and snapshots in generated test storages (authored by douardda).
Allow to add extra origins and snapshots in generated test storages
Jun 21 2021, 4:48 PM
douardda committed rDPROV6734fd36b872: Add support for (topological) branches and merges in generate_repo.py (authored by douardda).
Add support for (topological) branches and merges in generate_repo.py
Jun 21 2021, 4:48 PM
douardda closed D5891: Refactor the generate_storage_from_git dataset creation tool.
Jun 21 2021, 4:48 PM
douardda committed rDPROV7886bf494ab8: Refactor the generate_storage_from_git dataset creation tool (authored by douardda).
Refactor the generate_storage_from_git dataset creation tool
Jun 21 2021, 4:48 PM
douardda updated the diff for D5891: Refactor the generate_storage_from_git dataset creation tool.

rebase

Jun 21 2021, 4:46 PM