douardda (David Douard)
User

Projects (7)

User Details

User Since
Jul 10 2018, 12:38 PM (120 w, 2 d)

Recent Activity

Today

douardda accepted D4374: cli.db: Open init-admin subcmd to initialize superuser-level scripts.
Thu, Oct 29, 4:18 PM

Yesterday

douardda created P843 (An Untitled Masterwork).
Wed, Oct 28, 4:40 PM
douardda requested changes to D4216: add swh-hg-identify a cli to identify hg objects.

see comments

Wed, Oct 28, 11:14 AM
douardda added a comment to D4216: add swh-hg-identify a cli to identify hg objects.

So the test fails on Jenkins because it uses the hg command from the system (since mercurial is, oddly enough, not a dependency of swh-loader-mercurial), and on stretch, mercurial is 4.8.
Using mercurial 5.5 is ok.

Wed, Oct 28, 11:14 AM
douardda added inline comments to D4311: Add testing repository builder.
Wed, Oct 28, 10:09 AM
douardda committed rDDOC4533ec80080d: Remove deprecated chapter on SWH infrastrucuture (authored by douardda).
Remove deprecated chapter on SWH infrastrucuture
Wed, Oct 28, 10:04 AM
douardda closed D4354: Remove deprecated chapter on SWH infrastrucuture.
Wed, Oct 28, 10:04 AM
douardda added a comment to D4354: Remove deprecated chapter on SWH infrastrucuture.

Is this chapter saved anywhere? or is it so deprecated it shouldn't be saved?

Wed, Oct 28, 10:04 AM
douardda committed rDDEP605bfda59483: Add a description of the deposit in the docs' index page (authored by douardda).
Add a description of the deposit in the docs' index page
Wed, Oct 28, 10:00 AM
douardda closed D4358: Add a description of the deposit in the docs' index page.
Wed, Oct 28, 10:00 AM
douardda updated the diff for D4358: Add a description of the deposit in the docs' index page.

rebase

Wed, Oct 28, 9:59 AM

Tue, Oct 27

douardda updated the diff for D4358: Add a description of the deposit in the docs' index page.

Define and use the SWH acronym

Tue, Oct 27, 5:14 PM
douardda updated the diff for D4358: Add a description of the deposit in the docs' index page.

fixes and improvements suggested by moranegg (big thx)

Tue, Oct 27, 2:43 PM
douardda added inline comments to D4358: Add a description of the deposit in the docs' index page.
Tue, Oct 27, 2:37 PM
douardda accepted D4362: Update swh.storage.validate for swh.model 0.7.2.
Tue, Oct 27, 2:17 PM
douardda updated the diff for D4358: Add a description of the deposit in the docs' index page.

typos (thx ardumont)

Tue, Oct 27, 11:54 AM
douardda added a reviewer for D4358: Add a description of the deposit in the docs' index page: moranegg.
Tue, Oct 27, 11:52 AM
douardda added a comment to D4313: Add new example repository generated from script.

I would expect the commit message to be a bit more explanatory: either this new test case adds some tested aspects that were not tested before, and it should mention it, or it does not, and it should also mention it, explaining that this new test is the base for future extended ones in a more manageable way (which is what "updatable" stands for, if I get this right).

Tue, Oct 27, 11:46 AM
douardda created D4358: Add a description of the deposit in the docs' index page.
Tue, Oct 27, 10:50 AM

Mon, Oct 26

douardda created D4354: Remove deprecated chapter on SWH infrastrucuture.
Mon, Oct 26, 3:09 PM
douardda closed T2421: Make model objects immutable as Resolved.

should be ok now (even if via ImmutableDict :-) )

Mon, Oct 26, 2:51 PM · Data Model
douardda closed T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object as Resolved.
Mon, Oct 26, 2:45 PM · Data Model
douardda closed T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object, a subtask of T2421: Make model objects immutable, as Resolved.
Mon, Oct 26, 2:45 PM · Data Model
douardda updated the task description for T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object.
Mon, Oct 26, 2:44 PM · Data Model
douardda added projects to T2585: Properly handle deleted/no longer available origins: Core Loader, Lister.
Mon, Oct 26, 2:41 PM · Lister, Core Loader
douardda created P836 (An Untitled Masterwork).
Mon, Oct 26, 2:29 PM
douardda added a comment to T1577: Compare/benchmark objstorage backends .

See also T2706

Mon, Oct 26, 12:30 PM · Object storage
douardda closed T1370: Report key code metrics in prometheus as Resolved.
Mon, Oct 26, 12:30 PM · Metrics/monitoring, Restricted Project, Continuous Integration, System administration
douardda closed T1370: Report key code metrics in prometheus, a subtask of T1024: Proper continuous integration setup, as Resolved.
Mon, Oct 26, 12:30 PM · Restricted Project, Continuous Integration, System administration
douardda added a comment to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..

Wouldn't it make it a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)

Why? This change won't break any code using KafkaJournalWriter
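
The naming suggestion quoted above could be sketched roughly as follows. This is only an illustration: the `BaseModel` stand-in, the `write` method and the in-memory list are hypothetical placeholders, not the actual swh-journal code.

```python
from typing import Generic, List, TypeVar


class BaseModel:
    """Hypothetical stand-in for the swh-model base class."""


TValue = TypeVar("TValue")


class GenericKafkaJournalWriter(Generic[TValue]):
    """Hypothetical generic writer; a real one would produce to Kafka."""

    def __init__(self) -> None:
        self.written: List[TValue] = []

    def write(self, value: TValue) -> None:
        # Record the value in memory instead of sending it to a topic.
        self.written.append(value)


# Alias kept under the historical name, parametrized with the model type,
# so existing code that imports KafkaJournalWriter keeps working.
KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]

writer = KafkaJournalWriter()
writer.write(BaseModel())
```

With such an alias, the generic class can accept values that do not come from swh-model, while the old name stays importable.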

Mon, Oct 26, 12:25 PM
douardda committed rDOBJSRPLca21ed290d25: Normalize the expected config entry for the journal_client (authored by douardda).
Normalize the expected config entry for the journal_client
Mon, Oct 26, 12:25 PM
douardda closed D4333: Normalize the expected config entry for the journal_client.
Mon, Oct 26, 12:25 PM
douardda triaged T2735: Add / expose metrics on BW usage for authenticated users as Normal priority.
Mon, Oct 26, 12:16 PM · Web app, System administration
douardda triaged T2733: Explore / install a varnish prometheus probe as High priority.
Mon, Oct 26, 11:01 AM · System administration

Fri, Oct 23

douardda closed D4193: swh identify: add --exclude.

closed by 2b869aa7d30d099ed6146d9f8dc667cd7a8eefc3

Fri, Oct 23, 11:52 AM
douardda commandeered D4193: swh identify: add --exclude.
Fri, Oct 23, 11:52 AM
douardda requested changes to D4311: Add testing repository builder.
Fri, Oct 23, 11:25 AM
douardda added a comment to D4311: Add testing repository builder.

Also the commit message should give a bit more information on what this new script is needed for, maybe with a usage example.
This defines a bunch of commands. When and how should "I" use them?

Fri, Oct 23, 11:25 AM
douardda added a comment to D4311: Add testing repository builder.

please do not put the "depends on Dxxx" line in the git commit message.

Fri, Oct 23, 11:21 AM
douardda requested changes to D4313: Add new example repository generated from script.

ok on the diff itself, but what is this new example repo needed for? This should be explained in the commit message. (The "why"! Always insist on the "why" rather than (or in addition to) the "what" in your commit messages, please.)

Fri, Oct 23, 11:20 AM
douardda accepted D4193: swh identify: add --exclude.

ok but please properly document arguments in docstrings.

Fri, Oct 23, 11:00 AM
douardda updated the task description for T2645: Add listing tasks for gitea instances.
Fri, Oct 23, 10:20 AM · Archive coverage, Lister

Thu, Oct 22

douardda created D4333: Normalize the expected config entry for the journal_client.
Thu, Oct 22, 4:01 PM
douardda accepted D4312: Add `Hg20BundleLoader` tests from json files.

globally ok, but please add a README file as suggested in the previous comment

Thu, Oct 22, 12:32 PM
douardda added a comment to D4312: Add `Hg20BundleLoader` tests from json files.

Would be nice to have a README file in tests/data explaining what these json files are and how to produce them.

Thu, Oct 22, 11:52 AM
douardda added inline comments to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..
Thu, Oct 22, 11:46 AM
douardda added a comment to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..

Wouldn't it make it a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)

Thu, Oct 22, 11:43 AM
douardda requested changes to D4193: swh identify: add --exclude.

This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.

Thu, Oct 22, 11:25 AM

Wed, Oct 21

douardda created P830 (An Untitled Masterwork).
Wed, Oct 21, 2:39 PM
douardda updated the task description for T2645: Add listing tasks for gitea instances.
Wed, Oct 21, 12:18 PM · Archive coverage, Lister

Mon, Oct 19

douardda created P828 (An Untitled Masterwork).
Mon, Oct 19, 5:48 PM
douardda added inline comments to D4216: add swh-hg-identify a cli to identify hg objects.
Mon, Oct 19, 4:49 PM
douardda triaged T2717: Write an end-user documentation on how to use the authenticated stack as High priority.
Mon, Oct 19, 1:26 PM · Web app, Documentation

Fri, Oct 16

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Same as before but with 1M (fresh) sha1s:

Fri, Oct 16, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:

  • a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
  • a second dataset has been created (with the order by sha256 part to spread the sha1s)
  • but the results are a mix of hot and cold cache tests
Fri, Oct 16, 11:59 AM · Object storage, Mirror

Thu, Oct 15

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Some results:

Thu, Oct 15, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Current benchmark scenario:

Thu, Oct 15, 12:43 PM · Object storage, Mirror
douardda triaged T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3) as High priority.
Thu, Oct 15, 12:36 PM · Object storage, Mirror

Wed, Oct 14

douardda created P821 (An Untitled Masterwork).
Wed, Oct 14, 3:43 PM
douardda created P820 bench objstorage.
Wed, Oct 14, 3:39 PM

Tue, Oct 13

douardda accepted D4089: Add tests and fix behavior of scanner cli.

I'm mostly OK with this now, so I'll make it "accepted", but please refactor the cli_run_[n]ok() helper functions a bit before landing it.

Tue, Oct 13, 1:24 PM
douardda set the repository for D4193: swh identify: add --exclude to rDMOD Data model.
Tue, Oct 13, 1:04 PM
douardda added a comment to D4193: swh identify: add --exclude.

@douardda @zack Note that this diff somehow did not trigger the ci tests, only the linters. No idea why. just a heads up.

Tue, Oct 13, 1:03 PM

Fri, Oct 9

douardda committed rCDFPcdb00c8e5a34: Revert to a orgname/reponame based images naming scheme (authored by douardda).
Revert to a orgname/reponame based images naming scheme
Fri, Oct 9, 5:08 PM
douardda committed rCDFP65cbf32679ad: Reorganize the images under the softwareheritage hub repo (authored by douardda).
Reorganize the images under the softwareheritage hub repo
Fri, Oct 9, 4:39 PM
douardda accepted D4210: backfill: use the common `storage` top-level config key.

same

Fri, Oct 9, 4:34 PM
douardda accepted D4209: backfill: support arbitrary journal writer configuration.

oh so much yes!

Fri, Oct 9, 4:33 PM
douardda updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Fri, Oct 9, 3:47 PM · System administration
douardda triaged T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage) as High priority.
Fri, Oct 9, 3:37 PM · System administration

Thu, Oct 8

douardda added inline comments to D4198: PEP8 refactoring of scanner modules.
Thu, Oct 8, 6:19 PM
douardda added a comment to D4198: PEP8 refactoring of scanner modules.

Overall ok, but I would have preferred the renaming be in a dedicated revision, separated from type annotation fixes/additions.

Thu, Oct 8, 6:05 PM
douardda accepted D4182: conftest: Declare swh.core pytest_plugin.

otherwise fine with me

Thu, Oct 8, 5:56 PM
douardda added a comment to D4182: conftest: Declare swh.core pytest_plugin.

does this require the plugin's entrypoint in swh.core to be removed? (e.g. because of swh.core.pytest_plugin being loaded twice or something like that) or is it safe to apply and use with a swh.core that still declares its pytest_plugin as an entrypoint?

Thu, Oct 8, 5:56 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

yes, it would make sense for values. Do you want to open a task for that?

you read my mind :-)

Thu, Oct 8, 12:42 PM
douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).

Thu, Oct 8, 12:41 PM · Journal
douardda updated subscribers of D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.

I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.

Thu, Oct 8, 12:39 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

yes, it would make sense for values. Do you want to open a task for that?

Thu, Oct 8, 12:36 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.

I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.

Also (most probably dumb idea, writing as it pops in my mind), wouldn't it make sense to add some kind of 'per-object class model version' in the key?

This would prevent compacting away old versions of objects. Is this something we want?
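
The dict-versus-tuple point raised in this thread can be illustrated with a small sketch. The key contents (an origin URL and a visit id) and the variable names are made up for the example and are not the real swh-model keys.

```python
# Hypothetical unique keys for the same object, in both shapes.
key_as_dict = {"origin": "https://example.org/repo", "visit": 3}
key_as_tuple = ("https://example.org/repo", 3)

seen = set()
seen.add(key_as_tuple)  # tuples of hashable values can go in a set

try:
    seen.add(key_as_dict)  # dicts are mutable, hence unhashable
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# One possible bridge: keep the "self-documenting" dict, but derive a
# hashable form from it when it must be used as a dict key or set member.
hashable_key = tuple(sorted(key_as_dict.items()))
seen.add(hashable_key)
```

The dict form keeps the field names visible, at the cost of a conversion step whenever the key has to be hashed.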

Thu, Oct 8, 12:34 PM
douardda added a comment to D4194: model: use visit ids in the unique key, instead of their date..

microsecond in postgres, millisecond in cassandra.

Thu, Oct 8, 12:27 PM
douardda accepted D4186: scanner.model: Fix Tree.toDict to be side-effect free.

The split in 2 revisions is not mandatory, just sayin' for good measure.

Thu, Oct 8, 12:16 PM
douardda added a comment to D4186: scanner.model: Fix Tree.toDict to be side-effect free.

looks good (did not even notice toDict() is not even a recursive method! so this dict_nodes really makes no sense at all).

Thu, Oct 8, 12:15 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.

Thu, Oct 8, 12:08 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.

Thu, Oct 8, 12:04 PM
douardda added a comment to D4194: model: use visit ids in the unique key, instead of their date..

dates are not unique (ie. multiple visits can share a date, and they
do in practice); and visit statuses already use visit ids in their
unique key.

Thu, Oct 8, 11:52 AM
douardda added a comment to D4193: swh identify: add --exclude.
In D4193#103804, @zack wrote:

Thanks, even though this is a little bit disturbing discrepancy wrt swh-scanner exclusion mechanism,

Thu, Oct 8, 11:24 AM
douardda requested changes to D4193: swh identify: add --exclude.

can you please remove the "noise" added by arc in the commit message? And update it (still the previous option name in there).

Thu, Oct 8, 11:16 AM

Wed, Oct 7

douardda requested changes to D3435: Add mercurial.from_disk.HgLoaderFromDisk.
Wed, Oct 7, 3:52 PM

Fri, Oct 2

douardda added a comment to T1410: Kill implicit configuration: new configuration scheme.

Maybe starting a pad/hackmd document would be easier at this point?

Fri, Oct 2, 4:01 PM · Core & foundations
douardda added a comment to D4131: Remove parse_url helper that adds no real value.

this is debatable, but it does "normalize" the given url, so it does something. I agree the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying
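
One way a missing trailing / can lead to invalid urls is sketched below, assuming the lister builds its endpoints with `urllib.parse.urljoin`. That assumption, the `with_trailing_slash` helper and the `repos` endpoint name are hypothetical; the actual listers may build urls differently.

```python
from urllib.parse import urljoin


def with_trailing_slash(url: str) -> str:
    """Hypothetical normalization step: ensure the base url ends with '/'."""
    return url if url.endswith("/") else url + "/"


base = "https://somehere.org/api/v1"

# Without the trailing slash, urljoin replaces the last path segment.
broken = urljoin(base, "repos")

# With the trailing slash, the endpoint is appended under /api/v1/.
fixed = urljoin(with_trailing_slash(base), "repos")

print(broken)  # https://somehere.org/api/repos
print(fixed)   # https://somehere.org/api/v1/repos
```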

Fri, Oct 2, 3:14 PM
douardda added a comment to D4131: Remove parse_url helper that adds no real value.

this is debatable, but it does "normalize" the given url, so it does something. I agree the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).

Fri, Oct 2, 3:13 PM
douardda added a comment to D3334: Add a new TenaciousProxyStorage.

@douardda ping?

Fri, Oct 2, 2:32 PM

Thu, Oct 1

douardda accepted D4112: jobs/tools/dockerfiles: Enable to trigger builds remotely.
Thu, Oct 1, 2:24 PM

Wed, Sep 30

douardda committed rCDFP0264e6b611ca: Fix a typo in the README file (authored by douardda).
Fix a typo in the README file
Wed, Sep 30, 11:48 AM
douardda committed rCDFP110773478d07: Update the db initialization to swh-storage 0.15 (authored by douardda).
Update the db initialization to swh-storage 0.15
Wed, Sep 30, 11:48 AM
douardda requested changes to D4089: Add tests and fix behavior of scanner cli.
Wed, Sep 30, 10:13 AM
douardda accepted D4092: packer: Clean up spurious blanks.
Wed, Sep 30, 9:41 AM
douardda closed T2313: Archive git.fsfe.org (Gitea) as Resolved.

Listed (oneshot full + recurring incremental) and loaded (as far as I can tell).

Wed, Sep 30, 9:37 AM · Archive coverage, Lister

Sep 29 2020

douardda added a comment to D4055: Drop debian packaging, which is now handled on separate branches.

(just updated my swh-env, now I see where this diff comes from :-) )

Sep 29 2020, 12:09 PM
douardda accepted D4055: Drop debian packaging, which is now handled on separate branches.

sure

Sep 29 2020, 11:49 AM
douardda accepted D4068: jobs/swh-environment: Improve build script.

LGTM (except the "rm -f ../$module.log " for which I am not convinced it's a good idea)

Sep 29 2020, 11:47 AM