
Oct 26 2020

douardda added a comment to T1577: Compare/benchmark objstorage backends .

See also T2706

Oct 26 2020, 12:30 PM · Object storage
douardda closed T1370: Report key code metrics in prometheus as Resolved.
Oct 26 2020, 12:30 PM · Metrics/monitoring, Restricted Project, Continuous Integration, System administration
douardda closed T1370: Report key code metrics in prometheus, a subtask of T1024: Proper continuous integration setup, as Resolved.
Oct 26 2020, 12:30 PM · Restricted Project, Continuous Integration, System administration
douardda added a comment to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..

Wouldn't it make things a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for backward compatibility)

Why? This change won't break any code using KafkaJournalWriter

Oct 26 2020, 12:25 PM
douardda committed rDOBJSRPLca21ed290d25: Normalize the expected config entry for the journal_client (authored by douardda).
Normalize the expected config entry for the journal_client
Oct 26 2020, 12:25 PM
douardda closed D4333: Normalize the expected config entry for the journal_client.
Oct 26 2020, 12:25 PM
douardda triaged T2735: Add / expose metrics on BW usage for authenticated users as Normal priority.
Oct 26 2020, 12:16 PM · Web app, System administration
douardda triaged T2733: Explore / install a varnish prometheus probe as High priority.
Oct 26 2020, 11:01 AM · Metrics/monitoring, System administration

Oct 23 2020

douardda closed D4193: swh identify: add --exclude.

closed by 2b869aa7d30d099ed6146d9f8dc667cd7a8eefc3

Oct 23 2020, 11:52 AM
douardda commandeered D4193: swh identify: add --exclude.
Oct 23 2020, 11:52 AM
douardda requested changes to D4311: Add testing repository builder.
Oct 23 2020, 11:25 AM
douardda added a comment to D4311: Add testing repository builder.

Also the commit message should give a bit more information on what this new script is needed for, maybe with a usage example.
This defines a bunch of commands. When and how should "I" use them?

Oct 23 2020, 11:25 AM
douardda added a comment to D4311: Add testing repository builder.

please do not put the "depends on Dxxx" line in the git commit message.

Oct 23 2020, 11:21 AM
douardda requested changes to D4313: Add new example repository generated from script.

ok on the diff itself, but what is this new example repo needed for? This should be explained in the commit message. (the "why"! always insist on the "why" rather than (or in addition to) the "what" in your commit messages, please.)

Oct 23 2020, 11:20 AM
douardda accepted D4193: swh identify: add --exclude.

ok but please properly document arguments in docstrings.

Oct 23 2020, 11:00 AM
douardda updated the task description for T2645: Add listing tasks for gitea instances.
Oct 23 2020, 10:20 AM · Origin-Gitea/Gogs, Archive coverage, Lister

Oct 22 2020

douardda created D4333: Normalize the expected config entry for the journal_client.
Oct 22 2020, 4:01 PM
douardda accepted D4312: Add `Hg20BundleLoader` tests from json files.

globally ok, but please add a README file as suggested in the previous comment

Oct 22 2020, 12:32 PM
douardda added a comment to D4312: Add `Hg20BundleLoader` tests from json files.

Would be nice to have a README file in tests/data explaining what these json files are and how to produce them.

Oct 22 2020, 11:52 AM
douardda added inline comments to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..
Oct 22 2020, 11:46 AM
douardda added a comment to D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..

Wouldn't it make things a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for backward compatibility)
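
A minimal sketch of the aliasing idea, assuming a toy writer that only records what it receives. The `BaseModel` stand-in, the `written` list and the `write_addition` signature are simplified assumptions for illustration, not the actual swh-journal code.

```python
from typing import Generic, List, Tuple, TypeVar


class BaseModel:
    """Stand-in for the swh-model base class (hypothetical, simplified)."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload


TValue = TypeVar("TValue")


class GenericKafkaJournalWriter(Generic[TValue]):
    """Toy writer: stores (object_type, value) pairs, no Kafka involved."""

    def __init__(self) -> None:
        self.written: List[Tuple[str, TValue]] = []

    def write_addition(self, object_type: str, value: TValue) -> None:
        self.written.append((object_type, value))


# The old name stays importable and keeps its old meaning for type checkers.
KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]

writer = KafkaJournalWriter()
writer.write_addition("content", BaseModel(b"hello"))
```

With this layout, existing callers keep importing `KafkaJournalWriter`, while new code can parametrize `GenericKafkaJournalWriter` with its own value type.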

Oct 22 2020, 11:43 AM
douardda requested changes to D4193: swh identify: add --exclude.

This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.
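
A rough sketch of what "bytes instead of str" can look like for an exclusion filter. The `make_exclude_filter` helper and the sample paths are hypothetical and are not the functions from the diff; the point is only that patterns are encoded once at the CLI boundary and everything below works on bytes.

```python
import fnmatch
import os
from typing import Callable, Iterable


def make_exclude_filter(patterns: Iterable[bytes]) -> Callable[[bytes], bool]:
    """Return a predicate that is True when a path should be kept."""
    compiled = list(patterns)

    def keep(path: bytes) -> bool:
        name = os.path.basename(path)
        # fnmatch accepts bytes as long as name and pattern are both bytes.
        return not any(fnmatch.fnmatchcase(name, pat) for pat in compiled)

    return keep


# CLI arguments arrive as str; encode them once, at the boundary.
cli_patterns = ["*.pyc", ".git"]
keep = make_exclude_filter(os.fsencode(p) for p in cli_patterns)

# The last path is not valid UTF-8, which a str-based API could not represent.
paths = [b"src/main.py", b"src/main.pyc", b"repo/.git", b"caf\xe9.txt"]
kept = [p for p in paths if keep(p)]
```

Keeping paths as bytes means file names that are not valid in the current locale encoding still pass through untouched.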

Oct 22 2020, 11:25 AM

Oct 21 2020

douardda created P830 (An Untitled Masterwork).
Oct 21 2020, 2:39 PM
douardda updated the task description for T2645: Add listing tasks for gitea instances.
Oct 21 2020, 12:18 PM · Origin-Gitea/Gogs, Archive coverage, Lister

Oct 19 2020

douardda created P828 (An Untitled Masterwork).
Oct 19 2020, 5:48 PM
douardda added inline comments to D4216: add swh-hg-identify a cli to identify hg objects.
Oct 19 2020, 4:49 PM
douardda triaged T2717: Write an end-user documentation on how to use the authenticated stack as High priority.
Oct 19 2020, 1:26 PM · Web app, Documentation

Oct 16 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Same as before but with 1M (fresh) sha1s:

Oct 16 2020, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:

  • a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
  • a second dataset has been created (with the order by sha256 part to spread the sha1s)
  • but the results are a mix of hot-cache and cold-cache tests
Oct 16 2020, 11:59 AM · Object storage, Mirror

Oct 15 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Some results:

Oct 15 2020, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Current benchmark scenario:

Oct 15 2020, 12:43 PM · Object storage, Mirror
douardda triaged T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3) as High priority.
Oct 15 2020, 12:36 PM · Object storage, Mirror

Oct 14 2020

douardda created P821 (An Untitled Masterwork).
Oct 14 2020, 3:43 PM
douardda created P820 bench objstorage.
Oct 14 2020, 3:39 PM

Oct 13 2020

douardda accepted D4089: Add tests and fix behavior of scanner cli.

I'm mostly OK with this now, so I'll make it "accepted", but please refactor the cli_run_[n]ok() helper functions a bit before landing it.

Oct 13 2020, 1:24 PM
douardda set the repository for D4193: swh identify: add --exclude to rDMOD Data model.
Oct 13 2020, 1:04 PM
douardda added a comment to D4193: swh identify: add --exclude.

@douardda @zack Note that this diff somehow did not trigger the ci tests, only the linters. No idea why. just a heads up.

Oct 13 2020, 1:03 PM

Oct 9 2020

douardda committed rCDFPcdb00c8e5a34: Revert to a orgname/reponame based images naming scheme (authored by douardda).
Revert to a orgname/reponame based images naming scheme
Oct 9 2020, 5:08 PM
douardda committed rCDFP65cbf32679ad: Reorganize the images under the softwareheritage hub repo (authored by douardda).
Reorganize the images under the softwareheritage hub repo
Oct 9 2020, 4:39 PM
douardda accepted D4210: backfill: use the common `storage` top-level config key.

same

Oct 9 2020, 4:34 PM
douardda accepted D4209: backfill: support arbitrary journal writer configuration.

oh so much yes!

Oct 9 2020, 4:33 PM
douardda updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Oct 9 2020, 3:47 PM · Staging environment, System administration
douardda triaged T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage) as High priority.
Oct 9 2020, 3:37 PM · Staging environment, System administration

Oct 8 2020

douardda added inline comments to D4198: PEP8 refactoring of scanner modules.
Oct 8 2020, 6:19 PM
douardda added a comment to D4198: PEP8 refactoring of scanner modules.

Overall ok, but I would have preferred the renaming be in a dedicated revision, separated from type annotation fixes/additions.

Oct 8 2020, 6:05 PM
douardda accepted D4182: conftest: Declare swh.core pytest_plugin.

otherwise fine with me

Oct 8 2020, 5:56 PM
douardda added a comment to D4182: conftest: Declare swh.core pytest_plugin.

does this require the plugin's entrypoint in swh.core to be removed? (e.g. because of swh.core.pytest_plugin being loaded twice or something like that) or is it safe to apply and use with a swh.core that still declares its pytest_plugin as an entrypoint?

Oct 8 2020, 5:56 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

yes, it would make sense for values. Do you want to open a task for that?

you read my mind :-)

Oct 8 2020, 12:42 PM
douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).
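
To make the idea concrete, here is a small sketch, assuming a toy `Origin` class. The class, the `load_origin` helper and the v1-to-v2 field rename are all invented for illustration; nothing here reflects the real swh-model schema.

```python
from typing import Any, Dict, Final


class Origin:
    # Hypothetical per-class schema version: a simple monotonic integer.
    version: Final[int] = 2

    def __init__(self, url: str) -> None:
        self.url = url

    def to_dict(self) -> Dict[str, Any]:
        # The version travels with every serialized object.
        return {"version": self.version, "url": self.url}


def load_origin(d: Dict[str, Any]) -> Origin:
    """Rebuild an Origin, upgrading older serialized forms on the way."""
    version = d.get("version", 1)  # assume unversioned data is version 1
    if version > Origin.version:
        raise ValueError(f"unsupported Origin version {version}")
    if version < 2:
        # Invented migration: pretend v1 called the field "origin".
        d = {"version": 2, "url": d["origin"]}
    return Origin(d["url"])


current = Origin("https://example.org/repo").to_dict()
upgraded = load_origin({"origin": "https://example.org/old"})
```

A reader (journal client or storage backend) can then decide per object whether it understands the payload, instead of guessing from the fields that happen to be present.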

Oct 8 2020, 12:41 PM · Journal
douardda updated subscribers of D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as the unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as a dict key or in a set directly.

I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.
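
For illustration, a quick sketch of the trade-off, assuming a hypothetical `VisitKey` with made-up field names: a plain dict cannot go into a set, while a `NamedTuple` is hashable and still names its fields.

```python
from typing import NamedTuple


class VisitKey(NamedTuple):
    # Hypothetical key fields, for illustration only.
    origin: str
    visit: int


dict_key = {"origin": "https://example.org/repo", "visit": 1}
tuple_key = VisitKey(origin="https://example.org/repo", visit=1)

# A dict is not hashable, so it cannot be used in a set or as a dict key.
try:
    {dict_key}
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# A named tuple is hashable and deduplicates as expected.
seen = {tuple_key}
seen.add(VisitKey(origin="https://example.org/repo", visit=1))

# It can still be turned back into the "self-documenting" dict form.
as_dict = tuple_key._asdict()
```

So a named tuple would keep the self-documenting property while being directly usable as a key.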

Oct 8 2020, 12:39 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

yes, it would make sense for values. Do you want to open a task for that?

Oct 8 2020, 12:36 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as the unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as a dict key or in a set directly.

I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.

Also (most probably dumb idea, writing as it pops in my mind), wouldn't it make sense to add some kind of 'per-object class model version' in the key?

This would prevent compacting away old versions of objects. Is this something we want?

Oct 8 2020, 12:34 PM
douardda added a comment to D4194: model: use visit ids in the unique key, instead of their date..

microsecond in postgres, millisecond in cassandra.

Oct 8 2020, 12:27 PM
douardda accepted D4186: scanner.model: Fix Tree.toDict to be side-effect free.

The split in 2 revisions is not mandatory, just sayin' for good measure.

Oct 8 2020, 12:16 PM
douardda added a comment to D4186: scanner.model: Fix Tree.toDict to be side-effect free.

looks good (did not even notice toDict() is not even a recursive method! so this dict_nodes really makes no sense at all).

Oct 8 2020, 12:15 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as the unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as a dict key or in a set directly.

Oct 8 2020, 12:08 PM
douardda added a comment to D4078: Add a 'unique_key' method on model objects.

maybe stupid question, but why use a dict as the unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as a dict key or in a set directly.

Oct 8 2020, 12:04 PM
douardda added a comment to D4194: model: use visit ids in the unique key, instead of their date..

dates are not unique (ie. multiple visits can share a date, and they
do in practice); and visit statuses already use visit ids in their
unique key.
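
A tiny sketch of the problem, assuming visits are plain dicts with made-up `origin`, `visit` and `date` fields: two visits sharing a date collapse into one entry when keyed by date, but stay distinct when keyed by visit id.

```python
from datetime import datetime, timezone

same_date = datetime(2020, 10, 8, 11, 0, tzinfo=timezone.utc)

# Hypothetical visit records; field names are illustrative only.
visits = [
    {"origin": "https://example.org/repo", "visit": 1, "date": same_date},
    {"origin": "https://example.org/repo", "visit": 2, "date": same_date},
]

# Keyed by (origin, date): the second visit silently overwrites the first.
by_date = {(v["origin"], v["date"]): v for v in visits}

# Keyed by (origin, visit id): both visits are kept.
by_id = {(v["origin"], v["visit"]): v for v in visits}
```

In a log-compacted topic, the date-keyed variant would lose one of the two visits in the same way.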

Oct 8 2020, 11:52 AM
douardda added a comment to D4193: swh identify: add --exclude.
In D4193#103804, @zack wrote:

Thanks, even though this is a little bit disturbing discrepancy wrt swh-scanner exclusion mechanism,

Oct 8 2020, 11:24 AM
douardda requested changes to D4193: swh identify: add --exclude.

can you please remove the "noise" added by arc in the commit message? And update it (still the previous option name in there).

Oct 8 2020, 11:16 AM

Oct 7 2020

douardda requested changes to D3435: Add mercurial.from_disk.HgLoaderFromDisk.
Oct 7 2020, 3:52 PM

Oct 2 2020

douardda added a comment to T1410: Kill implicit configuration: new configuration scheme.

Maybe starting a pad/hackmd document would be easier at this point?

Oct 2 2020, 4:01 PM · Core & foundations
douardda added a comment to D4131: Remove parse_url helper that adds no real value.

this is debatable, but it does "normalize" the given url, so it does something. I agree the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying
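
To show why the trailing / matters, here is a short sketch, assuming a lister builds its endpoint URLs with `urllib.parse.urljoin` (that is an assumption; the `normalize_base_url` helper and the `repos` endpoint are hypothetical too).

```python
from urllib.parse import urljoin


def normalize_base_url(url: str) -> str:
    """Hypothetical helper: make sure the base URL ends with a slash."""
    return url if url.endswith("/") else url + "/"


raw = "https://somehere.org/api/v1"

# Without the trailing slash, urljoin replaces the last path segment.
broken = urljoin(raw, "repos")

# With it, the relative path is appended as intended.
fixed = urljoin(normalize_base_url(raw), "repos")
```

Here `broken` ends up pointing at `/api/repos` instead of `/api/v1/repos`, which is the kind of invalid URL described above.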

Oct 2 2020, 3:14 PM
douardda added a comment to D4131: Remove parse_url helper that adds no real value.

this is debatable, but it does "normalize" the given url, so it does something. I agree the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying

Oct 2 2020, 3:13 PM
douardda added a comment to D3334: Add a new TenaciousProxyStorage.

@douardda ping?

Oct 2 2020, 2:32 PM

Oct 1 2020

douardda accepted D4112: jobs/tools/dockerfiles: Enable to trigger builds remotely.
Oct 1 2020, 2:24 PM

Sep 30 2020

douardda committed rCDFP0264e6b611ca: Fix a typo in the README file (authored by douardda).
Fix a typo in the README file
Sep 30 2020, 11:48 AM
douardda committed rCDFP110773478d07: Update the db initialization to swh-storage 0.15 (authored by douardda).
Update the db initialization to swh-storage 0.15
Sep 30 2020, 11:48 AM
douardda requested changes to D4089: Add tests and fix behavior of scanner cli.
Sep 30 2020, 10:13 AM
douardda accepted D4092: packer: Clean up spurious blanks.
Sep 30 2020, 9:41 AM
douardda closed T2313: Archive git.fsfe.org (Gitea) as Resolved.

Listed (oneshot full + recurring incremental) and loaded (as far as I can tell).

Sep 30 2020, 9:37 AM · Archive coverage, Lister

Sep 29 2020

douardda added a comment to D4055: Drop debian packaging, which is now handled on separate branches.

(just updated my swh-env, now I see where this diff comes from :-) )

Sep 29 2020, 12:09 PM
douardda accepted D4055: Drop debian packaging, which is now handled on separate branches.

sure

Sep 29 2020, 11:49 AM
douardda accepted D4068: jobs/swh-environment: Improve build script.

LGTM (except the "rm -f ../$module.log " for which I am not convinced it's a good idea)

Sep 29 2020, 11:47 AM
douardda added a comment to T2313: Archive git.fsfe.org (Gitea).

I've sent an email to the fsfe.

Sep 29 2020, 11:39 AM · Archive coverage, Lister
douardda triaged T2645: Add listing tasks for gitea instances as Normal priority.
Sep 29 2020, 10:09 AM · Origin-Gitea/Gogs, Archive coverage, Lister

Sep 28 2020

douardda added a comment to T2313: Archive git.fsfe.org (Gitea).

Can this be closed now? What's missing? Adding a listing task?

Sep 28 2020, 9:47 AM · Archive coverage, Lister

Sep 25 2020

douardda created P786 (An Untitled Masterwork).
Sep 25 2020, 6:37 PM
douardda committed rDMODbe8f1a559d82: Adapt cli declaration entrypoint to swh.core 0.3 (authored by douardda).
Adapt cli declaration entrypoint to swh.core 0.3
Sep 25 2020, 3:27 PM
douardda closed D4051: Adapt cli declaration entrypoint to swh.core 0.3.
Sep 25 2020, 3:27 PM
douardda updated the diff for D4051: Adapt cli declaration entrypoint to swh.core 0.3.

and with the pytest.ini hunk we don't need a (non working) dependency on swh.core[testing]

Sep 25 2020, 3:25 PM
douardda updated the diff for D4051: Adapt cli declaration entrypoint to swh.core 0.3.

add a precision in the ci message for the pytest.ini hunk

Sep 25 2020, 3:22 PM
douardda created D4051: Adapt cli declaration entrypoint to swh.core 0.3.
Sep 25 2020, 3:15 PM
douardda created P781 (An Untitled Masterwork).
Sep 25 2020, 12:59 PM
douardda created P780 (An Untitled Masterwork).
Sep 25 2020, 12:36 PM
douardda committed rDVAUec87dfe25879: Adapt cli declaration entrypoint to swh.core 0.3 (authored by douardda).
Adapt cli declaration entrypoint to swh.core 0.3
Sep 25 2020, 12:09 PM
douardda committed rDSCHbe7a5aeafa7f: Rename sql files according to swh.core 0.3 (authored by douardda).
Rename sql files according to swh.core 0.3
Sep 25 2020, 11:40 AM
douardda closed D4045: Rename sql files according to swh.core 0.3.
Sep 25 2020, 11:40 AM
douardda added a comment to D4045: Rename sql files according to swh.core 0.3.

ah yeah, it'd be best to align indeed.

Sep 25 2020, 11:39 AM
douardda committed rDOBJSRPL99571a1068b0: Adapt cli declaration entrypoint to swh.core 0.3 (authored by douardda).
Adapt cli declaration entrypoint to swh.core 0.3
Sep 25 2020, 10:02 AM
douardda created D4045: Rename sql files according to swh.core 0.3.
Sep 25 2020, 9:57 AM
douardda committed rDSCH5cc573d16f53: Adapt cli declaration entrypoint to swh.core 0.3 (authored by douardda).
Adapt cli declaration entrypoint to swh.core 0.3
Sep 25 2020, 9:53 AM

Sep 24 2020

douardda accepted D4039: Update sql paths for the moved SQL files.

crumbs everywhere

Sep 24 2020, 8:13 PM
douardda accepted D4038: tests: Run SQL files with psql instead of psycopg2.

sure

Sep 24 2020, 8:07 PM
douardda committed rCDFPf9dbb018395a: Update grafana dashboards (authored by douardda).
Update grafana dashboards
Sep 24 2020, 5:50 PM
douardda added a comment to T2624: Create strategy for documentation with a map or a full table of content.

Also https://plan.io/blog/technical-documentation/

Sep 24 2020, 3:43 PM · Roadmap 2021, meta-task, Documentation
douardda added a comment to D4012: core.loader: Log information about origin currently being ingested.
In D4012#99525, @olasd wrote:

I don't think the origin url and visit type should be sent in the task result; they're arguments of the task already.

If we want them logged by the worker when the task ends (which I agree would be useful), then we should improve logging on the worker/celery side to show some of the task arguments (for instance, if there's a "url" argument) instead of / in addition to the task id.

Sep 24 2020, 3:28 PM · Core Loader
douardda added inline comments to D4012: core.loader: Log information about origin currently being ingested.
Sep 24 2020, 3:26 PM · Core Loader
douardda triaged T2621: running tox fails because C.UTF-8 is not available as Normal priority.
Sep 24 2020, 3:10 PM · Development environment
douardda closed T2119: Monitoring of workers as Resolved.
Sep 24 2020, 3:08 PM · Scheduling utilities, Sprint 2019/12 (Monitor and Conquer)