In D6165#163629, @vlorentz wrote: What is the reason for this change? Is it more efficient to assign requests to workers based on ID rather than randomly?
Sep 22 2021
Sep 21 2021
some more :-)
douardda added inline comments to D6310: opam: Move the state initialization into the get_pages method.
Sep 20 2021
LGTM, but how is the new opam_root option expected to be set (in production I mean)?
I'm not done yet but here is a first review on my side.
douardda closed T1510: Have a look at openAPI and decide whether we want to follow these specs, a subtask of T1805: Public API v2, as Resolved.
not useful as a dedicated task, see T1805 for the main discussion on this subject
douardda requested changes to D6300: Capture missing revision <-> hgnode-id scenario in a xfail test.
I don't understand what exactly is (not) tested here. What does "anomad-d" stand for BTW?
Sep 16 2021
douardda added inline comments to D6281: converters: Recompute hashes and check they match the originals.
douardda committed rDENV57ad032071ff: docker: document some useful kafka management commands in the README file (authored by douardda).
docker: document some useful kafka management commands in the README file
douardda committed rDENVe24535cc0064: docker: wrap long cli command lines in the README file (authored by douardda).
docker: wrap long cli command lines in the README file
fix indentation (tab->ws) and a few typos
Sep 14 2021
douardda committed rDENVb0f07795ddff: docker: Document how to consume kafka topics from the host (authored by douardda).
docker: Document how to consume kafka topics from the host
douardda committed rDENVf612427f663d: docker: allow kafka to be consumed from the host (authored by douardda).
docker: allow kafka to be consumed from the host
closed by 94be817f869409c64415b181824071d2998e33d5
closed by a3c1f39013bae1a6982140d51d8bb443dc1b5c9c
Keep port 5092 exposed on host
douardda committed rDDATASET94be817f8694: Commit kafka messages which offset has reach the high limit (authored by douardda).
Commit kafka messages which offset has reach the high limit
douardda committed rDDATASETa3c1f39013ba: Add a JournalClientOffsetRanges.unsubscribe() method (authored by douardda).
Add a JournalClientOffsetRanges.unsubscribe() method
Sep 13 2021
Fix a missing f-string prefix
Add a bit of documentation in the README file on how to consume kafka from the host
It's not worth the trouble, and there is a better solution (server-side)
In D6234#161606, @vlorentz wrote: You could also add a command in swh-dataset's entrypoint.sh that calls whatever Kafka's script does
In D6234#161506, @vlorentz wrote: In D6234#161491, @douardda wrote: So either I kill this diff or it stays "intricate" with the setup of the consumer (so the whole journalprocessor.py)
Note: this feature is mainly useful for testing purposes IMHO, so I suppose it's not that critical to keep it, I just find it handy when "playing" with swh dataset export
Meh. How much easier does it make testing, compared to using Kafka's CLI (from the linked comment)?
rebase
in favor of D6247 because phab/arcanist won't let me update this later any more (sorry)
douardda committed rDDATASET358d84938d01: Reduce the size of the progress bar (authored by douardda).
Reduce the size of the progress bar
douardda committed rDDATASET47713ee38c94: Make sure the progress bar for the export reaches 100% (authored by douardda).
Make sure the progress bar for the export reaches 100%
douardda committed rDDATASET2760e322af7c: Simplify the lo/high partition offset computation (authored by douardda).
Simplify the lo/high partition offset computation
douardda committed rDDATASETd07b2a632256: Explicitly close the temporary kafka consumer in `get_offsets` (authored by douardda).
Explicitly close the temporary kafka consumer in `get_offsets`
douardda committed rDDATASETe47a3db1287b: Use proper signature for JournalClientOffsetRanges.process() (authored by douardda).
Use proper signature for JournalClientOffsetRanges.process()
attempt to trick phab/arcanist
rebase
Rebase (remove D6234 from dependencies)
In D6234#161331, @douardda wrote: In D6234#161233, @vlorentz wrote: Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough
I'll give it a try
Sep 10 2021
rebase, fix typos, squash revisions
rebase and fix --reset help message
rebase
Add an explicit "skipped" message if nothing is to be consumed for a topic
douardda added inline comments to D6235: Commit kafka messages wich offset has reach the high limit.
In D6234#161233, @vlorentz wrote: Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough
In D6235#161311, @vlorentz wrote: lags reported by cmak was completely inconsistent
only because you have a small dataset, right?
With a larger one, the last batch of each partition should have a negligible size.
In D6235#161236, @vlorentz wrote: There's a bunch of typos in your commit/diff msg: "wich", "oef", "ony", "ALL offsets that needs to be", "stash" -> "squash"
this is necessary to ensure these messages are committed in kafka,
otherwise, since the (considered) empty partition is unsubscribed from,
it never gets committed in JournalClient.handle_messages() (since the
latter only commits assigned partitions).
Why is this a problem?
Sep 9 2021
add forgotten revision: Reduce the size of the progress bar
Please use imperative style in the git commit message
https://chris.beams.io/posts/git-commit/
Sep 3 2021
Sep 1 2021
do we need the "list of forks" if we keep the "fork of what"? I mean these are the 2 ends of the fork relation, right?
Aug 30 2021
yes the idea is to have a beefy enough machine to perform full-size experiments on, that can then be (part of) the production infrastructure dedicated to the provenance index.
Aug 13 2021
douardda added inline comments to D5818: send-to-celery: Add more options to allow scheduling of edge case origins.
In D6084#157322, @aeviso wrote: For the fix of revision_get, there should be a test.
The test is coming later from @jayeshv's mongodb branch.
Please don't mix fixes with codestyling/renaming revisions in a single diff, it makes the review much harder.
And we could also use zfs-backed thin provisioning for the / of workers to save storage space (and possibly help to ensure consistency of deployed workers... not entirely convinced of this latter point)
In T3444#68653, @vlorentz wrote: but that requires some more storage on hypervisors we currently don't have
Don't the hypervisors also serve as OSDs? We could just get a disk per hypervisor (partially?) out of the ceph cluster and use it for the workers' /tmp, or even their whole disk.
douardda accepted D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
but anyway, it looks fine to me
douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
one other improvement may be to modify the profile of the workers a bit (to reduce the load on the ceph cluster):
- lower the replication factor for workers' volumes (or even use local storage, but that requires some more storage on hypervisors we currently don't have),
- (probably not very relevant but) stop having swap on workers (since this swap ends up being on the ceph volume, so replicated etc.) (oh this has been done already, good)
Aug 12 2021
douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
In D6071#157080, @aeviso wrote:
- the use of newly introduced as_dict() methods seems unrelated here; unless I'm mistaken, the purpose of this change is better assertion reports by pytest on failure; if so, it should be presented as such in a dedicated revision
This method is only used for test purposes but it doesn't make sense without the refactoring (the complete HistoryGraph class was not even present prior to the refactoring),
Aug 11 2021
A few remarks:
Aug 10 2021
douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
douardda added a parent task for T1954: Up-to-date objstorage mirror on S3: T3477: Add alerting when the copy to S3 starts lagging.
douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
well this task should be closed, and a new subtask could be added for the alerting
unless I'm mistaken, this task can be closed now, it seems to have reached a steady state where the lag is near 0
Aug 9 2021
Aug 6 2021
I've been thinking a bit about the refactoring of the ProvenanceStorageServer as described in the doc, with a series of queues between the public API and the backend database.