In D2803#67209, @olasd wrote:

In D2803#67208, @douardda wrote:

In D2803#67024, @olasd wrote:

My main doubt was whether we stopped explicitly converting model objects to dicts altogether (going through the swh.core model serializer instead). But even in that case contents will still be deserializable (as Content.from_dict(d) still works even when d['data'] is None).

What swh.core model serializer do you refer to? The ones in swh.core.api?

Yes. And now that you've pointed it out, I've remembered that it's the swh.storage RPC layer that adds a hook to support model objects.

Mar 11 2020, 10:38 AM
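
A minimal sketch of the round trip discussed above, assuming a simplified stand-in for the model class. The `Content` dataclass below is hypothetical and is not the real swh.model implementation; it only illustrates why a dict whose 'data' entry is None can still be deserialized.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for a content model object (not swh.model)."""

    sha1: bytes
    length: int
    data: Optional[bytes] = None  # the payload is optional

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "Content":
        # Only the metadata keys are required; a None payload is accepted.
        return cls(sha1=d["sha1"], length=d["length"], data=d.get("data"))


original = Content(sha1=b"\x01" * 20, length=5, data=b"hello")

d = original.to_dict()
d["data"] = None  # assumption: the payload was dropped before sending

restored = Content.from_dict(d)
```

In this sketch the metadata survives the round trip and only the payload is lost, which is the property the comment relies on.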

douardda added a comment to D2803: storage/writer: refactor JournalWriter.content_add to send model objects.

In D2803#67024, @olasd wrote:

My main doubt was whether we stopped explicitly converting model objects to dicts altogether (going through the swh.core model serializer instead). But even in that case contents will still be deserializable (as Content.from_dict(d) still works even when d['data'] is None).

Mar 11 2020, 10:35 AM

Mar 10 2020

douardda closed D2801: kafka: normalize KafkaJournalWriter.write_addition[s] API.

Mar 10 2020, 5:35 PM

douardda committed rDJNL82df6acedbb1: kafka: normalize KafkaJournalWriter.write_addition[s] API (authored by douardda).

kafka: normalize KafkaJournalWriter.write_addition[s] API

Mar 10 2020, 5:35 PM

douardda updated the diff for D2801: kafka: normalize KafkaJournalWriter.write_addition[s] API.

remove extra parameter 'anon' mistakenly included in the diff

Mar 10 2020, 5:29 PM

douardda created D2803: storage/writer: refactor JournalWriter.content_add to send model objects.

Mar 10 2020, 4:46 PM

douardda committed rDSTOa97781d21131: storage/validate: small code formatting (authored by douardda).

storage/validate: small code formatting

Mar 10 2020, 4:43 PM

douardda created D2801: kafka: normalize KafkaJournalWriter.write_addition[s] API.

Mar 10 2020, 4:41 PM

Mar 6 2020

douardda created P605 (An Untitled Masterwork).

Mar 6 2020, 5:36 PM

douardda added inline comments to D2777: journal.replay: Batch insert contents/skipped_contents in storage backend.

Mar 6 2020, 1:40 PM

douardda committed rDSTO3b8b718aa0c5: sql: do not attempt to create the plpgsql lang if already exists (authored by douardda).

sql: do not attempt to create the plpgsql lang if already exists

Mar 6 2020, 1:39 PM

douardda closed D2776: sql: do not attempt to create the plpgsql lang if already exists.

Mar 6 2020, 1:39 PM

douardda added a comment to D2776: sql: do not attempt to create the plpgsql lang if already exists.

In D2776#66377, @olasd wrote:

This looks sound but the tests are hanging on the initialization of the postgresql database now... (at least on jenkins)

Mar 6 2020, 1:37 PM

douardda accepted D2778: Add install instructions for Cassandra..

Mar 6 2020, 1:25 PM

douardda accepted D2777: journal.replay: Batch insert contents/skipped_contents in storage backend.

ok (besides my remark).

Mar 6 2020, 11:54 AM

douardda added inline comments to D2777: journal.replay: Batch insert contents/skipped_contents in storage backend.

Mar 6 2020, 11:54 AM

douardda created D2776: sql: do not attempt to create the plpgsql lang if already exists.

Mar 6 2020, 9:31 AM

Mar 4 2020

douardda accepted D2767: Add some tenacity to checking whether an object is in the destination.

Mar 4 2020, 5:35 PM

douardda created P603 (An Untitled Masterwork).

Mar 4 2020, 3:20 PM

douardda created P602 (An Untitled Masterwork).

Mar 4 2020, 2:04 PM

Mar 3 2020

douardda committed rCDFPcc2ae5af9877: images/base: add support for the LOG_LEVEL env var for replayer services (authored by douardda).

images/base: add support for the LOG_LEVEL env var for replayer services

Mar 3 2020, 10:53 AM

douardda committed rCDFPb730b619299c: Update a bit the README file (authored by douardda).

Update a bit the README file

Mar 3 2020, 10:53 AM

douardda committed rCDFP05dde3bf616d: grafana: fix the datasource config (authored by douardda).

grafana: fix the datasource config

Mar 3 2020, 10:53 AM

douardda committed rCDFP6748a2080ca2: grafana: add a backend statistics dashboard, tune a bit the graph replayer one (authored by douardda).

grafana: add a backend statistics dashboard, tune a bit the graph replayer one

Mar 3 2020, 10:53 AM

douardda committed rCDFPa7d896f05aa2: Move nginx listening port to 5081 (authored by douardda).

Move nginx listening port to 5081

Mar 3 2020, 10:53 AM

douardda committed rCDFP9ecc8aa09974: update images entrypoint files (authored by douardda).

update images entrypoint files

Mar 3 2020, 10:53 AM

douardda committed rCDFPefd4b4496e46: images/web: use a better 'shell' CMD support in web's entrypoint (authored by douardda).

images/web: use a better 'shell' CMD support in web's entrypoint

Mar 3 2020, 10:53 AM

douardda committed rCDFP698e861a7056: example: fix the content-replayer.yml.example file (authored by douardda).

example: fix the content-replayer.yml.example file

Mar 3 2020, 10:53 AM

douardda committed rCDFPd4c658bf1a6a: mirror: update the mirror deployment compose file (authored by douardda).

mirror: update the mirror deployment compose file

Mar 3 2020, 10:53 AM

douardda committed rCDFP9fd8cdc38af7: images/web: reduce swh-web image size (authored by douardda).

images/web: reduce swh-web image size

Mar 3 2020, 10:53 AM

douardda committed rCDFP59ae8d7374b0: postgres: improve a bit the Postgresql configuration (authored by douardda).

postgres: improve a bit the Postgresql configuration

Mar 3 2020, 10:53 AM

douardda committed rCDFP390be1b78a7a: Dockerfile: update to buster and add the pgsql.sh utils file (authored by douardda).

Dockerfile: update to buster and add the pgsql.sh utils file

Mar 3 2020, 10:53 AM

douardda committed rCDFPd9078f56c6ed: Add prometheus, statsd and grafana services (authored by douardda).

Add prometheus, statsd and grafana services

Mar 3 2020, 10:53 AM

douardda committed rCDFPa0021f7e1bbb: web: add missing config entries (authored by douardda).

web: add missing config entries

Mar 3 2020, 10:53 AM

douardda committed rCDFP524e7ee6d410: Add a pre-commit config file (authored by douardda).

Add a pre-commit config file

Mar 3 2020, 10:53 AM

douardda committed rCDFP1e603cd4fda5: README: update the README file (authored by douardda).

README: update the README file

Mar 3 2020, 10:53 AM

douardda added a comment to D2751: Add support for the static consumer group feature to journal client.

This is nice.

Mar 3 2020, 9:48 AM

Feb 17 2020

douardda created D2680: Add a paragraph in the README file about installing azure-cli from pip.

Feb 17 2020, 12:57 PM

Feb 12 2020

douardda requested changes to D2651: JournalClient: split main loop in three functions.

You should give a hint in your commit message on why you do this refactoring.

Feb 12 2020, 9:55 AM

Feb 6 2020

douardda requested changes to D2614: scheduler.backend_es: Leave index opened when streaming bulk.

Okay-ish, but the lifecycle of the ES-related services/objects is unclear to me.

Feb 6 2020, 11:11 AM

douardda requested changes to D2619: in-memory storage: compute all counters.

Thanks for the contribution.
However, you must make sure the tests pass before we can accept it. Note that the tests you modified (in tests/test_storage.py) are run against all the storage backends (postgres, cassandra, and the in_memory one you are actually targeting here), so make sure they still pass with every backend.

Feb 6 2020, 10:57 AM

Feb 3 2020

douardda added inline comments to D2614: scheduler.backend_es: Leave index opened when streaming bulk.

Feb 3 2020, 11:54 AM

Jan 31 2020

douardda accepted D2566: Add Cassandra backend..

Looks good to me, but it would really be nice to have a bit more documentation/explanation of how things work and are organized in Cassandra, both in the code itself and as documentation material in doc/

Jan 31 2020, 2:09 PM

Jan 29 2020

douardda committed rDMOD57a0e08925d4: cli: add support for reading a file content from stdin in 'swh identify' command (authored by douardda).

cli: add support for reading a file content from stdin in 'swh identify' command

Jan 29 2020, 3:49 PM

douardda closed D2599: cli: add support for reading a file content from stdin in 'swh identify' command.

Jan 29 2020, 3:49 PM

douardda updated the diff for D2599: cli: add support for reading a file content from stdin in 'swh identify' command.

typos

Jan 29 2020, 3:23 PM

douardda added inline comments to D2599: cli: add support for reading a file content from stdin in 'swh identify' command.

Jan 29 2020, 3:22 PM

douardda created D2599: cli: add support for reading a file content from stdin in 'swh identify' command.

Jan 29 2020, 2:57 PM

douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

In T2003#41459, @vlorentz wrote:

In T2003#41457, @douardda wrote:

One question could be 'what is the definitive source of truth in our stack?'

I assumed we wanted to aim for Kafka to be the source of truth

Jan 29 2020, 2:00 PM · Journal

douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

In T2003#41456, @olasd wrote:

Now that I think of it, we can decompose this in stages in the storage pipeline:

add an input validating proxy high up the stack

replace the journal writer calls sprinkled in all methods with a journal writing proxy

add a "don't insert objects" filter low down the stack

so we'd end up with the following pipeline for workers:

input validation proxy

object bundling proxy

object deduplication against read-only proxy

journal writer proxy

addition-blocking filter

underlying read-only storage

and the following pipeline for the "main storage replayer":

underlying read-write storage

(it's a very short pipeline... a pipedash?)

Jan 29 2020, 11:45 AM · Journal
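
The worker pipeline outlined in T2003#41456 can be sketched as a stack of proxies, each wrapping the next storage down. This is only an illustration under stated assumptions: every class and method name below (`Proxy`, `has`, `add`, and so on) is hypothetical and not the swh.storage API, objects are plain dicts with an "id" key, and the object bundling proxy is left out for brevity.

```python
from typing import Dict, Iterable, List


class ReadOnlyStorage:
    """Underlying storage: workers may look objects up but never write."""

    def __init__(self, known: Iterable[str]):
        self.known = set(known)

    def has(self, obj_id: str) -> bool:
        return obj_id in self.known

    def add(self, objects: List[Dict]) -> int:
        raise RuntimeError("read-only storage")


class Proxy:
    """Base proxy: forwards every call to the wrapped storage."""

    def __init__(self, storage):
        self.storage = storage

    def has(self, obj_id: str) -> bool:
        return self.storage.has(obj_id)

    def add(self, objects: List[Dict]) -> int:
        return self.storage.add(objects)


class AdditionBlockingFilter(Proxy):
    def add(self, objects: List[Dict]) -> int:
        return 0  # swallow the write so it never reaches the storage


class JournalWriterProxy(Proxy):
    def __init__(self, storage, journal: List[Dict]):
        super().__init__(storage)
        self.journal = journal  # a list stands in for the Kafka journal

    def add(self, objects: List[Dict]) -> int:
        self.journal.extend(objects)  # write to the journal first
        self.storage.add(objects)
        return len(objects)


class DeduplicationProxy(Proxy):
    def add(self, objects: List[Dict]) -> int:
        # Drop objects the read-only storage already knows about.
        new = [o for o in objects if not self.storage.has(o["id"])]
        return self.storage.add(new) if new else 0


class ValidationProxy(Proxy):
    def add(self, objects: List[Dict]) -> int:
        for o in objects:
            if not isinstance(o.get("id"), str):
                raise ValueError(f"invalid object: {o!r}")
        return self.storage.add(objects)


journal: List[Dict] = []
storage = ValidationProxy(
    DeduplicationProxy(
        JournalWriterProxy(
            AdditionBlockingFilter(ReadOnlyStorage({"a"})), journal
        )
    )
)
written = storage.add([{"id": "a"}, {"id": "b"}])
```

With this stacking, the already-known object "a" is filtered out, "b" reaches the journal, and nothing is written to the underlying storage; actually inserting "b" would be left to the replayer, which uses the read-write storage directly.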

douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

In T2003#41443, @vlorentz wrote:

We already discussed this at the time we replaced the journal-publisher with journal-writer. Adding to Kafka after inserting to the DB means that Kafka will be missing some messages, and we would need to run a backfiller on a regular basis to fix it.

Jan 29 2020, 11:40 AM · Journal

Jan 28 2020

douardda added inline comments to D2582: Web API endpoint /known/.

Jan 28 2020, 12:12 PM

douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

In T2003#41428, @olasd wrote:

This component would centralize the "has this object already appeared?" logic, as well as the queueing+retry logic, and would replace the current kafka mirror component.

How does that sound?

Jan 28 2020, 9:37 AM · Journal

douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

In T2003#41429, @olasd wrote:

Key metrics for the filter component:

kafka consumer offset

min(latest_attempt) where in_flight = true (the time a message takes to go from submission to the buffer to (re-)processing by the filter; should stay close to the current time)

count(*) where given_up = false group by topic (number of objects pending a retry, should be small)

count(*) where in_flight = true group by topic (number of objects buffered for reprocessing, should be small)

max(latest_attempt) (last processing time by the requeuing process)

count(*) where given_up = true (checks whether the housekeeping process)
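
The table-based metrics listed above could be computed roughly as follows. This is a sketch under assumptions: the `buffer` table, its column types and the sample rows are hypothetical (the comment only names the columns topic, latest_attempt, in_flight and given_up), sqlite3 stands in for whatever database the filter would use, and timestamps are plain integers. The Kafka consumer offset is not stored in this table and is not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE buffer (
        object_id      TEXT PRIMARY KEY,
        topic          TEXT NOT NULL,
        latest_attempt INTEGER NOT NULL,  -- hypothetical: integer timestamp
        in_flight      INTEGER NOT NULL,  -- 0/1 boolean
        given_up       INTEGER NOT NULL   -- 0/1 boolean
    )
    """
)
conn.executemany(
    "INSERT INTO buffer VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "content", 100, 1, 0),
        ("c2", "content", 130, 0, 0),
        ("r1", "revision", 120, 1, 0),
        ("r2", "revision", 90, 0, 1),
    ],
)


def scalar(sql: str):
    """Return the single value produced by an aggregate query."""
    return conn.execute(sql).fetchone()[0]


def per_topic(sql: str) -> dict:
    """Return a {topic: count} mapping from a GROUP BY topic query."""
    return dict(conn.execute(sql).fetchall())


metrics = {
    "oldest_in_flight": scalar(
        "SELECT min(latest_attempt) FROM buffer WHERE in_flight = 1"
    ),
    "pending_retry": per_topic(
        "SELECT topic, count(*) FROM buffer WHERE given_up = 0 GROUP BY topic"
    ),
    "in_flight": per_topic(
        "SELECT topic, count(*) FROM buffer WHERE in_flight = 1 GROUP BY topic"
    ),
    "last_attempt": scalar("SELECT max(latest_attempt) FROM buffer"),
    "given_up": scalar("SELECT count(*) FROM buffer WHERE given_up = 1"),
}
```

Each entry in `metrics` maps to one of the queries in the list, so the values could be exported as gauges to a monitoring system.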