Query: Advanced Search

images/web: use a better 'shell' CMD support in web's entrypoint

example: fix the content-replayer.yml.example file

mirror: update the mirror deployment compose file

images/web: reduce swh-web image size

postgres: improve a bit the Postgresql configuration

Dockerfile: update to buster and add the pgsql.sh utils file

Add prometheus, statsd and grafana services

web: add missing config entries

Add a pre-commit config file

README: update the README file

	Include stories about projects I am a member of.

You should give a hint in your commit message on why you do this refactoring.

okay-ish but lifecycle of ES related services/objects is unclear to me.

Thanks for the contribution.
You must however ensure tests pass ok before we can accept it. Note that the tests you modify (in tests/test_storage.py) are executed by all the storage backends (postgres, cassandra and the in_memory one you really are targeting here). So make sure they are still OK with all the backends.

Looks good to me, but it would really be nice to have a bit more documentation/explanation on how stuff work and are organized in Cassandra, be it in the code itself and as docu material in doc/

cli: add support for reading a file content from stdin in 'swh identify' command

In T2003#41459, @vlorentz wrote:

In T2003#41457, @douardda wrote:

One question could be 'what is the definitive source of truth in our stack?'

I assumed we wanted to aim for Kafka to be the source of truth

In T2003#41456, @olasd wrote:

Now that I think of it, we can decompose this in stages in the storage pipeline:

add an input validating proxy high up the stack

replace the journal writer calls sprinkled in all methods with a journal writing proxy

add a "don't insert objects" filter low down the stack

so we'd end up with the following pipeline for workers:

input validation proxy

object bundling proxy

object deduplication against read-only proxy

journal writer proxy

addition-blocking filter

underlying read-only storage

and the following pipeline for the "main storage replayer":

underlying read-write storage

(it's a very short pipeline... a pipedash?)

In T2003#41443, @vlorentz wrote:

We already discussed this at the time we replaced the journal-publisher with journal-writer. Adding to Kafka after inserting to the DB means that Kafka will be missing some messages, and we would need to run a backfiller on a regular basis to fix it.

In T2003#41428, @olasd wrote:

This component would centralize the "has this object already appeared?" logic, as well as the queueing+retry logic, and would replace the current kafka mirror component.

How does that sound?

In T2003#41429, @olasd wrote:

Key metrics for the filter component:

kafka consumer offset

min(latest_attempt) where in_flight = true (time it takes for a message from submission in the buffer to (re-)processing by the filter; should stay close to the current time)

count(*) where given_up = false group by topic (number of objects pending a retry, should be small)

count(*) where in_flight = true group by topic (number of objects buffered for reprocessing, should be small)

max(latest_attempt) (last processing time by the requeuing process)

count(*) where given_up = true (checks whether the housekeeping process)

Note: haven't read the other comment below, just reacting at this one as I am reading it.

Is this still "a thing"?

Since T1914 is high priority, this one is too.

What is the status of this issue? Do we still face this bug?

Agreed, this no longer need to be a high priority task.

docker/test: add a pytest based test for the vault stack

docker/test: add a pytest based test for the git loading stack

Fix typos and address ardumont's comments

In D2552#60770, @vlorentz wrote:

Why does setup_pip call scheduler_host.check_output()?

docker/tox: install pdbpp in py3 environment

Advanced Search
Use Results
Edit Query
Hide Query

Mar 3 2020

Feb 17 2020

Feb 12 2020

Feb 6 2020

Feb 3 2020

Jan 31 2020

Jan 29 2020

Jan 28 2020

Jan 23 2020

Jan 20 2020

Jan 17 2020

Advanced SearchUse ResultsEdit QueryHide Query