add missing mypy deps in requirements-test
Mar 12 2020
add support for the 'validator' argument in attrib_typecheck
rebase + add missing plugin file
replace bhex() with _x() and fix other issues reported by olasd
The annotation part should be done on the whole module and, most importantly, in a dedicated revision.
In D2771#66331, @vlorentz wrote: I really think we should either have it for all object types or none at all.
In D2814#67371, @ardumont wrote: What about adding tests for this, or do you rely on BaseModel's?
Mar 11 2020
In D2803#67209, @olasd wrote:
> In D2803#67208, @douardda wrote:
>> In D2803#67024, @olasd wrote: My main doubt was whether we stopped explicitly converting model objects to dicts altogether (going through the swh.core model serializer instead). But even in that case contents will still be deserializable (as Content.from_dict(d) still works even when d['data'] is None).
> What swh.core model serializer do you refer to? The ones in swh.core.api?

Yes. And now that you've pointed it out, I've remembered that it's the swh.storage RPC layer that adds a hook to support model objects.
In D2803#67024, @olasd wrote: My main doubt was whether we stopped explicitly converting model objects to dicts altogether (going through the swh.core model serializer instead). But even in that case contents will still be deserializable (as Content.from_dict(d) still works even when d['data'] is None).
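As an aside, here is a minimal sketch of the round trip being discussed, assuming swh.model's Content API (from_data, to_dict, from_dict); the construction step is simplified for illustration:

```python
# Sketch of the dict round trip discussed above, assuming swh.model's
# Content API; construction details are simplified.
from swh.model.model import Content

content = Content.from_data(b"some file contents")
d = content.to_dict()
d["data"] = None                 # the RPC layer may strip the raw data
restored = Content.from_dict(d)  # deserializes even without the data
assert restored.sha1 == content.sha1
```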
Mar 10 2020
remove extra parameter 'anon' mistakenly included in the diff
Mar 6 2020
In D2776#66377, @olasd wrote: This looks sound, but the tests are now hanging on the initialization of the postgresql database... (at least on jenkins)
ok (besides my remark).
Mar 4 2020
Mar 3 2020
This is nice.
Feb 17 2020
Feb 12 2020
You should give a hint in your commit message about why you are doing this refactoring.
Feb 6 2020
Okay-ish, but the lifecycle of the ES-related services/objects is unclear to me.
Thanks for the contribution.
You must, however, ensure the tests pass before we can accept it. Note that the tests you modified (in tests/test_storage.py) are executed against all the storage backends (postgres, cassandra, and the in-memory one you are actually targeting here), so make sure they still pass with all of them.
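For illustration, a hedged sketch of how a single test body can be exercised against several backends via pytest parametrization; the backend names and get_storage arguments here are simplified assumptions, not the actual swh.storage test layout:

```python
# Illustrative only: one test body runs once per backend.
# Backend names and get_storage arguments are simplified.
import pytest
from swh.storage import get_storage

@pytest.fixture(params=["memory", "local", "cassandra"])
def swh_storage(request):
    # Real backends need connection settings; omitted for brevity.
    return get_storage(cls=request.param)

def test_content_add(swh_storage):
    # The same assertions would run against every backend above.
    ...
```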
Feb 3 2020
Jan 31 2020
Looks good to me, but it would really be nice to have a bit more documentation/explanation of how things work and are organized in Cassandra, both in the code itself and as documentation material in doc/.
Jan 29 2020
typos
In T2003#41459, @vlorentz wrote:
> In T2003#41457, @douardda wrote: One question could be "what is the definitive source of truth in our stack?"

I assumed we wanted to aim for Kafka to be the source of truth.
In T2003#41456, @olasd wrote: Now that I think of it, we can decompose this into stages in the storage pipeline:
- add an input validating proxy high up the stack
- replace the journal writer calls sprinkled in all methods with a journal writing proxy
- add a "don't insert objects" filter low down the stack
so we'd end up with the following pipeline for workers (sketched in code after these lists):
- input validation proxy
- object bundling proxy
- object deduplication against read-only proxy
- journal writer proxy
- addition-blocking filter
- underlying read-only storage
and the following pipeline for the "main storage replayer":
- underlying read-write storage
(it's a very short pipeline... a pipedash?)
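To make the proxy decomposition concrete, here is a minimal sketch of how two of those stages could be wired; the class names and the check()/write_additions() hooks are assumptions for illustration, not the actual swh.storage proxies:

```python
# Hypothetical proxy chain illustrating the pipeline above
# (not the actual swh.storage implementation; hooks are assumed).
class ValidatingStorageProxy:
    """Input-validation proxy, high up the stack."""

    def __init__(self, storage):
        self.storage = storage

    def content_add(self, contents):
        for content in contents:
            content.check()  # assumed per-object validation hook
        return self.storage.content_add(contents)


class JournalWritingStorageProxy:
    """Writes every addition to the journal before delegating."""

    def __init__(self, storage, journal_writer):
        self.storage = storage
        self.journal_writer = journal_writer

    def content_add(self, contents):
        self.journal_writer.write_additions("content", contents)  # assumed API
        return self.storage.content_add(contents)


# Worker-side wiring following the list above (other proxies elided):
# storage = ValidatingStorageProxy(
#     JournalWritingStorageProxy(addition_blocking_storage, journal_writer))
```

Each proxy exposes the same interface as the storage it wraps, so the stages can be stacked in whatever order a deployment needs.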
In T2003#41443, @vlorentz wrote: We already discussed this at the time we replaced the journal-publisher with the journal-writer. Adding to Kafka after inserting into the DB means that Kafka will be missing some messages, and we would need to run a backfiller on a regular basis to fix it.
Jan 28 2020
In T2003#41428, @olasd wrote: This component would centralize the "has this object already appeared?" logic, as well as the queueing+retry logic, and would replace the current kafka mirror component.
How does that sound?
In T2003#41429, @olasd wrote: Key metrics for the filter component (an assumed row shape is sketched after this list):
- kafka consumer offset
- min(latest_attempt) where in_flight = true (time it takes for a message to go from submission into the buffer to (re-)processing by the filter; should stay close to the current time)
- count(*) where given_up = false group by topic (number of objects pending a retry, should be small)
- count(*) where in_flight = true group by topic (number of objects buffered for reprocessing, should be small)
- max(latest_attempt) (last processing time by the requeuing process)
- count(*) where given_up = true (checks whether the housekeeping process is doing its job)
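For concreteness, here is a guessed shape for the rows those metrics would run over, inferred purely from the column names used above (this schema is an assumption, not existing swh code):

```python
# Hypothetical retry-buffer row for the filter component; the field
# names simply mirror the columns referenced by the metrics above.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class BufferedObject:
    topic: str                # kafka topic the object was consumed from
    key: bytes                # identifier of the buffered object
    latest_attempt: datetime  # last time the filter (re-)processed it
    in_flight: bool           # True while buffered for reprocessing
    given_up: bool            # True once retries have been exhausted
```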
Note: I haven't read the other comments below; I am just reacting to this one as I read it.
Jan 23 2020
Is this still "a thing"?
Since T1914 is high priority, this one is too.
What is the status of this issue? Do we still face this bug?