Landed.
All Stories
Jan 28 2020
Move the cassandra storage's dependencies to a profile
accepted, will land it soon.
Now that I think of it, we can decompose this into stages in the storage pipeline:
In T2003#41443, @vlorentz wrote: @olasd I'm worried that implementing your idea would result in some complex piece of code.
You're making the Cassandra driver run in the same process as the webapp. Are you sure you want to do that?
In D2594#61676, @ardumont wrote: Fix crash of Cassandra on Java >=9
ugh
I checked, and we do use JVM 8 on the Cassandra nodes.
Still, it sounds like migrating to Cassandra will raise quite a few deployment issues.
fix commit message
Fix crash of Cassandra on Java >=9
Fix python3-cassandra package name
- Properly switch webapp0's storage to the cassandra storage
- Add a conditional to install python3-cassandra when needed
We should also make sure that the optional cassandra dependencies are pulled in by the storage deployment (unless swh.storage grew hardcoded dependencies on the cassandra stuff, in which case we're fine).
Build is green
See https://jenkins.softwareheritage.org/job/DCORE/job/tox/398/ for more details.
fix wrapped function name
Build was aborted
Build was aborted
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/514/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/906/ for more details.
Build has FAILED
Build has FAILED
My bad, you don't need that
You should replace "intrinsic_metadata" with "extrinsic_metadata". But other than that, ok for me
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/319/ for more details.
Add tests on both loaders
@olasd I'm worried that implementing your idea would result in some complex piece of code. It also adds a new PostgreSQL database and new Kafka topics, which will need extra resources and management. And if at some point that queue database becomes too large, the retrier will become slower, causing the queue to grow even more.
For reasons still unknown, I don't see python-cassandra-driver (buster) uploaded on pergamon.
I don't know whether it's related, but it's marked as source; I need to continue the work on that [2].
Then you don't need if request.method == 'POST':, right?
Switch to cassandra storage only for webapp0
I also wanted to make sure I did not misconfigure anything. Nobody shouted about it, so it seemed fine from that standpoint ;)
In T2003#41428, @olasd wrote: This component would centralize the "has this object already appeared?" logic, as well as the queueing+retry logic, and would replace the current kafka mirror component.
How does that sound?
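To make the proposal easier to picture, here is a minimal in-memory sketch of a component that combines the "has this object already appeared?" check with queueing and retry. It is not the design discussed in T2003: the class name `JournalFilter`, its methods, the `max_attempts` parameter, and the in-memory set and queue are all assumptions made for illustration, standing in for the real journal and its persistent storage.

```python
from collections import deque


class JournalFilter:
    """Hypothetical filter: deduplicates objects and retries failed ones."""

    def __init__(self, process, max_attempts=3):
        # process is an assumed callable: process(obj_id) -> True on success
        self.process = process
        self.max_attempts = max_attempts
        self.seen = set()           # "has this object already appeared?"
        self.retry_queue = deque()  # (obj_id, attempts made so far)
        self.given_up = []          # objects that exhausted their attempts

    def submit(self, obj_id):
        """Process a new object once; return False if it was already seen."""
        if obj_id in self.seen:
            return False
        self.seen.add(obj_id)
        self._attempt(obj_id, 1)
        return True

    def _attempt(self, obj_id, attempts):
        if self.process(obj_id):
            return
        if attempts >= self.max_attempts:
            self.given_up.append(obj_id)
        else:
            self.retry_queue.append((obj_id, attempts))

    def retry_pending(self):
        """Retry every object currently queued, exactly once each."""
        for _ in range(len(self.retry_queue)):
            obj_id, attempts = self.retry_queue.popleft()
            self._attempt(obj_id, attempts + 1)


# Usage: "bad" always fails and is given up after two attempts.
flt = JournalFilter(lambda obj_id: obj_id != "bad", max_attempts=2)
first = flt.submit("ok")
second = flt.submit("ok")
flt.submit("bad")
flt.retry_pending()
```

Keeping the seen-set and the retry queue in one place is what lets a single component answer both questions; a real deployment would need durable storage for both, which is where the resource concerns raised in this thread come in.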
In T2003#41429, @olasd wrote: Key metrics for the filter component:
- kafka consumer offset
- min(latest_attempt) where in_flight = true (time it takes for a message to go from submission in the buffer to (re-)processing by the filter; should stay close to the current time)
- count(*) where given_up = false group by topic (number of objects pending a retry, should be small)
- count(*) where in_flight = true group by topic (number of objects buffered for reprocessing, should be small)
- max(latest_attempt) (last processing time by the requeuing process)
- count(*) where given_up = true (checks whether the housekeeping process)
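The database-backed metrics in this list can be sketched as plain SQL queries. The example below uses an in-memory SQLite database purely for illustration: the table name `retry_queue`, the `object_id` column, the integer timestamps, and the sample rows are assumptions, while `latest_attempt`, `in_flight`, `given_up` and `topic` come from the list above. The kafka consumer offset is left out because it would be read from Kafka rather than from this table.

```python
import sqlite3

# Hypothetical schema: only latest_attempt, in_flight, given_up and topic
# are named in the discussion; everything else is assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE retry_queue (
        object_id TEXT PRIMARY KEY,
        topic TEXT NOT NULL,
        latest_attempt INTEGER NOT NULL,  -- Unix timestamp (assumed)
        in_flight INTEGER NOT NULL,       -- 0 or 1
        given_up INTEGER NOT NULL         -- 0 or 1
    )
    """
)
conn.executemany(
    "INSERT INTO retry_queue VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "revision", 100, 1, 0),
        ("b", "revision", 120, 0, 0),
        ("c", "release", 90, 1, 0),
        ("d", "release", 80, 0, 1),
    ],
)


def collect_metrics(conn):
    """Return the database-backed metrics as a dictionary."""

    def scalar(query):
        return conn.execute(query).fetchone()[0]

    def by_topic(query):
        return dict(conn.execute(query).fetchall())

    return {
        "oldest_in_flight": scalar(
            "SELECT min(latest_attempt) FROM retry_queue WHERE in_flight = 1"
        ),
        "pending_by_topic": by_topic(
            "SELECT topic, count(*) FROM retry_queue "
            "WHERE given_up = 0 GROUP BY topic"
        ),
        "in_flight_by_topic": by_topic(
            "SELECT topic, count(*) FROM retry_queue "
            "WHERE in_flight = 1 GROUP BY topic"
        ),
        "latest_attempt": scalar("SELECT max(latest_attempt) FROM retry_queue"),
        "given_up": scalar("SELECT count(*) FROM retry_queue WHERE given_up = 1"),
    }


metrics = collect_metrics(conn)
```

With the sample rows above, `metrics["pending_by_topic"]` counts two pending revisions and one pending release; in a real setup these values would be exported to whatever monitoring system is in use.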
Note: I haven't read the other comment below; I'm just reacting to this one as I read it.
Jan 27 2020
In D2582#61590, @vlorentz wrote: I think you missed this comment: https://forge.softwareheritage.org/D2582?id=9215#inline-17178
I think you missed this comment: https://forge.softwareheritage.org/D2582?id=9215#inline-17178
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/907/ for more details.
Build has FAILED
I agree with @olasd. Another possibility is to run an swh-storage instance on webapp0.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/905/ for more details.
apply comment on content_add.
timeout in wait_for_peer.
I believe the vault and the indexers also use storage0.euwest.azure as read-only archive backend. We don't want to switch them over to cassandra (at least not until the replay has completed, if ever).
As for implementing the queue / retry behavior in the filter component:
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/512/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/511/ for more details.
So, now that T1914 is stuck, I'm giving this a harder think, and I'm wondering whether we shouldn't have a generic buffering/filtering component in the journal instead:
Build was aborted
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/510/ for more details.