Ah fun, one of the revisions with this problem on staging (ba3343bc4fa403a8dfbfcab7fc1a8c29ee34bd69) seems to have been crafted by https://gitlab.com/gitlab-org/gitlab-foss/-/blob/staging-26-fix_add_deploy_key_spec/spec/models/merge_request_diff_commit_spec.rb
Apr 22 2021
Apr 21 2021
See T3170 (error generated by the same invalid Kafka messages).
Apr 19 2021
Apr 8 2021
Just got this one below. Note that this occurred just when the replayer actually started to insert objects in the storage (before that, since the start of the replayer process, only Kafka scaffolding took place for quite some time, around 30 minutes!)
Apr 6 2021
Apr 2 2021
Currently, the mirror test session is running with:
easy fix: modify the replayer to ignore this 'metadata' column while inserting revisions
09:45 <+vlorentz> douardda: yes and the only way around it (short of dropping data) is T3089
09:46 -swhbot:#swh-devel- T3089 (submitter: vlorentz, owner: vlorentz, status: Open): Remove the 'metadata' column of the 'revision' table <https://forge.softwareheritage.org/T3089>
09:46 <+vlorentz> or switching to cassandra
09:46 <+vlorentz> the good news is, they couldn't be inserted in the storage either, so you can safely drop them for now
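For reference, a minimal sketch of the 'easy fix' mentioned above, i.e. dropping the legacy 'metadata' field from revision dicts before insertion (the function name and the storage call shown here are illustrative, not the actual replayer code):

```python
def insert_revisions(storage, revisions):
    # Insert revisions, dropping the deprecated 'metadata' column (see T3089).
    cleaned = []
    for rev in revisions:
        rev = dict(rev)            # don't mutate the caller's dict
        rev.pop('metadata', None)  # ignore the column the replayer should skip
        cleaned.append(rev)
    storage.revision_add(cleaned)
```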
Mar 15 2021
Mar 11 2021
Mar 4 2021
Dec 23 2020
Nov 17 2020
The new cluster in Rocquencourt is using the built-in Kafka ACLs now (9993a81ffc7a1c8bd519b33ae63ac1145105f624).
Oct 16 2020
Same as before but with 1M (fresh) sha1s:
Since the results on uffizi above suffered from a few caveats, I've made a few more tests:
- a first result has been obtained with a dataset that contained only objects stored on the XFS part of the objstorage
- a second dataset has been created (with the ORDER BY sha256 clause to spread the sha1s)
- but the results are a mix of hot/cold cache tests
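For context, a hypothetical way to build such a sample (table and column names follow the swh 'content' schema; the actual query used for these tests is not shown above):

```python
# Hypothetical sketch: sample 1M sha1s ordered by sha256, so the selected
# sha1 values are spread uniformly across the objstorage rather than clustered.
import psycopg2

conn = psycopg2.connect('service=swh')   # connection parameters are a placeholder
with conn.cursor('sha1_sample') as cur:  # server-side cursor to stream rows
    cur.execute("SELECT sha1 FROM content ORDER BY sha256 LIMIT 1000000")
    sha1s = [bytes(row[0]) for row in cur]
```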
Oct 15 2020
Some results:
Sep 22 2020
(the backfill had, in fact, completed within a month)
At this point, I don't think we'll make it much better with Postgres as the source.
Apr 28 2020
The Kafka producer in swh.journal now reads message delivery reports and fails if they're negative, or if they don't arrive within two minutes.
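For illustration, a minimal sketch of that behavior with confluent-kafka (this is not the actual swh.journal code; the broker, topic and error handling are assumptions):

```python
from confluent_kafka import Producer, KafkaException

def on_delivery(error, message):
    # Called once per produced message; a non-None error means the broker
    # negatively acknowledged (or never acknowledged) the message.
    if error is not None:
        raise KafkaException(error)

producer = Producer({'bootstrap.servers': 'kafka01.example.org:9092'})
producer.produce('swh.journal.objects.content', value=b'...', on_delivery=on_delivery)

# flush() serves the delivery callbacks; anything still undelivered after
# two minutes is treated as a failure.
remaining = producer.flush(timeout=120)
if remaining > 0:
    raise RuntimeError('%d message(s) not acknowledged within two minutes' % remaining)
```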
Apr 15 2020
Apr 6 2020
Feb 10 2020
```python
import swh.journal.client

c = swh.journal.client.JournalClient(**{
    'group_id': 'olasd-test-sasl-1',
    'brokers': [
        'kafka%02d.euwest.azure.softwareheritage.org:9093' % i
        for i in range(1, 7)
    ],
    # dotted librdkafka options passed through via kwargs (see note below):
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'SCRAM-SHA-512',
    'sasl.username': '<username>',
    'sasl.password': '<password>',
    'debug': 'consumer',
})
```
(yes, passing dotted config parameters in kwargs is... not the cleanest)
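For completeness, a minimal consumption loop would look something like this (the worker_fn signature is based on the journal client's callback-style API; the print is just a placeholder):

```python
def worker_fn(all_objects):
    # all_objects maps an object type (e.g. 'revision') to a batch of messages
    for object_type, objects in all_objects.items():
        print(object_type, len(objects))

# Hands decoded message batches to the callback as they are consumed.
c.process(worker_fn)
```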
Feb 7 2020
Following the documentation at these links (a rough client-side sketch follows the list):
- https://kafka.apache.org/documentation/#security
- https://rmoff.net/2018/08/02/kafka-listeners-explained/
- https://docs.confluent.io/current/kafka/encryption.html#ssl-overview
- sort of https://medium.com/code-tech/kafka-in-aws-with-ssl-offloading-using-load-balancer-c337da1435c3
- kinda https://www.slideshare.net/ConfluentInc/how-to-lock-down-apache-kafka-and-keep-your-streams-safe
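As a rough client-side sketch of the resulting setup, using the raw confluent-kafka Consumer (hostname, credentials and topic below are placeholders):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'kafka01.euwest.azure.softwareheritage.org:9093',
    'group.id': 'test-sasl-ssl',
    # TLS + SASL/SCRAM authentication, as described in the links above
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'SCRAM-SHA-512',
    'sasl.username': '<username>',
    'sasl.password': '<password>',
})
consumer.subscribe(['swh.journal.objects.content'])
msg = consumer.poll(timeout=10.0)
```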
Jan 23 2020
We've now hit T2003 hard as the client caught up with the head of the local Kafka cluster. That's why the curve is currently flattening out: I've stopped the replayers until the queue is implemented.
Jan 22 2020
Dec 7 2019
We'll need to address T2003 before this can be closed (if we go the journal client route), so marking accordingly.
I've launched 16 content backfillers in parallel, one for each hex digit prefix, which should help with this.
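Roughly, the partitioning looks like this (run_backfiller is a stand-in for the actual backfill entry point, not the real swh API):

```python
from multiprocessing import Pool

PREFIXES = '0123456789abcdef'

def run_backfiller(prefix):
    # Placeholder: invoke the content backfiller restricted to the sha1
    # range starting with this hex digit.
    print('backfilling contents with sha1 prefix %r' % prefix)

if __name__ == '__main__':
    with Pool(processes=16) as pool:
        pool.map(run_backfiller, PREFIXES)
```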
I don't think we're going to do this but rather use the journal client approach. (Even more so considering that writing to S3 takes 500ms for each object, which sounds like a silly artificial limit to put on a synchronous process).
Aug 26 2019
The content topic has fully replicated to the new cluster over the weekend.
Aug 23 2019
A new Kafka cluster has been spun up on Azure virtual machines: 6 machines, each with 8 TB of storage available.
Jul 14 2019
Jul 9 2019
Jun 28 2019
1 month is good enough. Let's stick to this.
Jun 25 2019
Still with 16 processes in parallel, adding more CPUs brings the ETA down to ~1 month, which remains pretty bad.
Running the directory backfiller (single instance) against belvedere yields an ETA of 250 days, which is around a 3x speedup from somerset.
Do we now have any insight into the behavior of the backfiller against belvedere?
I'm inclined to prefer option 2, since performance is an issue we cannot afford to underestimate...