Do you think a run with cassandra is necessary to evaluate a potential performance impact?
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Aug 6 2021
The db server prometheus configuration needs some adaptation, as scylla comes with its own prometheus node exporter (and removes the default packages :()
root@parasilo-2:/opt# apt install scylla-node-exporter
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libio-pty-perl libipc-run-perl moreutils
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  prometheus-node-exporter
The following NEW packages will be installed:
  scylla-node-exporter
0 upgraded, 1 newly installed, 1 to remove and 7 not upgraded.
Need to get 0 B/4,076 kB of archives.
After this operation, 3,243 kB of additional disk space will be used.
There are also a lot of errors in the scylla logs related to read timeouts (with no activity on the database except the monitoring):
Aug 06 14:52:10 parasilo-4.rennes.grid5000.fr scylla[16488]: [shard 5] storage_proxy - Exception when communicating with 172.16.97.4, to read from swh.object_count: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)
Aug 06 14:52:10 parasilo-4.rennes.grid5000.fr scylla[16488]: [shard 6] storage_proxy - Exception when communicating with 172.16.97.4, to read from swh.object_count: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)
After some trouble configuring and correctly starting the scylla servers (different binding, configuration adaptations), the schema was correctly created (I needed to add SWH_USE_SCYLLADB=1 to the initialisation script).
Compared to cassandra, it seems the nodetool command doesn't correctly return the data repartition on the cluster, because the system keyspaces don't have the same replication factor as the swh one:
vsellier@parasilo-2:~$ nodetool status
Using /etc/scylla/scylla.yaml as the config file
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns  Host ID                               Rack
UN  172.16.97.2  2.36 MB  256     ?     866bbcc4-d496-4ebb-ab3b-12ef4942beaa  rack1
UN  172.16.97.3  3.37 MB  256     ?     21fdd0a9-15cd-473f-814c-c8ac24870aca  rack1
UN  172.16.97.4  3.48 MB  256     ?     1ed61715-01a0-4c15-a4bc-f9972f575437  rack1
scylladb test
run7 results - cassandra heap from 16g to 32g
run6 results - commitlog on a HDD
Aug 5 2021
Some news about the tests running since the beginning of the week:
- The data retention of the federated prometheus had the default value, so all the data had expired after 15 days. A new reference run was performed to be able to compare with the default scenario
- The first try failed: it was the first time there were adaptations of the zfs configuration, and they were not correctly deployed via the ansible scripts. It was solved by completely cleaning up the zfs configuration and relaunching the deployment. Unfortunately, this needs to be launched manually before launching a test with zfs changes.
- With the usage of best-effort jobs, it's possible to perform tests during the day without exceeding the quota
Aug 2 2021
This should be resolved now (actually, on the 31st at 16).
Jul 30 2021
Thanks for the heads up @ both of you.
Jul 29 2021
Deployed \o/
Closing.
It's working but the check does not go green [1]. As far as I could tell, the unsuccessful
event [1] is seen as a failure by the check.
At the end of it all though, the final end-to-end production check for mercurial origins should go green.
From the staging webapp, we identify a revision [1]
by the way ^
Shipped the following modules to solve that problem:
- swh.model v2.7.0
- swh.storage v0.35.0
- swh.loader.mercurial v2.1.0
Jul 28 2021
Jul 23 2021
For now I went the simplest way I could think of, which is:
Jul 9 2021
The run with 8 nodes was faster[1] than the previous ones, as expected, but it seems it could have been even faster, because the bottleneck is now the 6 replayers, which have a really high load.
The performance improvement is between 60% and 100%, depending on the object type.
This version was used during the last tests on grid5000; the consistency level was correctly configured.
Jul 8 2021
News of the last incomplete run with 4 nodes[1]: it seems it's 25% faster than with 3 nodes, which is great
The next run will be launched tonight with 8 cassandra nodes
During the last run, I discovered there were cassandra logs[1] about oversized mutations, on this run and all the previous ones.
It means some changes were committed but ignored when the commit log is flushed, which is absolutely wrong.
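For reference, Cassandra enforces a maximum mutation size tied to the commit log segment size: by default half a segment, i.e. 16 MiB with the default 32 MiB segments. A minimal sketch of that arithmetic (illustrative only, not swh code):

```python
# Illustrative arithmetic for Cassandra's mutation size limit (not swh code).
# Assumed defaults: 32 MiB commit log segments, max mutation = half a segment.
COMMITLOG_SEGMENT_SIZE_MB = 32
MAX_MUTATION_SIZE = COMMITLOG_SEGMENT_SIZE_MB * 1024 * 1024 // 2  # 16 MiB

def mutation_fits(serialized_size_bytes: int) -> bool:
    """True if a mutation of this serialized size fits in a commit log segment."""
    return serialized_size_bytes <= MAX_MUTATION_SIZE
```

Mutations larger than this limit trigger the warnings seen in the logs, so very large rows (e.g. huge snapshots or revisions) are the likely culprits.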
Jul 7 2021
released in swh-storage:v0.34.0
A run was launched with the patched storage allowing the consistency levels to be configured.
The results are on the dedicated hedgedoc document[1]
Jul 6 2021
The problem was solved with an increase of the default timeout in the cassandra configuration and by reducing the journal_client batch size.
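The two knobs involved, sketched as config fragments (the cassandra.yaml key is real; the batch-size key name and both values are illustrative, not the exact ones used in the run):

```yaml
# cassandra.yaml: raise the server-side read timeout (illustrative value)
read_request_timeout_in_ms: 10000

# journal client configuration: smaller batches (key name assumed)
batch_size: 500
```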
A replay ran for 13 hours the previous night with the current default consistency=ONE. It can be used as the reference for the next test with the LOCAL_QUORUM consistency.
After trying several options to render the results, the simplest was to export the content of a spreadsheet into a hedgedoc document [1].
The data are stored in a prometheus instance on a proxmox vm, so it will always be possible to improve the statistics later[2].
Jul 1 2021
A new run was launched with 4x10 replayers per object type.
A limit is still reached for the revision replayers, which end with read timeouts.
A test with only the revision replayers shows the problems start to appear when the number of parallel replayers is greater than ~20.
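That ~20-replayer ceiling suggests capping the parallelism explicitly. A minimal sketch with plain threading (not the actual replayer code; the limit is an assumption taken from the observation above):

```python
import threading

# Sketch only: cap concurrent replay work with a bounded semaphore.
# MAX_PARALLEL_REPLAYERS is an assumption based on the ~20 observation.
MAX_PARALLEL_REPLAYERS = 20
_slots = threading.BoundedSemaphore(MAX_PARALLEL_REPLAYERS)

def replay_batch(process_fn, batch):
    """Run one replay batch while holding one of the concurrency slots."""
    with _slots:
        return process_fn(batch)
```

In practice the replayers are separate processes, so the cap would live in the orchestration (number of processes launched) rather than in a shared semaphore; the sketch only illustrates the bound.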
One example of the wrong behavior of the consistency level ONE for read requests:
One of the monitoring probes queries the content of the object_count table.
The servers were hard stopped after the end of the grid5000 reservation, and it seems some replication messages were lost. After the restart, the content of the table is not in sync on all the servers.
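One way to observe such divergence is to query the table from cqlsh at different consistency levels (a sketch; `CONSISTENCY` is a cqlsh command, and a read at ALL has to contact every replica):

```sql
-- cqlsh session sketch against the swh keyspace
CONSISTENCY ONE;
SELECT * FROM swh.object_count;  -- may return the stale view of a single replica
CONSISTENCY ALL;
SELECT * FROM swh.object_count;  -- must contact all replicas holding the rows
```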
(3) should be ideally implemented in a way that guarantees that extid that were resolvable in previous versions of the mapping will always be resolvable in future versions
I don't understand. Option 3 is to remove relations between extids and SWHID, so it won't be resolvable anymore.
Jun 30 2021
So even if that mapping changes, it should always give back an object in the archive (pointed to by a SWHID)
I've the feeling that option (1) will lead in the long run to an explosion of the size of the mapping, which will make us eventually converge (slowly) toward option (3).
and it would probably be kind of a mess from a kafka perspective
The "mapping version field" is the most fleshed-out proposal and it would be my preference. My rationale for it, against changing extid_type for backwards-incompatible changes, is that the extid_type is a property of the external artifact, while the mapping version is a property of our archiving infrastructure.
A first run was launched with 4x20 replayers per object type.
It seems it's too much for 3 cassandra nodes: the default 1s timeout for cassandra reads is often reached.
It seems the batch size of 1000 is also too much for some object types, like snapshots.
A new test will be launched tonight with some changes:
- reduce the number of replayers processes to 4x10 per object type
- reduce the journal client batch size to 500
Jun 29 2021
These are the first tests that will be executed:
- baseline configuration: 3 cassandra nodes (parasilo[1]), commitlog on a dedicated SSD, data on 4 HDDs. Goal: testing the minimal configuration (a night, or a complete weekend if possible)
- baseline configuration + 1 cassandra node. Goal: testing the performance impact of having 1 more server (long enough to see tendencies)
- baseline configuration + 3 cassandra nodes. Goal: testing the performance impact of doubling the cluster size (long enough to see tendencies)
- baseline configuration but with the commitlog on the data partition. Goal: checking the impact of data/commitlog mutualization (long enough to see tendencies)
- baseline configuration but with 2 HDDs. Goal: checking the impact of the number of disks + having a reference for the next run (a night)
- baseline configuration but with 2 HDDs + commitlog on a dedicated HDD. Goal: checking the impact of having the commitlog on a slower disk (long enough to see tendencies)
- baseline configuration but with 2x the default heap allocated to cassandra. Goal: checking the impact of the memory configuration ((!) check the gc profile)
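For the heap test, the usual knobs are the matching -Xms/-Xmx pair, e.g. in Cassandra's jvm.options (illustrative values; keeping min equal to max avoids heap resizing pauses):

```
# cassandra jvm.options fragment (illustrative): fixed heap, min == max
-Xms32G
-Xmx32G
```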
Thanks, it was tested last night on grid5000; all the origins were correctly replayed, without issues.
Jun 28 2021
In T3396#66905, @vlorentz wrote:only one confirmation of the write is needed
It's not perfect though. If the server that confirmed the write breaks before it replicates the write, then the write is lost.
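The risk described above is the classic quorum overlap condition: a read is only guaranteed to see the latest acknowledged write when write acks + read acks exceed the replication factor. A quick check (plain arithmetic, not driver code):

```python
# Quorum overlap rule: a read is guaranteed to observe the latest acknowledged
# write when write_acks + read_acks > replication_factor.
def reads_see_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    return write_acks + read_acks > replication_factor

# With RF=3: ONE/ONE (1+1) gives no guarantee; QUORUM/QUORUM (2+2) does.
```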
IMO, we should first try to have a global configuration for all the read/write queries, and improve that later if needed for performance or if it creates some problems. At worst, it will be possible to use the default ONE values by configuration.
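A sketch of that idea (hypothetical names, not the actual swh-storage configuration): one global default consistency level, optional per-query overrides, with ONE as the fallback.

```python
# Hypothetical sketch of a global-default-with-overrides consistency config.
# All names here are made up for illustration.
DEFAULT_CONSISTENCY = "ONE"

class ConsistencyConfig:
    def __init__(self, global_level=DEFAULT_CONSISTENCY, overrides=None):
        self.global_level = global_level
        self.overrides = dict(overrides or {})

    def level_for(self, query_name):
        """Consistency level for one query, falling back to the global default."""
        return self.overrides.get(query_name, self.global_level)
```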
released in swh-storage:v0.32.0
You can try changing the constants in _execute_with_retries.
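The general shape of such a retrying executor, as a sketch (not swh-storage's actual implementation; the constant names here are illustrative stand-ins for the ones in `_execute_with_retries`):

```python
import time

# Illustrative constants; the real ones live in swh-storage's
# _execute_with_retries.
MAX_RETRIES = 3
RETRY_DELAY_S = 0.0  # would be non-zero in practice

def execute_with_retries(fn, *args, **kwargs):
    """Call fn, retrying up to MAX_RETRIES times on any exception."""
    last_exc = None
    for _attempt in range(MAX_RETRIES):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            last_exc = exc
            time.sleep(RETRY_DELAY_S)
    raise last_exc
```

Raising MAX_RETRIES (or the delay) trades latency for resilience against transient timeouts like the ones above.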