Deployed \o/
Closing.
Jul 30 2021
Jul 29 2021
It's working but the check does not pass green [1]. As far as I could tell, the unsuccessful
event [1] is seen as a failure by the check.
At the end of it all though, the final end-to-end production check for mercurial origins should go green.
From the staging webapp, we identify a revision [1]
by the way ^
Shipped the following modules to solve that problem:
- swh.model v2.7.0
- swh.storage v0.35.0
- swh.loader.mercurial v2.1.0
Jul 28 2021
Jul 23 2021
For now I went the simplest way I could think of, which is:
Jul 9 2021
The run with 8 nodes was faster[1] than the previous ones, as expected, but it seems it could have been even faster: the bottleneck is now the 6 replayers, which are under a really high load.
Performance is better by 60% to 100% depending on the object type.
This version was used during the last tests on grid5000; the consistency level was correctly configured.
Jul 8 2021
News on the last incomplete run with 4 nodes[1]: it seems it's 25% faster than with 3 nodes, which is great.
The next run will be tonight with 8 Cassandra nodes.
During the last run, I discovered there were Cassandra logs[1] about oversized mutations, on this run and all the previous ones.
It means some changes were committed but ignored when the commit log was flushed, which is absolutely wrong.
Jul 7 2021
released in swh-storage:v0.34.0
A run was launched with the patched storage allowing to configure the consistency levels.
The results are on the dedicated hedgedoc document[1].
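For reference, here is a minimal sketch (not the actual swh.storage patch) of how a default consistency level can be set with the Python cassandra-driver, which is essentially what the patched storage exposes through its configuration; the node addresses and keyspace name are assumptions.

```python
# Minimal sketch: set LOCAL_QUORUM as the default consistency level.
from cassandra import ConsistencyLevel
from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile

profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
cluster = Cluster(
    ["172.16.97.2", "172.16.97.3", "172.16.97.4"],   # test cluster nodes (assumed)
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("swh")  # keyspace name is an assumption
```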
Jul 6 2021
The problem was solved by increasing the default timeout in the Cassandra configuration and by reducing the journal_client batch size.
A replay was run for 13 hours the previous night with the current default consistency=ONE. It can be used as the reference for the next test with the LOCAL_QUORUM consistency.
After trying several options to render the results, the simplest was to export the content of a spreadsheet into a hedgedoc document [1].
The data are stored in a Prometheus instance on a Proxmox VM, so it will always be possible to improve the statistics later[2].
Jul 1 2021
A new run was launched with 4x10 replayers per object type.
A limit is still reached for the revision replayers, which end with read timeouts.
A test with only the revision replayers shows the problems start to appear when the number of parallel replayers is greater than ~20.
One example of the wrong behavior of the consistency level ONE for read requests:
One of the monitoring probes is based on the content of the objectcount table.
The servers were hard-stopped after the end of the grid5000 reservation and it seems some replication messages were lost. After the restart, the content of the table is not in sync on all the servers.
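A hedged illustration of what this looks like from a client: the same SELECT at consistency ONE can return different values depending on which replica answers, while ALL (or QUORUM) forces agreement. The keyspace and exact table name are assumptions based on the log above.

```python
# Compare the object-count table content at consistency ONE vs ALL.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["172.16.97.2"])
session = cluster.connect("swh")  # keyspace name is an assumption

for cl in (ConsistencyLevel.ONE, ConsistencyLevel.ALL):
    stmt = SimpleStatement("SELECT * FROM object_count", consistency_level=cl)
    rows = list(session.execute(stmt))
    print(ConsistencyLevel.value_to_name[cl], rows)
```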
(3) should be ideally implemented in a way that guarantees that extid that were resolvable in previous versions of the mapping will always be resolvable in future versions
I don't understand. Option 3 is to remove relations between extids and SWHIDs, so they won't be resolvable anymore.
Jun 30 2021
So if that mapping changes, but always gives back an object in the archive (pointed to by a SWHID)
I have the feeling that option (1) will lead, in the long run, to an explosion of the size of the mapping, which will make us eventually converge (slowly) toward option (3).
and it would probably be kind of a mess from a kafka perspective
The "mapping version field" is the most fleshed out proposal as it would be my preference. My rationale for it against changing extid_type for backwards incompatible changes is that the extid_type is a property of the external artifact, while the mapping version is a property of our archiving infrastructure.
A first run was launched with 4x20 replayers per object type.
It seems it's too much for 3 Cassandra nodes: the default 1s timeout for Cassandra reads is often reached.
It seems the batch size of 1000 is also too much for some object types, like snapshots.
A new test will be launched tonight with some changes:
- reduce the number of replayer processes to 4x10 per object type
- reduce the journal client batch size to 500 (a hedged sketch of this change follows the list)
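As an illustration of the second change, here is a minimal sketch using swh.journal's Kafka client; the broker and consumer-group names are placeholders, and the exact keyword arguments may differ in the versions used for these tests.

```python
# Hedged sketch: lowering the journal client batch size for the replayers.
from swh.journal.client import get_journal_client

client = get_journal_client(
    "kafka",
    brokers=["kafka1.internal:9092"],        # placeholder broker address
    group_id="cassandra-replayer-snapshot",  # placeholder consumer group
    prefix="swh.journal.objects",            # topic prefix used by the journal
    object_types=["snapshot"],
    batch_size=500,                          # lowered from 1000
)
```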
Jun 29 2021
These are the first tests that will be executed:
- baseline configuration: 3 Cassandra nodes (parasilo[1]), commitlog on a dedicated SSD, data on 4 HDDs. Goal: test the minimal configuration (a night, or a complete weekend if possible)
- baseline configuration + 1 Cassandra node. Goal: test the performance impact of having 1 more server (long enough to see tendencies)
- baseline configuration + 3 Cassandra nodes. Goal: test the performance impact of doubling the cluster size (long enough to see tendencies)
- baseline configuration but with the commitlog on the data partition. Goal: check the impact of sharing the disks between data and commitlog (long enough to see tendencies)
- baseline configuration but with 2 HDDs. Goal: check the impact of the number of disks + have a reference for the next run (a night)
- baseline configuration but with 2 HDDs + commitlog on a dedicated HDD. Goal: check the impact of having the commitlog on a slower disk (long enough to see tendencies)
- baseline configuration but with 2x the default heap allocated to Cassandra. Goal: check the impact of the memory configuration ((!) check the GC profile)
Thanks, it was tested last night on grid5000; all the origins were correctly replayed without issues.
Jun 28 2021
In T3396#66905, @vlorentz wrote: only one confirmation of the write is needed
It's not perfect though. If the server that confirmed the writes breaks before it replicates the write, then the write is lost.
only one confirmation of the write is needed
IMO, we should first try to have a global configuration for all the read/write queries, and improve that later if needed for performance or if it creates some problems. At worst, it will be possible to use the default ONE values by configuration.
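For illustration, such a global setting could look like the following in the Cassandra storage configuration; the key names are assumptions, not the actual swh.storage options.

```python
# Hypothetical storage configuration with a single, global consistency level.
storage_config = {
    "cls": "cassandra",
    "hosts": ["172.16.97.2", "172.16.97.3", "172.16.97.4"],
    "keyspace": "swh",
    "consistency_level": "LOCAL_QUORUM",  # would fall back to ONE if unset
}
```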
released in swh-storage:v0.32.0
You can try changing the constants in _execute_with_retries.
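For context, that retry wrapper presumably looks something like the following tenacity-based sketch; the real constants and decorator arguments in swh.storage.cassandra may differ.

```python
# Hedged illustration of a retrying query helper.
from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(wait=wait_random_exponential(multiplier=1, max=10),
       stop=stop_after_attempt(3))
def _execute_with_retries(session, statement, args):
    return session.execute(statement, args)
```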
Jun 25 2021
I'm not quite sure what this is about.
Jun 22 2021
A table with the possible node counts relative to the replication factor was added to the hedgedoc document: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw?both
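As a small worked example of what that table captures: the quorum size for a given replication factor, and how many replicas can be down while QUORUM reads/writes still succeed.

```python
# Quorum size and fault tolerance per replication factor.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {rf - quorum(rf)} replica(s) down")
```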
Jun 18 2021
@vlorentz If you have an idea on how to implement that, I'll take it ;) I'm not sure whether I have missed something.
Several tests were executed with Cassandra nodes on the parasilo cluster [1]
The configuration was always the same to calibrate the runs:
- ZFS is used to manage the datasets
- the commitlogs on the 200 GB SSD drive
- the data on the 4x 600 GB HDDs configured in RAID0
- default memory configuration (8 GB / default GC (not G1))
- Cassandra configuration: [2]
Jun 16 2021
Some notes on how to perform common actions with cassandra: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw
Jun 15 2021
The environment can be stopped and rebuilt as long as the disks remain reserved on the servers.
Jun 10 2021
Some status on the automation:
- Cassandra nodes are OK (OS installation, ZFS configuration according to the defined environment, except for a problem during the first initialization with new disks, startup, cluster configuration)
- swh-storage node is OK (OS installation, gunicorn/swh-storage installation and startup)
- Cassandra database initialization:
```
root@parasilo-3:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.97.3  78.85 KiB   256     31.6%             49d46dd8-4640-45eb-9d4c-b6b16fc954ab  rack1
UN  172.16.97.5  105.45 KiB  256     26.0%             47e99bb4-4846-4e03-a06c-53ea2862172d  rack1
UN  172.16.97.4  98.35 KiB   256     18.1%             e2aeff29-c89a-4c7a-9352-77aaf78e91b3  rack1
UN  172.16.97.2  78.85 KiB   256     24.3%             edd1b72b-4c35-44bd-b7e5-316f41a156c4  rack1

root@parasilo-3:~# cqlsh 172.16.97.3
Connected to swh-storage at 172.16.97.3:9042
[cqlsh 6.0.0 | Cassandra 4.0 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> desc KEYSPACES
```
Jun 3 2021
I played with grid5000 to experiment with how the jobs work and how to initialize the reserved nodes.