Do you think a run with cassandra is necessary to evaluate a potential performance impact?
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Aug 6 2021
The db server prometheus configuration needs some adaptation, as scylla comes with its own prometheus node exporter (and removes the default packages :()
root@parasilo-2:/opt# apt install scylla-node-exporter
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libio-pty-perl libipc-run-perl moreutils
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  prometheus-node-exporter
The following NEW packages will be installed:
  scylla-node-exporter
0 upgraded, 1 newly installed, 1 to remove and 7 not upgraded.
Need to get 0 B/4,076 kB of archives.
After this operation, 3,243 kB of additional disk space will be used.
There are also a lot of errors in the scylla logs related to read timeouts (with no activity on the database except the monitoring):
Aug 06 14:52:10 parasilo-4.rennes.grid5000.fr scylla[16488]: [shard 5] storage_proxy - Exception when communicating with 172.16.97.4, to read from swh.object_count: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)
Aug 06 14:52:10 parasilo-4.rennes.grid5000.fr scylla[16488]: [shard 6] storage_proxy - Exception when communicating with 172.16.97.4, to read from swh.object_count: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)
After some trouble configuring and correctly starting the scylla servers (different binding, configuration adaptations), the schema was correctly created (I needed to add SWH_USE_SCYLLADB=1 to the initialisation script).
Compared to cassandra, it seems the nodetool command doesn't correctly return the data repartition on the cluster, because the system keyspaces don't have the same replication factor as the swh one:
vsellier@parasilo-2:~$ nodetool status
Using /etc/scylla/scylla.yaml as the config file
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns  Host ID                               Rack
UN  172.16.97.2  2.36 MB  256     ?     866bbcc4-d496-4ebb-ab3b-12ef4942beaa  rack1
UN  172.16.97.3  3.37 MB  256     ?     21fdd0a9-15cd-473f-814c-c8ac24870aca  rack1
UN  172.16.97.4  3.48 MB  256     ?     1ed61715-01a0-4c15-a4bc-f9972f575437  rack1
scylladb test
run7 results - cassandra heap from 16g to 32g
run6 results - commitlog on a HDD
Aug 5 2021
Some news about the tests running since the beginning of the week:
- The data retention of the federated prometheus had the default value, so all the data had expired after 15 days. A new reference run was performed to be able to compare with the default scenario
- The first try failed: it was the first time there were adaptations of the zfs configuration, and they were not correctly deployed via the ansible scripts. It was solved by completely cleaning up the zfs configuration and relaunching the deployment. Unfortunately, this needs to be launched manually before launching a test with zfs changes.
- With the usage of best-effort jobs, it's possible to perform tests during the day without exceeding the quota
Aug 2 2021
This should be resolved now (actually, on the 31st at 16).
Jul 30 2021
Thanks for the heads up @ both of you.
Jul 29 2021
Deployed \o/
Closing.
It's working but the check does not go green [1]. As far as I could tell, the unsuccessful
event [1] is seen as a failure by the check.
At the end of it all though, the final end-to-end production check for mercurial origins should go green.
From the staging webapp, we identify a revision [1]
by the way ^
Shipped the following modules to solve that problem:
- swh.model v2.7.0
- swh.storage v0.35.0
- swh.loader.mercurial v2.1.0
Jul 28 2021
Jul 23 2021
For now I went the simplest way I could think of, which is:
Jul 9 2021
The run with 8 nodes was faster[1] than the previous ones, as expected, but it seems it could have been even faster, because the bottleneck is now the 6 replayers, which have a really high load.
The performance improvement is between 60% and 100%, depending on the object type.
This version was used during the last tests on grid5000; the consistency level was correctly configured.
Jul 8 2021
News of the last incomplete run with 4 nodes[1]: it seems it's 25% faster than with 3 nodes, which is great
The next run will be launched tonight with 8 cassandra nodes
During the last run, I discovered there were cassandra logs[1] about oversized mutations, on this run and all the previous ones.
It means some changes were committed but ignored when the commit log is flushed, which is absolutely wrong.
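For reference, Cassandra enforces a maximum mutation size tied to the commit log segment size: by default half a segment, i.e. 16 MiB with the default 32 MiB segments. A minimal sketch of that arithmetic (illustrative only, not swh code):

```python
# Illustrative arithmetic for Cassandra's mutation size limit (not swh code).
# Assumed defaults: 32 MiB commit log segments, max mutation = half a segment.
COMMITLOG_SEGMENT_SIZE_MB = 32
MAX_MUTATION_SIZE = COMMITLOG_SEGMENT_SIZE_MB * 1024 * 1024 // 2  # 16 MiB

def mutation_fits(serialized_size_bytes: int) -> bool:
    """True if a mutation of this serialized size fits in a commit log segment."""
    return serialized_size_bytes <= MAX_MUTATION_SIZE
```

Mutations larger than this limit trigger the warnings seen in the logs, so very large rows (e.g. huge snapshots or revisions) are the likely culprits.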
Jul 7 2021
released in swh-storage:v0.34.0
A run was launched with the patched storage allowing the consistency levels to be configured.
The results are on the dedicated hedgedoc document[1]
Jul 6 2021
The problem was solved with an increase of the default timeout in the cassandra configuration and by reducing the journal_client batch size.
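The two knobs involved, sketched as config fragments (the cassandra.yaml key is real; the batch-size key name and both values are illustrative, not the exact ones used in the run):

```yaml
# cassandra.yaml: raise the server-side read timeout (illustrative value)
read_request_timeout_in_ms: 10000

# journal client configuration: smaller batches (key name assumed)
batch_size: 500
```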
A replay ran for 13 hours the previous night with the current default consistency=ONE. It can be used as the reference for the next test with the LOCAL_QUORUM consistency.
After trying several options to render the results, the simplest was to export the content of a spreadsheet into a hedgedoc document [1].
The data are stored in a prometheus instance on a proxmox vm, so it will always be possible to improve the statistics later[2].
Jul 1 2021
A new run was launched with 4x10 replayers per object type.
A limit is still reached for the revision replayers, which end with read timeouts.
A test with only the revision replayers shows the problems start to appear when the number of parallel replayers is greater than ~20.
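That ~20-replayer ceiling suggests capping the parallelism explicitly. A minimal sketch with plain threading (not the actual replayer code; the limit is an assumption taken from the observation above):

```python
import threading

# Sketch only: cap concurrent replay work with a bounded semaphore.
# MAX_PARALLEL_REPLAYERS is an assumption based on the ~20 observation.
MAX_PARALLEL_REPLAYERS = 20
_slots = threading.BoundedSemaphore(MAX_PARALLEL_REPLAYERS)

def replay_batch(process_fn, batch):
    """Run one replay batch while holding one of the concurrency slots."""
    with _slots:
        return process_fn(batch)
```

In practice the replayers are separate processes, so the cap would live in the orchestration (number of processes launched) rather than in a shared semaphore; the sketch only illustrates the bound.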
One example of the wrong behavior of the consistency level ONE for read requests:
One of the monitoring probes queries the content of the object_count table.
The servers were hard stopped after the end of the grid5000 reservation, and it seems some replication messages were lost. After the restart, the content of the table is not in sync on all the servers.
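One way to observe such divergence is to query the table from cqlsh at different consistency levels (a sketch; `CONSISTENCY` is a cqlsh command, and a read at ALL has to contact every replica):

```sql
-- cqlsh session sketch against the swh keyspace
CONSISTENCY ONE;
SELECT * FROM swh.object_count;  -- may return the stale view of a single replica
CONSISTENCY ALL;
SELECT * FROM swh.object_count;  -- must contact all replicas holding the rows
```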
(3) should be ideally implemented in a way that guarantees that extid that were resolvable in previous versions of the mapping will always be resolvable in future versions
I don't understand. Option 3 is to remove relations between extids and SWHID, so it won't be resolvable anymore.
Jun 30 2021
So even if that mapping changes, it should always give back an object in the archive (pointed to by a SWHID)
I've the feeling that option (1) will lead in the long run to an explosion of the size of the mapping, which will make us eventually converge (slowly) toward option (3).
and it would probably be kind of a mess from a kafka perspective
The "mapping version field" is the most fleshed-out proposal and it would be my preference. My rationale for it, against changing extid_type for backwards-incompatible changes, is that the extid_type is a property of the external artifact, while the mapping version is a property of our archiving infrastructure.
A first run was launched with 4x20 replayers per object type.
It seems it's too much for 3 cassandra nodes: the default 1s timeout for cassandra reads is often reached.
It seems the batch size of 1000 is also too much for some object types, like snapshots.
A new test will be launched tonight with some changes:
- reduce the number of replayers processes to 4x10 per object type
- reduce the journal client batch size to 500
Jun 29 2021
These are the first tests that will be executed:
- baseline configuration: 3 cassandra nodes (parasilo[1]), commitlog on a dedicated SSD, data on 4 HDDs. Goal: testing the minimal configuration (a night, or a complete weekend if possible)
- baseline configuration + 1 cassandra node. Goal: testing the performance impact of having 1 more server (long enough to see tendencies)
- baseline configuration + 3 cassandra nodes. Goal: testing the performance impact of doubling the cluster size (long enough to see tendencies)
- baseline configuration but with the commitlog on the data partition. Goal: checking the impact of data/commitlog mutualization (long enough to see tendencies)
- baseline configuration but with 2 HDDs. Goal: checking the impact of the number of disks + having a reference for the next run (a night)
- baseline configuration but with 2 HDDs + commitlog on a dedicated HDD. Goal: checking the impact of having the commitlog on a slower disk (long enough to see tendencies)
- baseline configuration but with 2x the default heap allocated to cassandra. Goal: checking the impact of the memory configuration ((!) check the gc profile)
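For the heap test, the usual knobs are the matching -Xms/-Xmx pair, e.g. in Cassandra's jvm.options (illustrative values; keeping min equal to max avoids heap resizing pauses):

```
# cassandra jvm.options fragment (illustrative): fixed heap, min == max
-Xms32G
-Xmx32G
```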
Thanks, it was tested last night on grid5000; all the origins were correctly replayed, without issues.
Jun 28 2021
In T3396#66905, @vlorentz wrote:only one confirmation of the write is needed
It's not perfect though. If the server that confirmed the write breaks before it replicates the write, then the write is lost.
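The risk described above is the classic quorum overlap condition: a read is only guaranteed to see the latest acknowledged write when write acks + read acks exceed the replication factor. A quick check (plain arithmetic, not driver code):

```python
# Quorum overlap rule: a read is guaranteed to observe the latest acknowledged
# write when write_acks + read_acks > replication_factor.
def reads_see_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    return write_acks + read_acks > replication_factor

# With RF=3: ONE/ONE (1+1) gives no guarantee; QUORUM/QUORUM (2+2) does.
```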
IMO, we should first try to have a global configuration for all the read/write queries, and improve that later if needed for performance or if it creates some problems. At worst, it will be possible to use the default ONE values by configuration.
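A sketch of that idea (hypothetical names, not the actual swh-storage configuration): one global default consistency level, optional per-query overrides, with ONE as the fallback.

```python
# Hypothetical sketch of a global-default-with-overrides consistency config.
# All names here are made up for illustration.
DEFAULT_CONSISTENCY = "ONE"

class ConsistencyConfig:
    def __init__(self, global_level=DEFAULT_CONSISTENCY, overrides=None):
        self.global_level = global_level
        self.overrides = dict(overrides or {})

    def level_for(self, query_name):
        """Consistency level for one query, falling back to the global default."""
        return self.overrides.get(query_name, self.global_level)
```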
released in swh-storage:v0.32.0
You can try changing the constants in _execute_with_retries.
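The general shape of such a retrying executor, as a sketch (not swh-storage's actual implementation; the constant names here are illustrative stand-ins for the ones in `_execute_with_retries`):

```python
import time

# Illustrative constants; the real ones live in swh-storage's
# _execute_with_retries.
MAX_RETRIES = 3
RETRY_DELAY_S = 0.0  # would be non-zero in practice

def execute_with_retries(fn, *args, **kwargs):
    """Call fn, retrying up to MAX_RETRIES times on any exception."""
    last_exc = None
    for _attempt in range(MAX_RETRIES):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            last_exc = exc
            time.sleep(RETRY_DELAY_S)
    raise last_exc
```

Raising MAX_RETRIES (or the delay) trades latency for resilience against transient timeouts like the ones above.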