Changing the status to Resolved as the main issues are solved.
Other tests with more parallel workers will be launched; if other problems are detected, they will be tracked in new dedicated tickets.

Sep 16 2021, 11:31 AM · System administration, Storage manager

vsellier closed T3493: [cassandra] Git loader performance are very bad, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.

Sep 16 2021, 11:31 AM · System administration, Storage manager

vsellier added a comment to T3579: Meta-task: upgrade infrastructure to Debian Bullseye.

WIP document: https://hedgedoc.softwareheritage.org/GYxIMBMXSRGVNxcCBrfA3w#

Sep 16 2021, 9:53 AM · System administration (Component upgrades)

vsellier updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.

Sep 16 2021, 9:44 AM · System administration (Component upgrades)

vsellier triaged T3579: Meta-task: upgrade infrastructure to Debian Bullseye as Normal priority.

Sep 16 2021, 9:43 AM · System administration (Component upgrades)

vsellier accepted D6276: keyrings/cassandra: Update.

LGTM

Sep 16 2021, 9:01 AM

Sep 15 2021

vsellier committed rDSNIP118ba1a0f735: grid5000/cassandra: declare a server type with only a big zfs dataset (authored by vsellier).

grid5000/cassandra: declare a server type with only a big zfs dataset

Sep 15 2021, 6:19 PM

vsellier updated the task description for T3577: Parallel loaders performances .

Sep 15 2021, 5:49 PM · System administration, Storage manager

vsellier changed the status of T3577: Parallel loaders performances from Open to Work in Progress.

Sep 15 2021, 5:49 PM · System administration, Storage manager

vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.

Sep 15 2021, 5:42 PM · System administration, Storage manager

vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

Test of the new D6269 patch:

Sep 15 2021, 5:12 PM · System administration, Storage manager

vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

2 flame graphs of the previous directory_ls:

one-by-one

first run (cache cold):

c864e846cb339a94da9fd91ae12cabcf083a8685-one-by-one-1.svg (113 KB)

Sep 15 2021, 2:55 PM · System administration, Storage manager

vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

These are the results of the different runs:

Sep 15 2021, 2:40 PM · System administration, Storage manager

vsellier created P1164 flush cassandra buffers.

Sep 15 2021, 12:04 PM

vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

The directory_ls and, indirectly, the get_content performance was tested with this small script: P1163
A cold restart (all buffers cleared, cassandra restarted) is done between each test (P1164)
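
The scripts themselves (P1163, P1164) are not reproduced here. As a rough illustration only, a timing harness for this kind of cold/warm comparison could look like the sketch below. The name time_listing and the stand-in fake_directory_ls are hypothetical; the real call would be the storage's directory_ls.

```python
import time
from typing import Callable, Iterable, Tuple


def time_listing(list_entries: Callable[[str], Iterable], directory_id: str) -> Tuple[float, int]:
    """Return (elapsed seconds, number of entries) for one listing call."""
    start = time.perf_counter()
    # Consume the iterable fully so lazily produced results are timed too.
    count = sum(1 for _ in list_entries(directory_id))
    elapsed = time.perf_counter() - start
    return elapsed, count


if __name__ == "__main__":
    # Hypothetical stand-in for the real storage call.
    def fake_directory_ls(directory_id: str):
        return iter(["README", "src", "tests"])

    # Placeholder identifier, not a real directory id.
    elapsed, count = time_listing(fake_directory_ls, "0" * 40)
    print(f"{count} entries in {elapsed:.6f}s")
```

Running the same call once right after a cold restart and once again afterwards gives the cold-cache and warm-cache timings to compare.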

Sep 15 2021, 11:20 AM · System administration, Storage manager

vsellier renamed T3573: [cassandra] directory and content read benchmarks from [cassandra] directory and content read benchmarkss to [cassandra] directory and content read benchmarks.

Sep 15 2021, 11:19 AM · System administration, Storage manager

vsellier created P1163 directory_ls.py.

Sep 15 2021, 11:18 AM

vsellier changed the status of T3573: [cassandra] directory and content read benchmarks from Open to Work in Progress.

Sep 15 2021, 11:11 AM · System administration, Storage manager

vsellier closed T3476: One of the system disks of beaubourg is out of order, a subtask of T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem, as Resolved.

Sep 15 2021, 8:32 AM · System administration

vsellier closed T3476: One of the system disks of beaubourg is out of order as Resolved.

The disk was received on Monday and replaced on Tuesday by Christophe from the DSI.
The RAID card automatically launched the RAID rebuild. Everything is OK now.

root@beaubourg:~#  megacli -PDList -aALL
...

Sep 15 2021, 8:32 AM · System administration

Sep 13 2021

vsellier closed T3465: Test multidatacenter replication, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.

Sep 13 2021, 1:48 PM · System administration, Storage manager

vsellier closed T3465: Test multidatacenter replication as Resolved.

Sep 13 2021, 1:48 PM · System administration, Storage manager

vsellier added a comment to T3465: Test multidatacenter replication.

The new datacenter has been active for a couple of weeks.
It allowed us to test:

how to declare a new DC and bootstrap it
how the data is replicated between the DCs
how to perform inter/intra DC repairs
how to add nodes to a DC and bootstrap them
how to remove a datacenter
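
The exact commands used for these tests are not recorded in this comment. As a hedged sketch, declaring a new DC for a keyspace usually comes down to an ALTER KEYSPACE statement with NetworkTopologyStrategy; the helper below only builds that statement. The keyspace name swh and the DC names dc1/dc2 are assumptions for illustration.

```python
def alter_keyspace_replication(keyspace: str, replication_per_dc: dict) -> str:
    """Build a CQL statement setting per-datacenter replication factors."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per datacenter: name -> replication factor.
    parts += [f"'{dc}': {int(rf)}" for dc, rf in replication_per_dc.items()]
    body = ", ".join(parts)
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical keyspace and datacenter names.
    print(alter_keyspace_replication("swh", {"dc1": 3, "dc2": 3}))
```

Leaving a DC out of the map (or setting its factor to 0) is the usual first step before removing that datacenter. Existing data is not streamed by the ALTER itself, so a rebuild or repair on the new nodes is typically needed afterwards.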

Sep 13 2021, 1:48 PM · System administration, Storage manager

vsellier closed T3464: Prepare a quote for the cassandra servers, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.

Sep 13 2021, 1:24 PM · System administration, Storage manager

vsellier closed T3464: Prepare a quote for the cassandra servers as Resolved.

The quote is done and validated; we will place the order once the new matinfo deal is available.

Sep 13 2021, 1:24 PM · System administration, Storage manager

Sep 10 2021

vsellier added a comment to T3476: One of the system disks of beaubourg is out of order.

A replacement disk will be sent by DELL. It should be delivered on 2021-09-13 if everything goes well.
The DSI has been notified of the delivery.

Sep 10 2021, 3:23 PM · System administration

vsellier committed rSPRE45e67ae40690: Add a vm template for debian 11.0 (authored by vsellier).

Add a vm template for debian 11.0

Sep 10 2021, 10:20 AM

vsellier closed D6237: Add a template for debian 10.10.

Sep 10 2021, 10:20 AM

vsellier committed rSPRE41e5dd01a609: Add a template for debian 10.10 (authored by vsellier).

Add a template for debian 10.10

Sep 10 2021, 10:20 AM

vsellier closed D6227: Adapt the debian security repository release for bullseye distribution.

Sep 10 2021, 10:16 AM

vsellier committed rSPSITE0a3411f454e6: Adapt the debian security repository release for bullseye distribution (authored by vsellier).

Adapt the debian security repository release for bullseye distribution

Sep 10 2021, 10:16 AM

vsellier updated the diff for D6227: Adapt the debian security repository release for bullseye distribution.

rebase

Sep 10 2021, 10:15 AM

vsellier closed D6226: Prepare the debian 11 vagrant template.

Sep 10 2021, 10:14 AM

vsellier committed rSENV874a41909a12: Prepare the debian 11 vagrant template (authored by vsellier).

Prepare the debian 11 vagrant template

Sep 10 2021, 10:14 AM

vsellier updated the diff for D6226: Prepare the debian 11 vagrant template.

rebase

Sep 10 2021, 10:14 AM

vsellier updated the diff for D6226: Prepare the debian 11 vagrant template.

fix the wrong box url
Explain the move to nfs v4
Explain the puppet directories hack

Sep 10 2021, 10:09 AM

vsellier added inline comments to D6226: Prepare the debian 11 vagrant template.

Sep 10 2021, 9:52 AM

vsellier requested review of D6237: Add a template for debian 10.10.

Sep 10 2021, 8:43 AM

vsellier added a comment to D6139: cassandra: Add option to select (hopefully) more efficient batch insertion algos.

Final tests with the latest version: everything looks good, with ~~almost the same performances~~ a better ingestion rate in batch:
4 nodes before (batch only):

Sep 10 2021, 8:31 AM

Sep 9 2021

vsellier updated the diff for D6227: Adapt the debian security repository release for bullseye distribution.

ensure it works with stretch and versions >= bullseye

Sep 9 2021, 5:06 PM

vsellier requested review of D6227: Adapt the debian security repository release for bullseye distribution.

Sep 9 2021, 3:17 PM

vsellier requested review of D6226: Prepare the debian 11 vagrant template.

Sep 9 2021, 3:13 PM

vsellier added a comment to D6139: cassandra: Add option to select (hopefully) more efficient batch insertion algos.

Thanks for the last fix, it looks better with a smaller batch size:
5 nodes:

The ingestion is ~7500 ops/s in batch compared to ~6500 before
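
For reference, the relative gain implied by these two approximate figures can be worked out as follows (a small sketch; relative_gain is just an illustrative helper name):

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before` (0.10 means +10%)."""
    return (after - before) / before


if __name__ == "__main__":
    # ~6500 ops/s before, ~7500 ops/s with the smaller batch size.
    print(f"{relative_gain(6500, 7500):.1%}")  # prints 15.4%
```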

Sep 9 2021, 9:13 AM

Sep 8 2021

vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.

Sep 8 2021, 3:24 PM · Journal, Archive search

vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects as Resolved.

metadata searches are now done in Elasticsearch since the deployment of T3433

Sep 8 2021, 3:24 PM · System administration, Journal, Archive search

vsellier renamed T3433: Deploy swh.search v0.10/v0.11 from Deploy swh.search v0.10/v0.11 on staging to Deploy swh.search v0.10/v0.11.

Sep 8 2021, 3:21 PM · System administration, Archive search

vsellier closed T3433: Deploy swh.search v0.10/v0.11 as Resolved.

Everything is deployed and looks functional.

Sep 8 2021, 3:21 PM · System administration, Archive search

vsellier closed D6206: webapp: support new metadata search backend configuation.

Sep 8 2021, 2:29 PM

vsellier committed rSPSITEd19dc2f55c01: webapp: support new metadata search backend configuation (authored by vsellier).

webapp: support new metadata search backend configuation

Sep 8 2021, 2:29 PM

vsellier accepted D6199: Install graph services as-is.

LGTM

Sep 8 2021, 2:22 PM

vsellier accepted D6200: Add icinga checks around the graph service.

LGTM

Sep 8 2021, 2:18 PM

vsellier added a comment to D6139: cassandra: Add option to select (hopefully) more efficient batch insertion algos.

According to the documentation of the cassandra concurrent API [1], it seems the concurrency can be specified as an argument of the execute_concurrent_with_args method. The default is 100, but it could be interesting to test with higher or lower values.
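
A minimal sketch of what passing that argument could look like with the Python driver is given below. This is not the code from D6139: the wrapper insert_concurrently and its executor parameter (there only so the function can be exercised without a cluster) are assumptions; concurrency and raise_on_first_error are keyword arguments of the driver's execute_concurrent_with_args.

```python
def insert_concurrently(session, statement, rows, concurrency=100, executor=None):
    """Run `statement` once per parameter tuple, with bounded concurrency.

    Returns (number of successful executions, list of exceptions).
    """
    rows = list(rows)
    if executor is None:
        # Imported lazily so this module loads without the driver installed.
        from cassandra.concurrent import execute_concurrent_with_args as executor
    results = executor(
        session,
        statement,
        rows,
        concurrency=concurrency,      # driver default is 100
        raise_on_first_error=False,   # collect failures instead of raising
    )
    # Each result is a (success, result_or_exception) pair.
    failures = [outcome for ok, outcome in results if not ok]
    return len(rows) - len(failures), failures
```

With a real session, statement would normally be a prepared statement, and the benchmark would be repeated with several concurrency values (for example 50, 100, 200) to compare ingestion rates.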

Sep 8 2021, 10:27 AM

vsellier added a comment to D6139: cassandra: Add option to select (hopefully) more efficient batch insertion algos.

These are more results with different numbers of replayers.
Each line represents a server with 20 directory replayers; the ranges are for one-by-one, concurrent, batch

1 node

2 nodes

Sep 8 2021, 10:11 AM

Sep 7 2021

vsellier requested review of D6206: webapp: support new metadata search backend configuation.

Sep 7 2021, 4:08 PM

vsellier added a revision to T3433: Deploy swh.search v0.10/v0.11: D6206: webapp: support new metadata search backend configuation.

Sep 7 2021, 4:08 PM · System administration, Archive search

vsellier accepted D6203: Retry on concurrent conflicting updates.

LGTM thanks

Sep 7 2021, 3:24 PM

vsellier closed D6202: explicitly name the metadata search configuration property.

Sep 7 2021, 3:06 PM

vsellier committed rDWAPPSc302b9a5e40d: explicitly name the metadata search configuration property (authored by vsellier).

explicitly name the metadata search configuration property

Sep 7 2021, 3:06 PM

vsellier requested review of D6202: explicitly name the metadata search configuration property.

Sep 7 2021, 2:57 PM

vsellier closed D6197: swh-search: use the consumer group used during the reindexation.

Sep 7 2021, 11:25 AM

vsellier committed rSPSITE6efa928ca146: swh-search: use the consumer group used during the reindexation (authored by vsellier).

swh-search: use the consumer group used during the reindexation

Sep 7 2021, 11:25 AM

vsellier added a revision to T3433: Deploy swh.search v0.10/v0.11: D6197: swh-search: use the consumer group used during the reindexation.

Sep 7 2021, 11:22 AM · System administration, Archive search

vsellier requested review of D6197: swh-search: use the consumer group used during the reindexation.

Sep 7 2021, 11:22 AM

vsellier closed D6183: swh-search: activate metadata search all ES on the main webapp.

Sep 7 2021, 11:02 AM

vsellier committed rSPSITE377c1fa75a27: swh-search: activate metadata search all ES on the main webapp (authored by vsellier).

swh-search: activate metadata search all ES on the main webapp

Sep 7 2021, 11:02 AM

vsellier closed D6182: swh-search: update the configuration for the deployment of v0.11.4.

Sep 7 2021, 11:02 AM

vsellier committed rSPSITE2f4076496bbd: swh-search: update the configuration for the deployment of v0.11.4 (authored by vsellier).

swh-search: update the configuration for the deployment of v0.11.4

Sep 7 2021, 11:02 AM

vsellier edited P1155 Log buffer stats.

Sep 7 2021, 10:10 AM

Sep 6 2021

vsellier triaged T3562: [swh-search] Document version conflict during parallel indexation as Normal priority.

Sep 6 2021, 2:52 PM · Archive search

Sep 3 2021

vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

With the new concurrent replay of the directory, the disk usage grows rapidly:

Sep 3 2021, 5:15 PM · System administration, Storage manager

vsellier added a comment to D6139: cassandra: Add option to select (hopefully) more efficient batch insertion algos.

Some feedback: I had to delay the benchmarks because the servers were almost full and the cluster needed to be expanded to 7 nodes. The cluster is in a stabilization phase (rebuild/repair of the new node and cleanup of the old ones).
Once that is done, I will be able to finalize the tests, hopefully at the beginning of next week.

Sep 3 2021, 4:51 PM

vsellier added a comment to T3433: Deploy swh.search v0.10/v0.11.

production deployment:

disable puppet
stop and disable the journal clients and the search backend
update the swh-search configuration to change the index name to origin-v0.11

root@search1:/etc/softwareheritage/search# diff -U3 /tmp/server.yml server.yml
--- /tmp/server.yml	2021-09-03 14:06:07.896137122 +0000
+++ server.yml	2021-09-03 14:05:47.072081879 +0000
@@ -10,7 +10,7 @@
     port: 9200
   indexes:
     origin:
-      index: origin-production
+      index: origin-v0.11
       read_alias: origin-read
       write_alias: origin-write

update the journal clients to use a group id suffixed with -v0.11 (swh.search.journal_client-v0.11 and swh.search.journal_client.indexed-v0.11)

root@search1:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_objects.yml journal_client_objects.yml 
--- /tmp/journal_client_objects.yml	2021-09-03 14:06:52.660255797 +0000
+++ journal_client_objects.yml	2021-09-03 14:07:10.684303568 +0000
@@ -8,7 +8,7 @@
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.11
   prefix: swh.journal.objects
   object_types:
   - origin
root@search1:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_indexed.yml journal_client_indexed.yml 
--- /tmp/journal_client_indexed.yml	2021-09-03 14:06:52.660255797 +0000
+++ journal_client_indexed.yml	2021-09-03 14:07:25.760343512 +0000
@@ -8,7 +8,7 @@
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.11
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata

perform a system upgrade

root@search1:/etc/softwareheritage/search# apt dist-upgrade -V
...
The following NEW packages will be installed:
   python3-tree-sitter (0.19.0-1+swh1~bpo10+1)
The following packages will be upgraded:
   libnss-systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libpam-systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libsystemd0 (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libudev1 (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   python3-swh.core (0.14.3-1~swh1~bpo10+1 => 0.14.5-1~swh1~bpo10+1)
   python3-swh.model (2.6.1-1~swh1~bpo10+1 => 2.8.0-1~swh1~bpo10+1)
   python3-swh.scheduler (0.15.0-1~swh1~bpo10+1 => 0.18.0-1~swh1~bpo10+1)
   python3-swh.search (0.9.0-1~swh1~bpo10+1 => 0.11.4-2~swh1~bpo10+1)
   python3-swh.storage (0.30.1-1~swh1~bpo10+1 => 0.36.0-1~swh1~bpo10+1)
   systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   systemd-sysv (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   systemd-timesyncd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   udev (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
13 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
...

There is no need to reboot

enable and restart the swh-search backend
check the new index creation

root@search1:/etc/softwareheritage/search# curl ${ES_SERVER}/_cat/indices\?v
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-v0.11      XOUR_jKcTtWKjlPk_8EAlA  90   1          0            0     34.3kb         18.2kb
green  open   origin-v0.9.0     TH9xlECuS4CcJTDw0Fqieg  90   1  175001478     36494554      293gb        146.9gb
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  176722078     56232582      311gb        155.1gb

update the write index alias

root@search1:~/T3433# ./update-write-alias.sh 
{"acknowledged":true}{"acknowledged":true}root@search1:~/T3433# 
root@search1:~/T3433# curl ${ES_SERVER}/_cat/aliases\?v
alias               index             filter routing.index routing.search is_write_index
origin-write        origin-v0.11      -      -             -              -
origin-read-v0.9.0  origin-v0.9.0     -      -             -              -
origin-v0.9.0-read  origin-v0.9.0     -      -             -              -
origin-v0.9.0-write origin-v0.9.0     -      -             -              -
origin-write-v0.9.0 origin-v0.9.0     -      -             -              -
origin-read         origin-production -      -             -              -

All the v0.9.0 index and aliases will be cleaned up once the migration to v0.11 is done

restart the journal clients

root@search1:~# systemctl enable swh-search-journal-client@objects
Created symlink /etc/systemd/system/multi-user.target.wants/swh-search-journal-client@objects.service → /etc/systemd/system/swh-search-journal-client@.service.
root@search1:~# systemctl enable swh-search-journal-client@indexed
Created symlink /etc/systemd/system/multi-user.target.wants/swh-search-journal-client@indexed.service → /etc/systemd/system/swh-search-journal-client@.service.
root@search1:~# systemctl start swh-search-journal-client@objects
root@search1:~# systemctl start swh-search-journal-client@indexed

wait for the lag to recover, create additional journal clients if necessary
update the read index alias
land D6182, D6183, D6197
Update swh-web configuration to support the new way to configure the metadata search backend (D6202)
deploy them on webapp1 and moma
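
The content of update-write-alias.sh is not shown in this comment. As a hedged sketch, the remaining read alias switch could be expressed as a single request to the Elasticsearch _aliases endpoint, which applies its actions atomically. The helper below only builds the request body; the function name is an assumption, while the alias and index names are the ones listed above.

```python
import json


def alias_switch_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body of a POST /_aliases request moving `alias` to `new_index`."""
    return {
        "actions": [
            # Both actions are applied atomically by Elasticsearch.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    payload = alias_switch_actions("origin-read", "origin-production", "origin-v0.11")
    print(json.dumps(payload, indent=2))
```

The printed JSON can be sent with curl -XPOST -H 'Content-Type: application/json' ${ES_SERVER}/_aliases -d @payload.json, then checked with the same _cat/aliases call as above.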