About one day remains before the origin* topics are fully consumed.
The metadata are completely ingested, so the metadata search can be tested on webapp1 once the configuration is updated to use the new index.
Jun 23 2021
- All of louvre's certificates are now created on opnsense
- I performed some tests to access the firewall through a serial console. SSH access can be done by using the serial console of one of the servers exposed on the IPMI network:
# ipmitool -I lanplus -H swh-ceph-mon1-adm.inria.fr -U XXX -P XXX sol activate
[SOL Session operational.  Use ~? for help]
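To close the session cleanly, type the standard ipmitool escape sequence ~. (tilde then dot) inside the SOL session, or deactivate the SOL payload from another shell (same host and placeholder credentials as above):
# ipmitool -I lanplus -H swh-ceph-mon1-adm.inria.fr -U XXX -P XXX sol deactivate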
All the revoked certificates are imported into the opnsense CRL.
I will also import the valid certificates so we have them on hand in case a revocation is needed.
Jun 22 2021
LGTM, thanks, it's obviously less complicated.
The certificates revoked by louvre can be imported into opnsense and revoked in an internal CRL.
It's simpler than importing louvre's current CRL, as an imported CRL needs to be managed externally and its raw content pasted into the UI.
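As a sanity check before importing, the serials revoked by louvre can be listed directly from its CRL file with openssl (the CRL path below is a placeholder; add -inform DER if the file is not PEM-encoded):
# openssl crl -in /path/to/louvre.crl -noout -text | grep -A1 'Serial Number'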
- main vlan440 IPs changed to regroup the firewalls at the beginning of the range and have similar IPs on each vlan:
- pushkin: 192.168.100.128 -> 192.168.100.2
- glyptotek: 192.168.100.129 -> 192.168.100.3
- next step is to try to import the current certificate revocation list of louvre
- new dns entry vpn.softwareheritage.org created:
vpn A 3600 128.93.166.2
- The firewalls' address change is being prepared in D5906
The reindexation should be done by the end of the day.
- journal clients started:
root@search1:~/T3398# swh search --config-file journal_client_objects.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.013s]
INFO:elasticsearch:POST http://search-esnode5.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.014s]
INFO:elasticsearch:POST http://search-esnode6.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.012s]
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.012s]
...
root@search1:~/T3398# swh search --config-file journal_client_indexed.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.758s]
INFO:elasticsearch:POST http://search-esnode5.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.023s]
INFO:elasticsearch:POST http://search-esnode6.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.024s]
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.023s]
...
- journal client configurations prepared:
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/journal_client_objects.yml journal_client_objects.yml
--- /etc/softwareheritage/search/journal_client_objects.yml	2021-06-10 08:08:19.555062808 +0000
+++ journal_client_objects.yml	2021-06-22 09:19:04.841898294 +0000
@@ -8,13 +8,18 @@
     port: 9200
   - host: search-esnode6.internal.softwareheritage.org
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - kafka1.internal.softwareheritage.org
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.9.0
   prefix: swh.journal.objects
   object_types:
   - origin
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/journal_client_indexed.yml journal_client_indexed.yml
--- /etc/softwareheritage/search/journal_client_indexed.yml	2021-06-10 09:34:00.980897650 +0000
+++ journal_client_indexed.yml	2021-06-22 09:27:18.507340257 +0000
@@ -8,13 +8,18 @@
     port: 9200
   - host: search-esnode6.internal.softwareheritage.org
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - kafka1.internal.softwareheritage.org
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.9.0
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata
- new index initialized:
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/server.yml server.yml
--- /etc/softwareheritage/search/server.yml	2021-06-10 08:08:17.819058015 +0000
+++ server.yml	2021-06-22 09:11:16.132518743 +0000
@@ -10,7 +10,7 @@
     port: 9200
   indexes:
     origin:
-      index: origin-production
-      read_alias: origin-read
-      write_alias: origin-write
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
On search1:
- puppet disabled
- swh-search / journal clients stopped
- packages updated:
apt list --upgradable 2>/dev/null | grep python3-swh | cut -f1 -d'/' | xargs apt install -V --dry-run
...
The following packages will be upgraded:
   python3-swh.core (0.13.0-1~swh1~bpo10+1 => 0.14.3-1~swh1~bpo10+1)
   python3-swh.indexer (0.7.0-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.indexer.storage (0.7.0-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.journal (0.7.1-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.model (2.3.0-1~swh1~bpo10+1 => 2.6.1-1~swh1~bpo10+1)
   python3-swh.objstorage (0.2.2-1~swh1~bpo10+1 => 0.2.3-1~swh1~bpo10+1)
   python3-swh.scheduler (0.10.0-1~swh1~bpo10+1 => 0.15.0-1~swh1~bpo10+1)
   python3-swh.search (0.8.0-1~swh1~bpo10+1 => 0.9.0-1~swh1~bpo10+1)
   python3-swh.storage (0.27.2-1~swh1~bpo10+1 => 0.30.1-1~swh1~bpo10+1)
9 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
...
- Index initialization done and swh-search restarted:
root@search1:~# swh search -C /etc/softwareheritage/search/server.yml initialize
INFO:elasticsearch:HEAD http://search-esnode6.internal.softwareheritage.org:9200/origin-production [status:200 request:0.025s]
INFO:elasticsearch:HEAD http://search-esnode4.internal.softwareheritage.org:9200/origin-read/_alias [status:200 request:0.018s]
INFO:elasticsearch:HEAD http://search-esnode5.internal.softwareheritage.org:9200/origin-write/_alias [status:200 request:0.003s]
INFO:elasticsearch:PUT http://search-esnode6.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.102s]
Done.
root@search1:~# systemctl start gunicorn-swh-search.service
- journal clients restarted with no errors in the logs; search is still working from the webapp
A table with the possible node counts relative to the replication factor was added to the hedgedoc document: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw?both
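For context, the arithmetic behind that table is the usual quorum rule: quorum = floor(RF / 2) + 1. For example, RF=3 gives a quorum of 2, so QUORUM reads and writes keep succeeding with one replica down, while RF=5 gives a quorum of 3 and tolerates two.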
Jun 21 2021
The first draft of the migration plan is available here: https://hedgedoc.softwareheritage.org/XBVc1QZhR_aVdbfSviqchg
Jun 18 2021
The backfill is done.
- Puppet is reactivated
- The index configuration is set back to the defaults:
root@search0:/etc/softwareheritage/search# export ES_SERVER=192.168.130.80:9200
root@search0:/etc/softwareheritage/search# export INDEX=origin-v0.9.0
root@search0:/etc/softwareheritage/search# cat >/tmp/config.json <<EOF
> {
>   "index" : {
>     "translog.sync_interval" : null,
>     "translog.durability": null,
>     "refresh_interval": null
>   }
> }
> EOF
root@search0:/etc/softwareheritage/search# curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json
{"acknowledged":true}
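If needed, the effective values can be double-checked afterwards through the settings endpoint (reusing the ES_SERVER/INDEX variables above; include_defaults shows the values now coming from the defaults):
root@search0:/etc/softwareheritage/search# curl -s "http://${ES_SERVER}/${INDEX}/_settings?include_defaults=true&pretty" | grep -E 'sync_interval|durability|refresh_interval'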
@vlorentz If you have an idea of how to implement that, I'll take it ;). I'm not sure whether I have missed something.
Several tests were executed with cassandra nodes on the parasilo cluster [1].
The same configuration was used for every run to keep them comparable:
- ZFS is used to manage the datasets (a sketch of a possible ZFS layout follows the list)
- the commitlogs on the 200GB SSD drive
- the data on the 4 x 600GB HDDs configured in RAID0
- default memory configuration (8GB / default GC (not G1))
- Cassandra configuration: [2]
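A minimal sketch of the kind of ZFS layout this describes (pool names, device names and mountpoints are assumptions, not the actual parasilo ones):
zpool create -f cassandra-data /dev/sdb /dev/sdc /dev/sdd /dev/sde    # 4 x 600GB HDDs striped together (RAID0 equivalent)
zfs create -o mountpoint=/srv/cassandra/data cassandra-data/data
zpool create -f cassandra-commitlog /dev/sdf                          # 200GB SSD dedicated to the commitlogs
zfs create -o mountpoint=/srv/cassandra/commitlog cassandra-commitlog/commitlog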
The configuration of the index was changed to increase the reindexation speed:
root@search-esnode0:~# export ES_SERVER=192.168.130.80:9200
root@search-esnode0:~# export INDEX=origin-v0.9.0
root@search-esnode0:~# cat >/tmp/config.json <<EOF
> {
>   "index" : {
>     "translog.sync_interval" : "60s",
>     "translog.durability": "async",
>     "refresh_interval": "60s"
>   }
> }
> EOF
root@search-esnode0:~# curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json
The default settings will be restored once the journal clients have caught up.
Underlying index of the aliases changed to use the new index for both read and write:
root@search0:~/T3391# curl -XPOST -H 'Content-Type: application/json' http://search-esnode0:9200/_aliases -d '
> {
>   "actions" : [
>     { "remove" : { "index" : "origin", "alias" : "origin-read" } },
>     { "remove" : { "index" : "origin", "alias" : "origin-write" } },
>     { "add" : { "index" : "origin-v0.9.0", "alias" : "origin-read" } },
>     { "add" : { "index" : "origin-v0.9.0", "alias" : "origin-write" } }
>   ]
> }'
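A quick way to double-check which index the aliases now point to (output omitted here):
root@search0:~/T3391# curl -s http://search-esnode0:9200/_cat/aliases\?v | grep origin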
Also change the consumer group of the journal clients to keep the current position of the backfill.
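If one wants to verify where a journal client stands relative to the backfill, the broker can report the consumer group lag (group name taken from the configuration used here; $SERVER as in the kafka-topics.sh commands below):
vsellier@journal0 /var/log/kafka % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh.search.journal_client-v0.9.0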
Indexation rescheduled as in https://forge.softwareheritage.org/T3037#58463:
swhscheduler@scheduler0:~$ /usr/bin/swh scheduler --config-file scheduler.yml task schedule_origins --storage-url http://storage1.internal.staging.swh.network:5002 index-origin-metadata 2>&1 | tee schedule_origins.logs
...
page_token: 79901
Scheduled 8000 tasks (80000 origins).
page_token: 80001
page_token: 80101
...
and counting...
root@worker3:/etc/softwareheritage# puppet agent --disable 'recreate origin_intrinsic_metadata topic'
root@worker3:/etc/softwareheritage# systemctl stop swh-worker@indexer_origin_intrinsic_metadata.service
root@search0:~/T3391# puppet agent --disable 'recreate origin_intrinsic_metadata topic'
root@search0:~/T3391# systemctl stop swh-search-journal-client@indexed.service
vsellier@journal0 /var/log/kafka % /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --delete --topic swh.journal.indexed.origin_intrinsic_metadata
vsellier@journal0 /var/log/kafka % /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic "swh.journal.indexed.origin_intrinsic_metadata"
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.origin_intrinsic_metadata.
% /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe --topic swh.journal.indexed.origin_intrinsic_metadata
Topic: swh.journal.indexed.origin_intrinsic_metadata    PartitionCount: 64      ReplicationFactor: 1    Configs: cleanup.policy=compact,max.message.bytes=104857600
        Topic: swh.journal.indexed.origin_intrinsic_metadata    Partition: 0    Leader: 1    Replicas: 1    Isr: 1
        Topic: swh.journal.indexed.origin_intrinsic_metadata    Partition: 1    Leader: 1    Replicas: 1    Isr: 1
        Topic: swh.journal.indexed.origin_intrinsic_metadata    Partition: 2    Leader: 1    Replicas: 1    Isr: 1
...
root@worker3:/etc/systemd/system# systemctl start swh-worker@indexer_origin_intrinsic_metadata.service
root@worker3:/etc/systemd/system# puppet agent --enable
root@search0:~/T3391# systemctl start swh-search-journal-client@indexed.service
root@search0:~/T3391# puppet agent --enable
It seems the swh.journal.indexed.origin_intrinsic_metadata topic was automatically created, so the retention policy was not specified (and there is only one partition (!)):
% /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe --topic swh.journal.indexed.origin_intrinsic_metadata
Topic: swh.journal.indexed.origin_intrinsic_metadata    PartitionCount: 1       ReplicationFactor: 1    Configs: max.message.bytes=104857600
        Topic: swh.journal.indexed.origin_intrinsic_metadata    Partition: 0    Leader: 1    Replicas: 1    Isr: 1
Jun 17 2021
- Temporary configurations created:
- objects:
root@search0:~/T3391# diff -U3 /etc/softwareheritage/search/journal_client_objects.yml journal_client_objects.yml
--- /etc/softwareheritage/search/journal_client_objects.yml	2020-12-10 11:04:08.460777825 +0000
+++ journal_client_objects.yml	2021-06-17 16:48:56.006110527 +0000
@@ -4,10 +4,15 @@
   hosts:
   - host: search-esnode0.internal.staging.swh.network
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.9.0
   prefix: swh.journal.objects
   object_types:
   - origin
- indexed:
root@search0:~/T3391# diff -U3 /etc/softwareheritage/search/journal_client_indexed.yml journal_client_indexed.yml
--- /etc/softwareheritage/search/journal_client_indexed.yml	2021-02-09 17:48:44.269681575 +0000
+++ journal_client_indexed.yml	2021-06-17 16:49:57.926120227 +0000
@@ -4,10 +4,15 @@
   hosts:
   - host: search-esnode0.internal.staging.swh.network
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.9.0
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata
- upgraded packages:
root@search0:~/T3391# apt list --upgradable
Listing... Done
libnss-systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libpam-systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libsystemd0/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libudev1/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
python3-swh.core/unknown 0.14.3-1~swh1~bpo10+1 all [upgradable from: 0.13.0-1~swh1~bpo10+1]
python3-swh.indexer.storage/unknown 0.8.0-1~swh1~bpo10+1 all [upgradable from: 0.7.0-1~swh1~bpo10+1]
python3-swh.indexer/unknown 0.8.0-1~swh1~bpo10+1 all [upgradable from: 0.7.0-1~swh1~bpo10+1]
python3-swh.model/unknown 2.6.1-1~swh1~bpo10+1 all [upgradable from: 2.3.0-1~swh1~bpo10+1]
python3-swh.objstorage/unknown 0.2.3-1~swh1~bpo10+1 all [upgradable from: 0.2.2-1~swh1~bpo10+1]
python3-swh.scheduler/unknown 0.15.0-1~swh1~bpo10+1 all [upgradable from: 0.10.0-1~swh1~bpo10+1]
python3-swh.search/unknown 0.9.0-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1]
python3-swh.storage/unknown 0.30.1-1~swh1~bpo10+1 all [upgradable from: 0.27.2-1~swh1~bpo10+1]
systemd-sysv/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
systemd-timesyncd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
udev/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
- Initialized the new index and aliases:
root@search0:~/T3391# swh search --config-file journal_client_objects.yml initialize
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0 [status:200 request:5.373s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_alias/origin-v0.9.0-read [status:200 request:0.052s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_alias/origin-v0.9.0-write [status:200 request:0.038s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_mapping [status:200 request:0.086s]
Done.
- starting the origin* journal client:
root@search0:~/T3391# swh search --config-file journal_client_objects.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:0.661s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:1.671s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:2.047s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:0.179s]
- starting the indexed metadata journal client:
root@search0:~/T3391# swh search --config-file journal_client_indexed.yml journal-client objects
- Index status:
root@search0:~/T3391# curl -s http://search-esnode0:9200/_cat/indices\?v
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                       HthJj42xT5uO7w3Aoxzppw 80  0   1320217    166798       1.7gb      1.7gb
green  open   origin-v0.9.0                o7FiYJWnTkOViKiAdCXCuA 80  0   263556     400          214.8mb    214.8mb
green  close  origin-backup-20210209-1736  P1CKjXW0QiWM5zlzX46-fg 80  0
green  close  origin-v0.5.0                SGplSaqPR_O9cPYU4ZsmdQ 80  0
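At that point origin-v0.9.0 holds 263556 documents versus 1320217 in the old origin index, i.e. roughly 20% of the old index's document count.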
The ingestion of the origin_visits* topics has not started yet, so the new fields are not yet present in the index.
The fix is released in version 'v0.9.0'.
I will deploy it on staging and launch a complete reindexation of the metadata (+ origins, needed by other changes in this release).
Jun 16 2021
Some notes on how to perform common actions with cassandra: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw
Thanks, the diff is updated according to your reviews:
- rename the test to something more generic
- add searches on the boolean field
Jun 15 2021
LGTM
rebase
The environment can be stopped and rebuilt as long as the disks remain reserved on the servers.
Jun 14 2021
nice, thanks!
Jun 11 2021
These are the mappings of the staging and production environments.
As expected, the production mapping has more fields than the staging one.
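For the record, such a mapping dump can be retrieved from each environment through the read alias, e.g. (hosts taken from the nodes mentioned elsewhere in this log):
curl -s http://search-esnode0.internal.staging.swh.network:9200/origin-read/_mapping\?pretty
curl -s http://search-esnode4.internal.softwareheritage.org:9200/origin-read/_mapping\?pretty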
- The old servers are stopped and the VMs removed from proxmox
- nodes removed from puppet inventory
- Inventory site is updated
- Journal client for indexed metadata deployed; the backfill is done (it took a couple of hours): https://grafana.softwareheritage.org/goto/ndjfw66Gz
- The metadata search via ES is activated on https://webapp1.internal.softwareheritage.org/
Jun 10 2021
Some status on the automation:
- Cassandra nodes are OK (OS installation, ZFS configuration according to the defined environment (except for a problem during the first initialization with new disks), startup, cluster configuration)
- swh-storage node is OK (OS installation, gunicorn/swh-storage installation and startup)
- cassandra database initialization:
root@parasilo-3:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.97.3  78.85 KiB   256     31.6%             49d46dd8-4640-45eb-9d4c-b6b16fc954ab  rack1
UN  172.16.97.5  105.45 KiB  256     26.0%             47e99bb4-4846-4e03-a06c-53ea2862172d  rack1
UN  172.16.97.4  98.35 KiB   256     18.1%             e2aeff29-c89a-4c7a-9352-77aaf78e91b3  rack1
UN  172.16.97.2  78.85 KiB   256     24.3%             edd1b72b-4c35-44bd-b7e5-316f41a156c4  rack1
root@parasilo-3:~# cqlsh 172.16.97.3
Connected to swh-storage at 172.16.97.3:9042
[cqlsh 6.0.0 | Cassandra 4.0 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> desc KEYSPACES
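For reference, creating a keyspace by hand from the same host would look roughly like this (keyspace name and replication factor are assumptions; the actual keyspace/schema creation may be handled by the swh-storage tooling):
root@parasilo-3:~# cqlsh 172.16.97.3 -e "CREATE KEYSPACE IF NOT EXISTS swh WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"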
LGTM
LGTM, still not a big fan of the usage of random in the tests ;), but otherwise it matches what you explained to me this morning.