If not defined, this variable is set by the Elasticsearch launch script: https://github.com/elastic/elasticsearch/pull/80699/files#diff-ddfc3a6ea1404997e56f2e771adede06b173f0fea37b4779d827c85d6cc52897R35
I guess that, because the fixture does not start Elasticsearch[1] through the startup script, the variable is not defined.
Dec 13 2021
Dec 9 2021
Dec 6 2021
Nov 24 2021
The production nodes are upgraded:
- stop the journal clients:
root@search1:~# systemctl stop swh-search-journal-client@indexed
root@search1:~# systemctl stop swh-search-journal-client@objects
- flush the index to speed up the recovery
curl -XPOST http://search-esnode4:9200/_flush
For each node:
- disable shard allocation:
cat > /tmp/shard_allocation.json <<EOF
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
EOF
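A settings body like the one above is normally sent to the Elasticsearch cluster settings endpoint. The following is a minimal sketch, not the commands actually run here: the `ES_SERVER` default reuses the host from the flush command, while the function names and the `DRY_RUN` switch are assumptions made for illustration. The `all` value used to re-enable allocation once a node has rejoined comes from the Elasticsearch documentation, not from this log.

```shell
#!/bin/bash
# Sketch only: ES_SERVER, DRY_RUN and the function names are illustrative assumptions.
ES_SERVER="${ES_SERVER:-http://search-esnode4:9200}"

# Print the cluster settings body for a given allocation mode
# ("primaries" before restarting a node, "all" once it has rejoined).
allocation_payload() {
    printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Send the body to the cluster settings API.
# With DRY_RUN=1 the request is only printed, nothing is sent.
set_allocation() {
    body="$(allocation_payload "$1")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "PUT ${ES_SERVER}/_cluster/settings ${body}"
    else
        curl -s -XPUT -H 'Content-Type: application/json' \
            "${ES_SERVER}/_cluster/settings" -d "${body}"
    fi
}
```

For example, `DRY_RUN=1 set_allocation primaries` prints the request that would be sent before a node restart, and `set_allocation all` would restore normal allocation afterwards.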
The staging elasticsearch is migrated to 7.15.2, everything looks good.
Nov 23 2021
Nov 19 2021
Nov 5 2021
Oct 19 2021
Oct 1 2021
Sep 29 2021
Sep 23 2021
Sep 20 2021
Sep 13 2021
On the other hand, journal clients are sort of a resolution to T2063.
In T2073#70230, @vlorentz wrote: I'm tempted to postpone this issue until we resolve T2063...
Sep 10 2021
I'm tempted to postpone this issue until we resolve T2063...
This has been implemented and is now used by swh-web in production, closing this.
Sep 8 2021
Metadata searches are now done in Elasticsearch since the deployment of T3433.
Everything is deployed and looks functional.
Sep 7 2021
Sep 6 2021
Sep 3 2021
production deployment:
- disable puppet
- stop and disable the journal clients and the search backend
- update the swh-search configuration to change the index name to origin-v0.11
root@search1:/etc/softwareheritage/search# diff -U3 /tmp/server.yml server.yml
--- /tmp/server.yml 2021-09-03 14:06:07.896137122 +0000
+++ server.yml 2021-09-03 14:05:47.072081879 +0000
@@ -10,7 +10,7 @@
     port: 9200
 indexes:
   origin:
-    index: origin-production
+    index: origin-v0.11
     read_alias: origin-read
     write_alias: origin-write
- update the journal-clients to use a group id swh.search.journal_client.[indexed|object]-v0.11
root@search1:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_objects.yml journal_client_objects.yml
--- /tmp/journal_client_objects.yml 2021-09-03 14:06:52.660255797 +0000
+++ journal_client_objects.yml 2021-09-03 14:07:10.684303568 +0000
@@ -8,7 +8,7 @@
     - kafka2.internal.softwareheritage.org
     - kafka3.internal.softwareheritage.org
     - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.11
   prefix: swh.journal.objects
   object_types:
     - origin
root@search1:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_indexed.yml journal_client_indexed.yml
--- /tmp/journal_client_indexed.yml 2021-09-03 14:06:52.660255797 +0000
+++ journal_client_indexed.yml 2021-09-03 14:07:25.760343512 +0000
@@ -8,7 +8,7 @@
     - kafka2.internal.softwareheritage.org
     - kafka3.internal.softwareheritage.org
     - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.11
   prefix: swh.journal.indexed
   object_types:
     - origin_intrinsic_metadata
- perform a system upgrade
root@search1:/etc/softwareheritage/search# apt dist-upgrade -V
...
The following NEW packages will be installed:
   python3-tree-sitter (0.19.0-1+swh1~bpo10+1)
The following packages will be upgraded:
   libnss-systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libpam-systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libsystemd0 (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   libudev1 (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   python3-swh.core (0.14.3-1~swh1~bpo10+1 => 0.14.5-1~swh1~bpo10+1)
   python3-swh.model (2.6.1-1~swh1~bpo10+1 => 2.8.0-1~swh1~bpo10+1)
   python3-swh.scheduler (0.15.0-1~swh1~bpo10+1 => 0.18.0-1~swh1~bpo10+1)
   python3-swh.search (0.9.0-1~swh1~bpo10+1 => 0.11.4-2~swh1~bpo10+1)
   python3-swh.storage (0.30.1-1~swh1~bpo10+1 => 0.36.0-1~swh1~bpo10+1)
   systemd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   systemd-sysv (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   systemd-timesyncd (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
   udev (247.3-3~bpo10+1 => 247.3-6~bpo10+1)
13 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
...
There is no need to reboot.
- enable and restart the swh-search backend
- check the new index creation
root@search1:/etc/softwareheritage/search# curl ${ES_SERVER}/_cat/indices\?v
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-v0.11      XOUR_jKcTtWKjlPk_8EAlA  90   1          0            0     34.3kb         18.2kb
green  open   origin-v0.9.0     TH9xlECuS4CcJTDw0Fqieg  90   1  175001478     36494554      293gb        146.9gb
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  176722078     56232582      311gb        155.1gb
- update the write index alias
root@search1:~/T3433# ./update-write-alias.sh
{"acknowledged":true}{"acknowledged":true}root@search1:~/T3433#
root@search1:~/T3433# curl ${ES_SERVER}/_cat/aliases\?v
alias               index             filter routing.index routing.search is_write_index
origin-write        origin-v0.11      -      -             -              -
origin-read-v0.9.0  origin-v0.9.0     -      -             -              -
origin-v0.9.0-read  origin-v0.9.0     -      -             -              -
origin-v0.9.0-write origin-v0.9.0     -      -             -              -
origin-write-v0.9.0 origin-v0.9.0     -      -             -              -
origin-read         origin-production -      -             -              -
All the v0.9.0 aliases and indexes will be cleaned up once the migration to v0.11 is done.
- restart the journal clients
root@search1:~# systemctl enable swh-search-journal-client@objects
Created symlink /etc/systemd/system/multi-user.target.wants/swh-search-journal-client@objects.service → /etc/systemd/system/swh-search-journal-client@.service.
root@search1:~# systemctl enable swh-search-journal-client@indexed
Created symlink /etc/systemd/system/multi-user.target.wants/swh-search-journal-client@indexed.service → /etc/systemd/system/swh-search-journal-client@.service.
root@search1:~# systemctl start swh-search-journal-client@objects
root@search1:~# systemctl start swh-search-journal-client@indexed
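The contents of `update-write-alias.sh` are not shown in this log. Purely as an illustration, an alias can be moved from one index to another in a single atomic request to the Elasticsearch `_aliases` endpoint. In the sketch below, the function names, the `DRY_RUN` switch and the default `ES_SERVER` value are assumptions, not the script that was run.

```shell
#!/bin/bash
# Sketch only: this is NOT the update-write-alias.sh script used above.
ES_SERVER="${ES_SERVER:-http://localhost:9200}"

# Build an _aliases body that removes ALIAS from OLD_INDEX and adds it to NEW_INDEX.
# Arguments: ALIAS OLD_INDEX NEW_INDEX
alias_swap_payload() {
    printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
        "$2" "$1" "$3" "$1"
}

# Apply the swap; with DRY_RUN=1 the request is only printed.
swap_alias() {
    body="$(alias_swap_payload "$1" "$2" "$3")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "POST ${ES_SERVER}/_aliases ${body}"
    else
        curl -s -XPOST -H 'Content-Type: application/json' \
            "${ES_SERVER}/_aliases" -d "${body}"
    fi
}
```

With the names from the listing above, `DRY_RUN=1 swap_alias origin-read origin-production origin-v0.11` would print the request that moves the read alias to the new index. Both actions sit in one request so that readers never see the alias missing.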
- puppet configuration deployed in staging
- read index updated with this script:
#!/bin/bash
The lag recovered in about 12 hours.
The content of the index looks good (spot-checked on a couple of origins).
Sep 1 2021
- package python3-swh.search upgraded to version 0.11.4-2, the problem is fixed
- the new index is correctly created:
root@search0:/# curl -s http://search-esnode0:9200/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-v0.11                HljzsdD9SmKI7-8ekB_q3Q  80   0          0            0      4.2kb          4.2kb
green  close  origin                      HthJj42xT5uO7w3Aoxzppw  80   0
green  close  origin-v0.9.0               o7FiYJWnTkOViKiAdCXCuA  80   0
green  open   origin-v0.10.0              -fvf4hK9QDeN8qYTJBBlxQ  80   0    1981623       559384      2.3gb          2.3gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  close  origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0
- journal clients enabled and restarted
- the journal client lag should recover in less than 12h
- waiting some time to estimate the duration with only one journal client per type
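Checking that the new index exists and watching its document count grow can be scripted against the `_cat/indices` output shown above. This is a minimal sketch, assuming the column layout printed with `?v` (index name in column 3, `docs.count` in column 7); the function name is hypothetical.

```shell
#!/bin/bash
# Print the docs.count column for one index, reading `_cat/indices?v` output on stdin.
# Closed indices have no docs.count column, so nothing is printed for them.
index_doc_count() {
    awk -v idx="$1" '$3 == idx && NF >= 7 { print $7 }'
}
```

It could be used as `curl -s http://search-esnode0:9200/_cat/indices\?v | index_doc_count origin-v0.11`, repeated at intervals to follow the reindexing progress.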
The problem was fixed by rDSEA68347a5604c74150197f691593cbb05bdd34396f
thanks @olasd
Deployment of version v0.11.4 in staging:
On search0:
- puppet stopped
- stop and disable the journal clients and search backend
- update the swh-search configuration to use an origin-v0.11 index
root@search0:/etc/softwareheritage/search# diff -U2 /tmp/server.yml server.yml
--- /tmp/server.yml 2021-09-01 13:42:29.347951302 +0000
+++ server.yml 2021-09-01 13:42:35.739953523 +0000
@@ -7,5 +7,5 @@
 indexes:
   origin:
-    index: origin-v0.10.0
+    index: origin-v0.11
     read_alias: origin-read
     write_alias: origin-write
- update the journal-clients to use a group id swh.search.journal_client.[indexed|object]-v0.11
root@search0:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_objects.yml journal_client_objects.yml
--- /tmp/journal_client_objects.yml 2021-09-01 13:44:49.843999978 +0000
+++ journal_client_objects.yml 2021-09-01 13:45:03.972004852 +0000
@@ -5,7 +5,7 @@
 journal:
   brokers:
     - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client-v0.10.0
+  group_id: swh.search.journal_client-v0.11
   prefix: swh.journal.objects
   object_types:
     - origin
root@search0:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_indexed.yml journal_client_indexed.yml
--- /tmp/journal_client_indexed.yml 2021-09-01 13:44:44.847998252 +0000
+++ journal_client_indexed.yml 2021-09-01 13:44:57.020002454 +0000
@@ -5,7 +5,7 @@
 journal:
   brokers:
     - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client.indexed-v0.10.0
+  group_id: swh.search.journal_client.indexed-v0.11
   prefix: swh.journal.indexed
   object_types:
     - origin_intrinsic_metadata
- perform a system upgrade, a reboot was not required
- enable and start swh-search backend
- an error occurs after the restart:
Sep 01 14:19:12 search0 python3[4066688]: 2021-09-01 14:19:12 [4066688] root:ERROR command 'cc' failed with exit status 1
Traceback (most recent call last):
  File "/usr/lib/python3.7/distutils/unixccompiler.py", line 118, in _compile
    extra_postargs)
  File "/usr/lib/python3.7/distutils/ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1
The build is now fixed and version v0.11.4 is ready to be deployed on the environments.
Aug 31 2021
Aug 30 2021
Aug 25 2021
Aug 17 2021
One very important thing to get right is the Build-Depends line in the source package stanza. setuptools/distribute-based packages have the nasty habit of downloading dependencies from PyPI if they are needed at python setup.py build time. If the package is available from the system (as would be the case when Build-Depends is up-to-date), then distribute will not try to download the package, otherwise it will try to download it. This is a huge no-no, and pybuild internally sets the http_proxy and https_proxy environment variables (to 127.0.0.1:9) to prevent this from happening.
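The same guard can be reproduced outside of pybuild to check locally whether a build tries to reach PyPI. This is a minimal sketch: the function name is hypothetical, and the exact URL form of the proxy values is an assumption based on the 127.0.0.1:9 address quoted above.

```shell
#!/bin/bash
# Run a command with both proxies pointing at the local discard port (9),
# so that any download attempt fails instead of silently fetching from PyPI.
# The exact URL form is an assumption; only "127.0.0.1:9" comes from the text above.
without_network() {
    env http_proxy="http://127.0.0.1:9/" https_proxy="https://127.0.0.1:9/" "$@"
}
```

Running `without_network python3 setup.py build` in a source tree should then fail with a proxy error whenever a build-time dependency is missing from the system.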
The PyPI build is still working well with the last 2 diffs.
Now there is a new error during the Debian build:
dh: warning: Compatibility levels before 10 are deprecated (level 9 in use)
   dh_auto_clean -O--buildsystem=pybuild
dh_auto_clean: warning: Compatibility levels before 10 are deprecated (level 9 in use)
I: pybuild base:232: python3.9 setup.py clean
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fef2101bcd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))': /simple/tree-sitter/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fef2101beb0>: Failed to establish a new connection: [Errno -2] Name or service not known'))': /simple/tree-sitter/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fef2101b850>: Failed to establish a new connection: [Errno -2] Name or service not known'))': /simple/tree-sitter/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fef2101b730>: Failed to establish a new connection: [Errno -2] Name or service not known'))': /simple/tree-sitter/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fef2101b610>: Failed to establish a new connection: [Errno -2] Name or service not known'))': /simple/tree-sitter/
ERROR: Could not find a version that satisfies the requirement tree-sitter==0.19.0
ERROR: No matching distribution found for tree-sitter==0.19.0
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/setuptools/installer.py", line 75, in fetch_build_egg
    subprocess.check_call(cmd)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3.9', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpdrbws3hq', '--quiet', 'tree-sitter==0.19.0']' returned non-zero exit status 1.
Aug 16 2021
Aug 13 2021
You also need to add yarn as a build-dep for the Debian packages.
D6088 should help
So that unstuck the PyPI part ;)
Now on with unsticking the Debian build [1]
There are no more errors. The fix will be deployed in production with the deployment of swh-search:v0.11.0 (T3433).
Aug 11 2021
A new swh.search v0.11 got tagged (it includes the deactivation of the current
blocking point). That's a workaround though. I've opened a task so that the conclusion
of the discussion that was started is not forgotten.
Jul 29 2021
Jul 28 2021
Jul 27 2021
By the way, a small list of caveats encountered when deploying search (not to fix
immediately, just to mention them).
Jul 26 2021
Another idea: move this fetching to a new indexer, and make it write to a new topic, which the swh-search journal client can read from.
Jul 22 2021
Jul 21 2021
We've done the following: