Nov 18 2021
Nov 17 2021
rebase
- The 3 esnodes are updated to version 7.15.2:
  for each node:
    puppet agent --disable
  for each node:
    apt update
    apt dist-upgrade
- update the log formatting in the python script
- fix a couple of typos
Nov 16 2021
diff update:
- upgrade to 7.15.2
- automatically manage the journalbeat index template from the logstash server
- upgrade to 7.15.2
- fix elasticsearch plugin upgrade
diff updates:
- remove the journalbeat user
- cleanup the cursor_state file
Improve the check_journal script to check the new registry file
and fall back to the old cursor_state file if it is not found
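A minimal sketch of the "new registry first, legacy cursor_state as fallback" lookup, in Python like the other swh check scripts. It is not the actual check_journal code: it only checks that the position file exists and was updated recently, and both paths are assumptions (the registry path assumes the default journalbeat 7.x data directory, the cursor_state path is hypothetical).

```python
import os
import time
from typing import Optional, Sequence, Tuple

# Assumed paths: default journalbeat 7.x data dir for the registry,
# hypothetical location for the legacy cursor_state file.
REGISTRY_FILE = "/var/lib/journalbeat/registry"
LEGACY_CURSOR_FILE = "/var/lib/journalbeat/cursor_state"

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def find_state_file(candidates: Sequence[str]) -> Optional[str]:
    """Return the first existing file, trying the new registry first."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def check_state_file(
    candidates: Sequence[str] = (REGISTRY_FILE, LEGACY_CURSOR_FILE),
    max_age: float = 3600.0,
    now: Optional[float] = None,
) -> Tuple[int, str]:
    """Icinga-style check: the position file must exist and be recent."""
    path = find_state_file(candidates)
    if path is None:
        return UNKNOWN, "UNKNOWN - no journalbeat state file found"
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    if age > max_age:
        return CRITICAL, f"CRITICAL - {path} not updated for {int(age)}s"
    return OK, f"OK - {path} updated {int(age)}s ago"
```

Used as a plugin, the caller would print the message and exit with the returned code. Passing the candidate list explicitly keeps the fallback order visible and makes the check easy to exercise against temporary files.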
- Does the icinga check for the journalbeat position still work? I assume so, as the config for it is still there.
I think so too but I will double check to be sure
good point, it seems it's not working anymore, the new version of logstash is silently ignoring the property
Nov 15 2021
- Do we really care about running as a separate user? Maybe avoiding a gratuitous divergence from upstream would be worth it.
Nope, I kept it to stay as close as possible to what we have today. I can leave the default user instead and add the cleanup of the previously created user
- Could you fully update the config template to the new default config?
Yep, I will try to automate that
- Does the icinga check for the journalbeat position still work? I assume so, as the config for it is still there.
I think so too but I will double check to be sure
- Should we consider moving to cursor_seek_fallback: tail to work around the issues with the old indexes needing to be reopened, when rebooting a machine? This should not apply as the cursor state should be saved on service shutdown, but maybe it isn't being read back properly...
I will try this too, because the upgrade changed nothing: the current behavior is reproduced locally with the new version when I restart the VMs
Nov 10 2021
For the record, the upgrade of esnode[1-3] to bullseye is ok (in vagrant).
The upgrade is done without errors, puppet is green. A reinstall from scratch is also working well without warning.
The diffs preparing the migration of filebeat and journalbeat are ready. If everything is good after the review, the upgrade will be performed at the beginning of W46.
Allow overriding the beat version
Keeping this diff in the "changes planned" state, as there seems to be a conflict with the elasticsearch version for the staging and swh-search elastic nodes (redeclare the elastic::elk_version property for swh-search)
*** Running octocatalog-diff on host search-esnode4.internal.softwareheritage.org
I, [2021-11-10T12:54:23.710611 #16530] INFO -- : Catalogs compiled for search-esnode4.internal.softwareheritage.org
I, [2021-11-10T12:54:23.998043 #16530] INFO -- : Diffs computed for search-esnode4.internal.softwareheritage.org
diff origin/production/search-esnode4.internal.softwareheritage.org current/search-esnode4.internal.softwareheritage.org
*******************************************
+ Apt::Pin[journalbeat] =>
   parameters =>
     "codename": "",
     "component": "",
     "ensure": "present",
     "explanation": "Use the elk stack version",
     "label": "",
     "order": 50,
     "origin": "",
     "originator": "",
     "packages": [
       "journalbeat"
     ],
     "priority": 1001,
     "release": "",
     "release_version": "",
     "version": "7.9.3" <--------------- Not good
*******************************************
  Apt::Pin[swh-journalbeat] =>
   parameters =>
     ensure =>
      - present
      + absent
     explanation =>
      - Use journalbeat packages from Software Heritage
     originator =>
      - softwareheritage
      +_
     packages =>
      - ["journalbeat"]
      + *
     priority =>
      - 990
      + 0
*******************************************
+ Apt::Setting[pref-journalbeat] =>
   parameters =>
     "content": "# This file is managed by Puppet. DO NOT EDIT.\nExplanation: Use...
     "ensure": "present",
     "notify_update": false,
     "priority": 50
*******************************************
  Apt::Setting[pref-swh-journalbeat] =>
   parameters =>
     content =>
      @@ -1,5 +1,5 @@
       # This file is managed by Puppet. DO NOT EDIT.
      -Explanation: Use journalbeat packages from Software Heritage
      -Package: journalbeat
      -Pin: release o=softwareheritage
      -Pin-Priority: 990
      +Explanation: profile: swh-journalbeat
      +Package: *
      +Pin: release a=swh-journalbeat
      +Pin-Priority: 0
     ensure =>
      - present
      + absent
*******************************************
+ File[/etc/apt/preferences.d/journalbeat.pref] =>
   parameters =>
     "content": "# This file is managed by Puppet. DO NOT EDIT.\nExplanation: Use...
     "ensure": "present",
     "group": "root",
     "mode": "0644",
     "owner": "root"
*******************************************
  File[/etc/apt/preferences.d/swh-journalbeat.pref] =>
   parameters =>
     ensure =>
      - present
      + absent
*******************************************
  File[/etc/journalbeat/journalbeat.yml] =>
   parameters =>
     content =>
      @@ -2,4 +2,10 @@
      _ journalbeat:
      + inputs:
      + # Paths that should be crawled and fetched. Possible values files and directories.
      + # When setting a directory, all journals under it are merged.
      + # When empty starts to read from local journal.
      + - paths: []
      +
        # What position in journald to seek to at start up
        # options: cursor, tail, head (defaults to tail)
*******************************************
- File[/etc/journalbeat]
*******************************************
+ File[/etc/systemd/system/journalbeat.service.d/journalbeat.conf] =>
   parameters =>
     "content": "# Managed by puppet (class profile::systemd_journal::journalbeat...
     "ensure": "file",
     "group": "root",
     "mode": "0444",
     "notify": [
       "Class[Systemd::Systemctl::Daemon_reload]"
     ],
     "owner": "root",
     "selinux_ignore_defaults": false,
     "show_diff": true
*******************************************
+ File[/etc/systemd/system/journalbeat.service.d] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "owner": "root",
     "purge": true,
     "recurse": true,
     "selinux_ignore_defaults": false
*******************************************
  File[/etc/systemd/system/journalbeat.service] =>
   parameters =>
     ensure =>
      - file
      + absent
*******************************************
  Package[journalbeat] =>
   parameters =>
     ensure =>
      - present
      + 7.9.3
*******************************************
  Service[journalbeat] =>
   parameters =>
     subscribe =>
      + ["File[/etc/journalbeat/journalbeat.yml]", "Package[journalbeat]", "Systemd::Dropin_file[journalbeat.conf]"]
*******************************************
+ Systemd::Dropin_file[journalbeat.conf] =>
   parameters =>
     "content": "# Managed by puppet (class profile::systemd_journal::journalbeat...
     "daemon_reload": "lazy",
     "ensure": "present",
     "filename": "journalbeat.conf",
     "group": "root",
     "mode": "0444",
     "notify": [
       "Service[journalbeat]"
     ],
     "owner": "root",
     "path": "/etc/systemd/system",
     "selinux_ignore_defaults": false,
     "show_diff": true,
     "unit": "journalbeat.service"
*******************************************
- Systemd::Unit_file[journalbeat.service]
*******************************************
*** End octocatalog-diff on search-esnode4.internal.softwareheritage.org
Nov 9 2021
Everything looks good with logstash 1:7.15.1
The monitoring of the logstash errors is still working as before:
root@logstash0:/usr/lib/nagios/plugins/swh# ./check_logstash_errors.sh
OK - No errors detected
After closing the current system index:
root@logstash0:/usr/lib/nagios/plugins/swh# ./check_logstash_errors.sh
CRITICAL - Logstash has detected some errors in outputs errors=9 non_retryable_errors=13
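The two results above follow the usual Icinga plugin convention: a one-line status message plus an exit code. A minimal Python sketch of that status logic, reproducing the messages shown; it assumes the two counters have already been collected (how check_logstash_errors.sh actually reads them from logstash is not shown here, and the function name is hypothetical).

```python
from typing import Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def logstash_errors_status(errors: int, non_retryable_errors: int) -> Tuple[int, str]:
    """Turn the output error counters into an Icinga-style (code, message)."""
    if errors == 0 and non_retryable_errors == 0:
        return OK, "OK - No errors detected"
    return (
        CRITICAL,
        "CRITICAL - Logstash has detected some errors in outputs "
        f"errors={errors} non_retryable_errors={non_retryable_errors}",
    )
```

With the counters observed after closing the index, `logstash_errors_status(9, 13)` yields the CRITICAL line above with exit code 2.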
To upgrade kibana, upgrading the version seems to be enough. The migration is done automatically and all the configured elements are still available:
root@esnode1:~# curl -s http://10.168.100.61:9200/_cat/indices\?v=true\&s=index | grep kibana
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana-event-log-7.15.1-000001 24Wb0rfUQuqab3Iody3Hrg 1 1 1 0 12.1kb 6kb <-------- new index
green open .kibana-event-log-7.8.0-000001 6IjHICQVS2uX8qBekJLWsw 1 1 2 0 21.4kb 10.7kb
green open .kibana_2 Oh9O6uB1R0-oNPbnhTM8kw 1 1 1928 3 1.5mb 788.4kb
green open .kibana_7.15.1_001 5fyk6NMUSE-3P6uhx-HSeg 1 1 1110 35 5.3mb 2.6mb <-------- new index (automatically migrated from kibana_2)
green open .kibana_task_manager_1 vINZFVqCSJiDHHFMdYGwTA 1 1 5 0 32kb 16kb
green open .kibana_task_manager_7.15.1_001 pYeR_zFdTZO_jqxYS1DB9g 1 1 16 369 527kb 277.5kb <-------- new index

root@esnode1:~# curl -s http://10.168.100.61:9200/_cat/aliases\?v=true\&s=index | grep kibana
alias index filter routing.index routing.search is_write_index
.kibana-event-log-7.15.1 .kibana-event-log-7.15.1-000001 - - - true
.kibana-event-log-7.8.0 .kibana-event-log-7.8.0-000001 - - - true
.kibana .kibana_7.15.1_001 - - - -
.kibana_7.15.1 .kibana_7.15.1_001 - - - -
.kibana_task_manager .kibana_task_manager_7.15.1_001 - - - -
.kibana_task_manager_7.15.1 .kibana_task_manager_7.15.1_001 - - - -
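To check the migration result in a script rather than by eye, the `_cat` text output (with `v=true`) can be parsed using its header row as keys. A minimal Python sketch under that assumption; the function names are hypothetical, and splitting on whitespace only works because these `_cat` values contain no spaces.

```python
from typing import Dict, List


def parse_cat_output(text: str) -> List[Dict[str, str]]:
    """Parse `_cat/...?v=true` text output: first line is the header."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # zip() stops at the header length, so trailing extra tokens are ignored.
    return [dict(zip(header, row)) for row in rows]


def migrated_kibana_indices(indices_text: str, version: str) -> List[str]:
    """Return the kibana indices whose name contains the given version."""
    return sorted(
        row["index"]
        for row in parse_cat_output(indices_text)
        if row["index"].startswith(".kibana") and version in row["index"]
    )
```

The `_cat` APIs also accept `format=json`, which avoids text parsing entirely when the output is consumed by a script.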
Nov 8 2021
The migration of ES can be performed with:
- elasticsearch migration
In order to validate the kibana upgrade, the kibana configuration can be copied locally with these commands:
The preparation of the migration through the vagrant environment is in progress.
Thanks for the info.
For the record, the entry point of the upgrade process: https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elastic-stack.html
The basic auth is working with the default remote implementation if the credentials are specified in the url like https://user:password@objstorage_url
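For illustration, here is what a client does with credentials embedded in the URL: it strips them from the URL and sends them as an `Authorization: Basic` header (Python's requests, for instance, extracts them from the URL this way). A minimal standard-library sketch; it is not the objstorage client's code, and the function names are hypothetical.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Strip user:password from a URL; return (clean_url, credentials)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Rebuild the netloc without the userinfo (IPv6 literals not handled).
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # urlsplit does not percent-decode the userinfo, so decode it here.
    return clean, (unquote(parts.username), unquote(parts.password or ""))


def basic_auth_header(user: str, password: str) -> str:
    """Value of the HTTP Authorization header for basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"
```

Special characters in the password (`@`, `:`, `/`) must be percent-encoded in the URL, which is why the sketch decodes them.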
not on the cassandra side
Nov 5 2021
Nov 4 2021
Current status:
Following the last discussions, the track I'm currently trying to implement is a grafana dashboard displaying the current status of the infrastructure.
To do so, some information managed by grafana, like the status of the end-to-end checks, should be displayed.
After some tweaks[1], the git bare cooking is finally working correctly on g5k.
Good point :) I forgot to update the service urls.
Nov 3 2021
remove useless intermediate variable usage
The server was rebooted so the ECC counters were reset and the alert closed.
We will check whether the error occurs again before asking Dell to replace the memory module.
Nov 2 2021
rebase
The corrupted / missing contents are also identified in T75 so these errors can be ignored
When I tried to cook some vault bundles in production, several errors were logged.
Some were about missing commits, for example:
missing commit 338fb3eff7e93e121e7fe347b391a100ed0003c5
missing commit 73bc01d26ae7d61a74108b620e32458ef30c75c6
missing commit 78272793666b7354f6be373a73aa62426116d884
missing commit 83f434d5f256886fcfcfc354c3a63b91a71fce63
All these hashes are identified in the work on T75, so they are not blockers
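When cooking more bundles, the same triage can be scripted: extract the sha1 of every "missing commit" line and keep only those not already known from T75. A minimal Python sketch, assuming the log format shown above; the set of known hashes is a hypothetical input that would have to be built from the T75 work.

```python
import re
from typing import Iterable, Set

# Matches "missing commit <40 hex chars>" anywhere in a log line.
MISSING_COMMIT = re.compile(r"missing commit ([0-9a-f]{40})\b")


def missing_commits(log_lines: Iterable[str]) -> Set[str]:
    """Collect the sha1 of every commit reported as missing."""
    found: Set[str] = set()
    for line in log_lines:
        found.update(MISSING_COMMIT.findall(line))
    return found


def unexpected_missing(log_lines: Iterable[str], known: Set[str]) -> Set[str]:
    """Missing commits that are not already listed as known corruptions."""
    return missing_commits(log_lines) - known
```

An empty result from `unexpected_missing` means every error is already covered by T75 and can be ignored.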
Oct 22 2021
The diff (D6448) has been updated to support a basic authentication for the public part. The internal access will remain possible without any authentication.
fix a typo
Add basic authentication support and activate it in staging too
(main description will be updated accordingly)
rebase
As the new kafka has been active for a couple of days, a mirror was restored from it and everything seems to be ok. journal0 (stopped since the migration was done) was removed from proxmox.
Arf, wrong issue, restoring statuses
EDITED: removed, commented on the wrong task
Oct 21 2021
One comment inline, but otherwise it's all good. Thanks
The scripts deploying the vault and the associated components have to be adapted to the grid5000 cluster. Work in progress...