The issue was not visible on an Azure VM, but it was on a Proxmox one.
Jul 27 2018
Content of $facts['os'] on the problematic host:
os => {"name"=>"Debian", "family"=>"Debian", "release"=>{"major"=>"9", "minor"=>"5", "full"=>"9.5"}, "lsb"=>{"distcodename"=>"stretch", "distid"=>"Debian", "distdescription"=>"Debian GNU/Linux 9.5 (stretch)", "distrelease"=>"9.5", "majdistrelease"=>"9", "minordistrelease"=>"5"}}
It is caused by this piece of code in site-modules/profile/manifests/prometheus/apt_config.pp:
if $facts['os']['distro']['codename'] == 'stretch' {
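On this host the os fact exposes an lsb sub-hash rather than a distro one (older Facter naming), so the lookup above fails. A more defensive lookup is sketched below; this is not the fix actually deployed, and it assumes Puppet >= 4.5 for dig() and puppetlabs-stdlib for pick_default:
# dig() returns undef instead of failing when the 'distro' sub-hash is
# missing; the older 'lsb' naming is tried as a fallback.
$codename = pick_default(
  $facts.dig('os', 'distro', 'codename'),
  $facts.dig('os', 'lsb', 'distcodename')
)
if $codename == 'stretch' {
  # stretch-specific apt configuration
}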
Jul 26 2018
Now, this configuration works perfectly on my workstation running logstash-6.3.2-1, but for some reason the existing logstash instance running on banco.internal.softwareheritage.org is unable to filter Apache logs.
Logstash filter configuration
Jul 25 2018
Logstash output.conf configuration
Filebeat configuration on moma:/etc/filebeat/filebeat.yml
Journalbeat installed on moma in order to ship some Apache logs to the logstash instance on banco.
Logstash configuration changed to put raw Apache logs in apache_logs-$date indexes.
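A minimal sketch of such an output stanza (not the actual output.conf referenced above; the Elasticsearch host is an assumption):
output {
  elasticsearch {
    hosts => ["esnode2.internal.softwareheritage.org:9200"]
    index => "apache_logs-%{+YYYY.MM.dd}"
  }
}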
Jul 23 2018
Jul 19 2018
Jul 4 2018
Some logstash indexes cannot be reindexed.
Part of the Elasticsearch error message:
"mapper_parsing_exception","reason":"failed to parse [swh_logging_args_return_value]","caused_by":{"type":"illegal_argument_exception","reason":"Failed to parse value [None] as only [true] or [false] are allowed."}}
The D11 and DS11 VM sizes are not very interesting: they offer less RAM than the E2 models for a higher price.
Among the rest, the existing VM can only be resized to an e2s_v3 model.
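For reference, such a resize can be done with the Azure CLI; the resource group is a placeholder, dbreplica1 is assumed to be the VM in question, and the operation may require deallocating the VM first:
az vm resize --resource-group <resource-group> --name dbreplica1 --size Standard_E2s_v3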
All remaining non-swh-worker logs deleted from legacy logstash-* indexes.
Public IP address deassociated from dbreplica1 and then deleted via the Azure portal interface today.
Jul 3 2018
Possible Azure VM types with 14-16GB RAM available in the euwest region:
- a2mv2 16GB RAM $90.52/month
- E2v3 16GB RAM $116.80/month
- E2sv3 16GB RAM $116.80/month
- D11v2 14GB RAM $138.70/month
- DS11v2 14GB RAM $138.70/month
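The sizes offered in a region can be listed with the Azure CLI (prices are not part of this output); westeurope is assumed to be the region behind euwest:
az vm list-sizes --location westeurope --output table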
dbreplica1.euwest.azure.internal.softwareheritage.org runs a slave instance of the softwareheritage-indexer database, with its master running on somerset.internal.softwareheritage.org.
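A quick way to confirm the replica is streaming is to query pg_stat_replication on the master; the column names below assume PostgreSQL 9.6 (they become *_lsn on 10 and later):
psql -h somerset.internal.softwareheritage.org -c \
  "SELECT client_addr, state, sent_location, replay_location, sync_state FROM pg_stat_replication;"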
Jul 2 2018
Jun 28 2018
Jun 26 2018
https://webapp0.softwareheritage.org/browse/search/ does not respond quickly when a new keyword is entered for the first time.
Tried "danana" => got answers after more than one minute.
Also tried "zarg" => the spinner stopped after one minute with no explanation; a text message saying the pattern was not found would have been helpful.
Jun 25 2018
Jun 22 2018
Jun 21 2018
It seems like deleting old documents takes a heavy toll on the cluster.
So far, for every month of old logstash indexes cleaned, at least one cluster node started to misbehave and had to be restarted after excessive timeouts and/or other issues, including constant garbage collection and disk thrashing.
Even though all delete requests had previously completed successfully, non-swh-worker data remains in the legacy logstash-* indexes.
This is not entirely unexpected: resource limitations may have prevented the old Banco node from processing all deletion requests in a bounded time frame.
Deletion queries will be rerun index by index in this way:
curl -i -H'Content-Type: application/json' \
  -XPOST "http://esnode2.internal.softwareheritage.org:9200/logstash-2018.02.31/_delete_by_query?pretty=true" \
  -d '{
    "query": {
      "bool": {
        "must_not": [{ "match": { "systemd_unit": "swh-worker@" } }]
      }
    }
  }'
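One possible way to iterate over all the legacy indexes, using the _cat API to list them one per line (a sketch, not the exact commands run):
for idx in $(curl -s "http://esnode2.internal.softwareheritage.org:9200/_cat/indices/logstash-*?h=index"); do
  curl -i -H'Content-Type: application/json' \
    -XPOST "http://esnode2.internal.softwareheritage.org:9200/${idx}/_delete_by_query?pretty=true" \
    -d '{ "query": { "bool": { "must_not": [{ "match": { "systemd_unit": "swh-worker@" } }] } } }'
done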
Jun 20 2018
The swh_workers-2018.03.07 index contained non-swh-worker documents and was cleaned this way:
curl -i -H'Content-Type: application/json' \
  -XPOST "http://esnode3.internal.softwareheritage.org:9200/swh_workers-2018.03.07/_delete_by_query?pretty=true" \
  -d '{
    "query": {
      "bool": {
        "must_not": [{ "match": { "systemd_unit": "swh-worker@" } }]
      }
    }
  }'
Jun 18 2018
Jun 15 2018
Jun 14 2018
Jun 13 2018
Jun 12 2018
Jun 8 2018
Possible solutions may include:
- Install a dummy nginx configuration file first
- Change Debian's service startup policy with `/usr/sbin/policy-rc.d` (see the sketch after this list)
- Stop using Varnish locally and replace it with Azure caching services
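A minimal sketch of the policy-rc.d approach: as long as this file exists and exits 101, invoke-rc.d (used by package maintainer scripts) refuses to start services, so nginx can be installed and configured before it is ever started:
cat > /usr/sbin/policy-rc.d <<'EOF'
#!/bin/sh
# Deny all service starts triggered by package maintainer scripts.
exit 101
EOF
chmod +x /usr/sbin/policy-rc.d
# ... install and configure nginx, then remove the policy file:
rm /usr/sbin/policy-rc.d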
Jun 6 2018
Jun 5 2018
May 28 2018
Some logstash-xxx indexes appear to still contain non-swh_workers related information.
Blindly reindexing them according to this issue description will not be enough.
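A quick way to check whether a given index still holds such documents is a _count request with the same filter used for the deletions (the index name below is a placeholder):
curl -s -H'Content-Type: application/json' \
  -XGET "http://esnode2.internal.softwareheritage.org:9200/logstash-2018.02.01/_count?pretty=true" \
  -d '{ "query": { "bool": { "must_not": [{ "match": { "systemd_unit": "swh-worker@" } }] } } }'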
The cluster now has three brand new nodes; all existing data has been copied and the original Banco node removed.
Issue technically fixed but left open in order to track related Elasticsearch requests.