
Jul 27 2018

ftigeot changed the status of T1160: Create a dedicated logstash VM, a subtask of T1126: Move away non-gunicorn services from banco, from Open to Work in Progress.
Jul 27 2018, 4:58 PM · System administration
ftigeot closed T1162: Prometheus manifest breaks puppet on brand new nodes as Resolved.

The issue was not visible on an Azure VM, only on a Proxmox one.

Jul 27 2018, 4:56 PM · System administration
ftigeot committed rSPSITE78742ed60210: manifests/prometheus: Simplify a facter test (authored by ftigeot).
manifests/prometheus: Simplify a facter test
Jul 27 2018, 4:51 PM
ftigeot committed rSPSITEb04f88701f8e: data/defaults.yaml: petitpalais has a new IP address (authored by ftigeot).
data/defaults.yaml: petitpalais has a new IP address
Jul 27 2018, 4:35 PM
ftigeot added a comment to T1162: Prometheus manifest breaks puppet on brand new nodes.

Content of $facts['os'] on the problematic host:

os => {"name"=>"Debian", "family"=>"Debian", "release"=>{"major"=>"9", "minor"=>"5", "full"=>"9.5"}, "lsb"=>{"distcodename"=>"stretch", "distid"=>"Debian", "distdescription"=>"Debian GNU/Linux 9.5 (stretch)", "distrelease"=>"9.5", "majdistrelease"=>"9", "minordistrelease"=>"5"}}
Jul 27 2018, 4:33 PM · System administration
ftigeot added a comment to T1162: Prometheus manifest breaks puppet on brand new nodes.

It is caused by this piece of code in site-modules/profile/manifests/prometheus/apt_config.pp

if $facts['os']['distro']['codename'] == 'stretch' {
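
The facts reported for this host contain `os.lsb.distcodename` but no `os.distro` key, so the nested lookup above cannot be resolved. A minimal Python sketch of the same lookup with a fallback follows; the facts are abridged, and the helper name `dig` and the fallback order are assumptions for illustration, not the fix that was committed:

```python
# Facts as reported by the problematic host (abridged from the facts dump).
facts = {
    "os": {
        "name": "Debian",
        "family": "Debian",
        "release": {"major": "9", "minor": "5", "full": "9.5"},
        "lsb": {"distcodename": "stretch", "distid": "Debian"},
    }
}


def dig(data, *keys):
    """Walk nested dicts; return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Hypothetical fallback order: try the 'distro' fact first, then the 'lsb' one.
codename = (
    dig(facts, "os", "distro", "codename")
    or dig(facts, "os", "lsb", "distcodename")
)
print(codename)
```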
Jul 27 2018, 4:31 PM · System administration
ftigeot created T1162: Prometheus manifest breaks puppet on brand new nodes.
Jul 27 2018, 4:29 PM · System administration
ftigeot committed rSPSITE0f298cc000d5: data/defaults.yaml: Add logstash0.internal.softwareheritage.org (authored by ftigeot).
data/defaults.yaml: Add logstash0.internal.softwareheritage.org
Jul 27 2018, 1:59 PM

Jul 26 2018

ftigeot added a subtask for T791: Ship more logs to logstash/elasticsearch: T1160: Create a dedicated logstash VM.
Jul 26 2018, 3:44 PM · System administration
ftigeot added parent tasks for T1160: Create a dedicated logstash VM: T791: Ship more logs to logstash/elasticsearch, T1126: Move away non-gunicorn services from banco.
Jul 26 2018, 3:44 PM · System administration
ftigeot added a subtask for T1126: Move away non-gunicorn services from banco: T1160: Create a dedicated logstash VM.
Jul 26 2018, 3:44 PM · System administration
ftigeot triaged T1160: Create a dedicated logstash VM as High priority.
Jul 26 2018, 3:43 PM · System administration
ftigeot added a comment to T791: Ship more logs to logstash/elasticsearch.

Now, this configuration works perfectly on my workstation running logstash-6.3.2-1 but for some reason, the existing logstash instance running on banco.internal.softwareheritage.org is unable to filter Apache logs.

Jul 26 2018, 2:41 PM · System administration
ftigeot added a comment to T791: Ship more logs to logstash/elasticsearch.

Logstash filter configuration

Jul 26 2018, 2:37 PM · System administration

Jul 25 2018

ftigeot added a comment to T791: Ship more logs to logstash/elasticsearch.

Logstash output.conf configuration

Jul 25 2018, 4:03 PM · System administration
ftigeot added a comment to T791: Ship more logs to logstash/elasticsearch.

Filebeat configuration on moma:/etc/filebeat/filebeat.yml

Jul 25 2018, 3:49 PM · System administration
ftigeot changed the status of T791: Ship more logs to logstash/elasticsearch from Open to Work in Progress.

Journalbeat installed on moma in order to ship some Apache logs to the logstash instance on banco.
Logstash configuration changed to put raw Apache logs in apache_logs-$date indexes.

Jul 25 2018, 3:48 PM · System administration

Jul 23 2018

ftigeot committed rSPSITE613377fcc1de: Add a lsi adapter role in order to install megacli (authored by ftigeot).
Add a lsi adapter role in order to install megacli
Jul 23 2018, 11:46 AM

Jul 19 2018

ftigeot committed rSPSITE227619336585: webapp: make sure filebeat is present (authored by ftigeot).
webapp: make sure filebeat is present
Jul 19 2018, 5:19 PM

Jul 4 2018

ftigeot added a comment to T1000: Reindex old data on banco to put it into swh_worker indexes.

Some logstash indexes cannot be reindexed.
Part of the Elasticsearch error message:

"mapper_parsing_exception","reason":"failed to parse [swh_logging_args_return_value]","caused_by":{"type":"illegal_argument_exception","reason":"Failed to parse value [None] as only [true] or [false] are allowed."}}
Jul 4 2018, 3:42 PM · System administration
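
The `mapper_parsing_exception` quoted above suggests that the target mapping treats `swh_logging_args_return_value` as a boolean while some source documents carry the string `None`. One possible workaround is to normalize the field before reindexing. The sketch below is an assumption about how that could look, not what was done on the cluster; the function name and the choice to drop unparseable values are hypothetical:

```python
def normalize_return_value(doc, field="swh_logging_args_return_value"):
    """Return a copy of doc whose field is a real boolean, or absent."""
    cleaned = dict(doc)
    value = cleaned.get(field)
    if isinstance(value, bool):
        return cleaned
    text = str(value).strip().lower()
    if text == "true":
        cleaned[field] = True
    elif text == "false":
        cleaned[field] = False
    else:
        # Anything else (including the string "None") cannot be stored in a
        # boolean field, so the field is dropped from the document.
        cleaned.pop(field, None)
    return cleaned


print(normalize_return_value({"swh_logging_args_return_value": "None", "message": "x"}))
```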
ftigeot added a comment to T1127: dbreplica1 2018-06-30 event postmortem.

The D11 and DS11 VM sizes are not very interesting.
Among the rest, the existing VM can only be resized to the E2sv3 model.

Jul 4 2018, 11:51 AM · Web app, System administration
ftigeot closed T977: Delete old system log data from the Elasticsearch cluster as Resolved.

All remaining non-swh-worker logs deleted from legacy logstash-* indexes.

Jul 4 2018, 11:22 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot closed T977: Delete old system log data from the Elasticsearch cluster, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, as Resolved.
Jul 4 2018, 11:22 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot closed T1129: Do not assign public IP addresses to Azure VM by default as Resolved.

Fixed by @olasd in https://forge.softwareheritage.org/rSPREd529d3b26a01a6643cf6eed2920a25b08317e571#change-6Ckng2kSodym .

Jul 4 2018, 11:20 AM · System administration
ftigeot closed T1128: Remove the public IP address on dbreplica1, a subtask of T1069: fully host the web UI on Azure, as Resolved.
Jul 4 2018, 11:12 AM · Web app, System administration
ftigeot closed T1128: Remove the public IP address on dbreplica1 as Resolved.

Public IP address disassociated from dbreplica1 and then deleted via the Azure portal interface today.

Jul 4 2018, 11:12 AM · Web app, System administration

Jul 3 2018

ftigeot added a comment to T1127: dbreplica1 2018-06-30 event postmortem.

Possible Azure VM types with 14-16GB RAM available on the Euwest zone:

  • a2mv2 16GB RAM $90.52/month
  • E2v3 16GB RAM $116.80/month
  • E2sv3 16GB RAM $116.80/month
  • D11v2 14GB RAM $138.70/month
  • DS11v2 14GB RAM $138.70/month
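
To compare these options on price alone, the monthly cost per GB of RAM can be computed from the figures listed above. This is only a rough sketch: it ignores CPU, storage and resize constraints, and the variable names are illustrative.

```python
# (name, RAM in GB, USD per month), as listed above.
vm_types = [
    ("a2mv2", 16, 90.52),
    ("E2v3", 16, 116.80),
    ("E2sv3", 16, 116.80),
    ("D11v2", 14, 138.70),
    ("DS11v2", 14, 138.70),
]


def cost_per_gb(vm):
    """Monthly price divided by the amount of RAM."""
    _name, ram_gb, usd_per_month = vm
    return usd_per_month / ram_gb


# Print the options from cheapest to most expensive per GB of RAM.
for vm in sorted(vm_types, key=cost_per_gb):
    print(f"{vm[0]:8s} {cost_per_gb(vm):5.2f} USD per GB per month")

cheapest = min(vm_types, key=cost_per_gb)
```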
Jul 3 2018, 4:53 PM · Web app, System administration
ftigeot triaged T1129: Do not assign public IP addresses to Azure VM by default as Normal priority.
Jul 3 2018, 4:51 PM · System administration
ftigeot triaged T1128: Remove the public IP address on dbreplica1 as High priority.
Jul 3 2018, 4:47 PM · Web app, System administration
ftigeot added a comment to T1127: dbreplica1 2018-06-30 event postmortem.

dbreplica1.euwest.azure.internal.softwareheritage.org runs a slave instance of the softwareheritage-indexer database, with its master running on somerset.internal.softwareheritage.org.

Jul 3 2018, 4:45 PM · Web app, System administration
ftigeot triaged T1127: dbreplica1 2018-06-30 event postmortem as Normal priority.
Jul 3 2018, 4:39 PM · Web app, System administration

Jul 2 2018

ftigeot created T1126: Move away non-gunicorn services from banco.
Jul 2 2018, 10:46 AM · System administration

Jun 28 2018

ftigeot committed rSPSITEe3cf967f1487: icinga2: Monitor the Elasticsearch cluster (authored by ftigeot).
icinga2: Monitor the Elasticsearch cluster
Jun 28 2018, 3:53 PM

Jun 26 2018

ftigeot added a comment to T1116: Azure webapp performance tests.

https://webapp0.softwareheritage.org/browse/search/ does not answer quickly when entering a new keyword for the first time.
Tried "danana" => got answers after more than one minute.
Also tried "zarg" => the wheel stopped spinning after one minute, with no explanation. A text message saying the pattern was not found could have been helpful.

Jun 26 2018, 11:33 AM · Web app, System administration

Jun 25 2018

ftigeot committed rSPSITE5d8ae7f39ae5: elasticsearch: Stop using mmap(2) (authored by ftigeot).
elasticsearch: Stop using mmap(2)
Jun 25 2018, 5:01 PM

Jun 22 2018

ftigeot committed rSPSITEecafdeca66c5: elasticsearch: Allow up to 128K open files (authored by ftigeot).
elasticsearch: Allow up to 128K open files
Jun 22 2018, 3:24 PM

Jun 21 2018

ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

It seems like deleting old documents takes a heavy toll on the cluster.
So far, for every month of old logstash indexes cleaned, at least one node started to misbehave and had to be restarted after excessive timeouts and/or other issues, including constant garbage collection and disk thrashing.

Jun 21 2018, 4:32 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

Even though all delete requests were previously successfully processed, non-swh-workers data remain in the legacy logstash-* indexes.
This behavior is not entirely unexpected: resource limitations may have prevented the old Banco node from processing all deletion requests in a bounded time frame.
Deletion queries will be rerun index by index in this way:

 curl -i -H'Content-Type: application/json' \
     -XPOST "http://esnode2.internal.softwareheritage.org:9200/logstash-2018.02.31/_delete_by_query?pretty=true" -d '
{
	"query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }}
}'
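
Rerunning the command above index by index could be scripted along these lines. This is a sketch only: it prepares one request per daily index and does not send anything. The date range and the function names are assumptions; the host and query body are copied from the curl command.

```python
import json
from datetime import date, timedelta
from urllib.request import Request

# Host taken from the curl command above.
ES_URL = "http://esnode2.internal.softwareheritage.org:9200"

# Same body as the curl command: select every document that is not
# attached to an swh-worker systemd unit.
QUERY = {"query": {"bool": {"must_not": [{"match": {"systemd_unit": "swh-worker@"}}]}}}


def daily_indexes(first, last, prefix="logstash-"):
    """Yield one index name per calendar day between first and last."""
    day = first
    while day <= last:
        yield f"{prefix}{day:%Y.%m.%d}"
        day += timedelta(days=1)


def build_request(index):
    """Prepare (but do not send) the _delete_by_query POST for one index."""
    return Request(
        f"{ES_URL}/{index}/_delete_by_query?pretty=true",
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed range for illustration: the daily indexes of February 2018.
requests_to_run = [
    build_request(name)
    for name in daily_indexes(date(2018, 2, 1), date(2018, 2, 28))
]
print(len(requests_to_run), requests_to_run[0].full_url)
```

Each prepared request could then be passed to `urllib.request.urlopen` one at a time, which keeps the load on the cluster bounded to a single index per call.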
Jun 21 2018, 11:27 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot committed rSPSITEe9bf7864ae02: elasticsearch: Pin to version 6.3.0 (authored by ftigeot).
elasticsearch: Pin to version 6.3.0
Jun 21 2018, 10:51 AM

Jun 20 2018

ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

The swh_workers-2018.03.07 index contained non-swh-workers documents and was cleaned this way:

curl -i -H'Content-Type: application/json' \
     -XPOST "http://esnode3.internal.softwareheritage.org:9200/swh_workers-2018.03.07/_delete_by_query?pretty=true" -d '
{
	"query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }}
}'
Jun 20 2018, 11:48 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot changed the status of T1000: Reindex old data on banco to put it into swh_worker indexes from Open to Work in Progress.
Jun 20 2018, 11:14 AM · System administration
ftigeot changed the status of T1000: Reindex old data on banco to put it into swh_worker indexes, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, from Open to Work in Progress.
Jun 20 2018, 11:14 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot triaged T987: Add an Icinga alert for high queue levels on saatchi as Low priority.
Jun 20 2018, 10:50 AM · System administration
ftigeot claimed T792: Make the elasticsearch logging cluster actually a cluster.
Jun 20 2018, 10:43 AM · System administration (Elasticsearch consolidation (W24/2018))

Jun 18 2018

ftigeot triaged T1108: Excessive memory usage on storage0.euwest.azure.internal.softwareheritage.org as High priority.
Jun 18 2018, 2:43 PM · System administration

Jun 15 2018

ftigeot committed rSPSITE2310e8b12265: elasticsearch: Use the Elasticsearch module (authored by ftigeot).
elasticsearch: Use the Elasticsearch module
Jun 15 2018, 2:36 PM
ftigeot committed rSPSITE6fb2bf353ad9: elasticsearch: Add package installation data (authored by ftigeot).
elasticsearch: Add package installation data
Jun 15 2018, 2:31 PM
ftigeot committed rSPSITE6e35c0da6496: icinga2: Add check_elasticsearch_cluster.sh to agent checks (authored by ftigeot).
icinga2: Add check_elasticsearch_cluster.sh to agent checks
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEfcfa588e5c54: Manifests: Add an elasticsearch profile (authored by ftigeot).
Manifests: Add an elasticsearch profile
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEa36846b8b6d8: Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks" (authored by ftigeot).
Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks"
Jun 15 2018, 2:30 PM
ftigeot added a reverting change for rSPSITE6e35c0da6496: icinga2: Add check_elasticsearch_cluster.sh to agent checks: rSPSITEa36846b8b6d8: Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks".
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEe1b3b449af6b: icinga2: Add a custom Elasticsearch cluster check (authored by ftigeot).
icinga2: Add a custom Elasticsearch cluster check
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITE10c8a51dd87b: rsyslog: Add some whitespace (authored by ftigeot).
rsyslog: Add some whitespace
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITE8c7a4af1adde: rsyslog: Match lines independently of whitespace (authored by ftigeot).
rsyslog: Match lines independently of whitespace
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEc05e0b64acd7: swh-profile: Create rsyslog profile (authored by ftigeot).
swh-profile: Create rsyslog profile
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITE38d5e2940fe0: Revert a4ac4669eff6332e1f69c1211c1db26cb5e8d207 systemd_journal: Limit size to… (authored by ftigeot).
Revert a4ac4669eff6332e1f69c1211c1db26cb5e8d207 systemd_journal: Limit size to…
Jun 15 2018, 2:30 PM
ftigeot added a reverting change for rSPSITEa4ac4669eff6: systemd_journal: Limit size to 400MB: rSPSITE38d5e2940fe0: Revert a4ac4669eff6332e1f69c1211c1db26cb5e8d207 systemd_journal: Limit size to….
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEcc77d4c4d913: systemd_journal: Fix indentation (authored by ftigeot).
systemd_journal: Fix indentation
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITEa4ac4669eff6: systemd_journal: Limit size to 400MB (authored by ftigeot).
systemd_journal: Limit size to 400MB
Jun 15 2018, 2:30 PM
ftigeot committed rSPSITE70fe9f1c2836: Roles: Add an Elasticsearch manifest (authored by ftigeot).
Roles: Add an Elasticsearch manifest
Jun 15 2018, 2:28 PM
ftigeot committed rSPSITE39adab65f25b: Add swh_api_azure role (authored by ftigeot).
Add swh_api_azure role
Jun 15 2018, 2:28 PM
ftigeot committed rSPSITEa61fb89f87db: roles/swh_server: Add rsyslog profile (authored by ftigeot).
roles/swh_server: Add rsyslog profile
Jun 15 2018, 2:28 PM

Jun 14 2018

ftigeot committed rSENV95bfd55903cc: Add richardc-datacat Puppet module (authored by ftigeot).
Add richardc-datacat Puppet module
Jun 14 2018, 5:17 PM
ftigeot committed rSPPROF2310e8b12265: elasticsearch: Use the Elasticsearch module (authored by ftigeot).
elasticsearch: Use the Elasticsearch module
Jun 14 2018, 5:15 PM
ftigeot committed rSPSITE897ed75b67f1: Add richardc/datacat (authored by ftigeot).
Add richardc/datacat
Jun 14 2018, 4:06 PM
ftigeot committed rSPSITE3b39226ed68f: Add elastic/elasticsearch module (authored by ftigeot).
Add elastic/elasticsearch module
Jun 14 2018, 3:43 PM
ftigeot committed rSENVcbb79660eef0: Add elastic-elasticsearch Puppet module (authored by ftigeot).
Add elastic-elasticsearch Puppet module
Jun 14 2018, 3:39 PM
ftigeot committed rSENVbd13289d576b: Update facts (authored by ftigeot).
Update facts
Jun 14 2018, 3:11 PM

Jun 13 2018

ftigeot changed the status of T793: Move elasticsearch log cluster configuration inside puppet, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, from Open to Work in Progress.
Jun 13 2018, 5:04 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot changed the status of T793: Move elasticsearch log cluster configuration inside puppet from Open to Work in Progress.
Jun 13 2018, 5:04 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot committed rSPPROF6fb2bf353ad9: elasticsearch: Add package installation data (authored by ftigeot).
elasticsearch: Add package installation data
Jun 13 2018, 3:04 PM
ftigeot committed rSPSITEeeeaceecabbb: data/defaults: Add Elasticsearch apt key (authored by ftigeot).
data/defaults: Add Elasticsearch apt key
Jun 13 2018, 3:01 PM

Jun 12 2018

ftigeot claimed T704: beautify wiki URLs.
Jun 12 2018, 4:56 PM · Easy hack, System administration
ftigeot committed rSPPROFa36846b8b6d8: Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks" (authored by ftigeot).
Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks"
Jun 12 2018, 4:29 PM
ftigeot added a reverting change for rSPPROF6e35c0da6496: icinga2: Add check_elasticsearch_cluster.sh to agent checks: rSPPROFa36846b8b6d8: Revert "icinga2: Add check_elasticsearch_cluster.sh to agent checks".
Jun 12 2018, 4:29 PM
ftigeot committed rSPPROF6e35c0da6496: icinga2: Add check_elasticsearch_cluster.sh to agent checks (authored by ftigeot).
icinga2: Add check_elasticsearch_cluster.sh to agent checks
Jun 12 2018, 3:57 PM
ftigeot committed rSPSITEad55d414664c: site.pp: Fix elasticsearch hostnames (authored by ftigeot).
site.pp: Fix elasticsearch hostnames
Jun 12 2018, 3:32 PM
ftigeot committed rSPSITE37de2cb1d936: site.pp: Start using swh_elasticsearch role (authored by ftigeot).
site.pp: Start using swh_elasticsearch role
Jun 12 2018, 3:19 PM
ftigeot committed rSPROLE70fe9f1c2836: Roles: Add an Elasticsearch manifest (authored by ftigeot).
Roles: Add an Elasticsearch manifest
Jun 12 2018, 3:16 PM
ftigeot committed rSPPROFfcfa588e5c54: Manifests: Add an elasticsearch profile (authored by ftigeot).
Manifests: Add an elasticsearch profile
Jun 12 2018, 2:32 PM

Jun 8 2018

ftigeot committed rSPPROFe1b3b449af6b: icinga2: Add a custom Elasticsearch cluster check (authored by ftigeot).
icinga2: Add a custom Elasticsearch cluster check
Jun 8 2018, 2:18 PM
ftigeot added a comment to T1093: Puppet role swh_api_azure fails to install nginx-lite.

Possible solutions may include:

  • Installing a dummy nginx configuration file first
  • Changing Debian's service startup policy with `/usr/sbin/policy-rc.d`
  • Dropping the local Varnish instance and replacing it with Azure caching services
Jun 8 2018, 12:00 PM · System administration
ftigeot triaged T1093: Puppet role swh_api_azure fails to install nginx-lite as Normal priority.
Jun 8 2018, 11:50 AM · System administration
ftigeot triaged T1091: objstorage usage on Azure requires python3-swh.objstorage.cloud as Normal priority.
Jun 8 2018, 11:34 AM · System administration
ftigeot triaged T1090: SSH timeouts on new Azure VMs as Normal priority.
Jun 8 2018, 11:17 AM · System administration
ftigeot triaged T1089: Puppet recipes fail to install apt-transport-https on new hosts as Normal priority.
Jun 8 2018, 10:52 AM · System administration
ftigeot triaged T1088: Replace short names by FQDNs in Puppet manifests as Wishlist priority.
Jun 8 2018, 10:23 AM · System administration
ftigeot added a project to T1088: Replace short names by FQDNs in Puppet manifests: System administration.
Jun 8 2018, 10:22 AM · System administration
ftigeot created T1088: Replace short names by FQDNs in Puppet manifests.
Jun 8 2018, 10:08 AM · System administration

Jun 6 2018

ftigeot committed rSPSITE3cb3f79d9ebe: swh_api: Use per-host configuration data (authored by ftigeot).
swh_api: Use per-host configuration data
Jun 6 2018, 1:15 PM
ftigeot committed rSPSITEc9b7fde1d675: Add role api_azure for webapp0 (authored by ftigeot).
Add role api_azure for webapp0
Jun 6 2018, 1:15 PM
ftigeot committed rSPROLE39adab65f25b: Add swh_api_azure role (authored by ftigeot).
Add swh_api_azure role
Jun 6 2018, 12:45 PM

Jun 5 2018

ftigeot committed rSPSITE421b8a26cea5: hostname/webapp0: Add host-specific settings (authored by ftigeot).
hostname/webapp0: Add host-specific settings
Jun 5 2018, 4:59 PM
ftigeot committed rSPSITE78839067e5a6: Revert "data/defaults: Add webapp0.softwareheritage.org" (authored by ftigeot).
Revert "data/defaults: Add webapp0.softwareheritage.org"
Jun 5 2018, 4:24 PM
ftigeot added a reverting change for rSPSITE92ec39dad664: data/defaults: Add webapp0.softwareheritage.org: rSPSITE78839067e5a6: Revert "data/defaults: Add webapp0.softwareheritage.org".
Jun 5 2018, 4:24 PM
ftigeot committed rSPSITE33a4fb747203: Add role swh-api to webapp0 (2/2) (authored by ftigeot).
Add role swh-api to webapp0 (2/2)
Jun 5 2018, 4:11 PM
ftigeot committed rSPSITE2318a664c2fa: Add role swh-api to webapp0 (authored by ftigeot).
Add role swh-api to webapp0
Jun 5 2018, 4:09 PM
ftigeot committed rSPSITE92ec39dad664: data/defaults: Add webapp0.softwareheritage.org (authored by ftigeot).
data/defaults: Add webapp0.softwareheritage.org
Jun 5 2018, 4:09 PM

May 28 2018

ftigeot added a comment to T1000: Reindex old data on banco to put it into swh_worker indexes.

Some logstash-xxx indexes appear to still contain non-swh_workers related information.
Blindly reindexing them according to this issue description will not be enough.

May 28 2018, 2:22 PM · System administration
ftigeot added a comment to T792: Make the elasticsearch logging cluster actually a cluster.

The cluster now has three brand new nodes; all existing data has been copied and the original Banco node removed.
Issue technically fixed but left open in order to track related Elasticsearch requests.

May 28 2018, 11:21 AM · System administration (Elasticsearch consolidation (W24/2018))