Ship more logs to logstash/elasticsearch
Started, Work in Progress, Normal, Public

Description

Some of our infrastructure (I'm looking at you, Apache) logs directly to files, and is awkward to configure to log to syslog/journald.

It'd be nice to have those logs shipped to elasticsearch, so we can minimize local disk usage on our infrastructure, and minimize the time we spend looking for logs when something goes wrong.

The canonical log shipper for the Logstash stack seems to be Beats (https://www.elastic.co/products/beats), most notably "filebeat". Elastic provides an apt repository for its Beats; documentation is at https://www.elastic.co/guide/en/beats/libbeat/current/setup-repositories.html.

Event Timeline

olasd created this task. Oct 3 2017, 12:47 PM
ftigeot claimed this task. Apr 13 2018, 11:18 AM
ftigeot changed the task status from Open to Work in Progress. Jul 25 2018, 3:48 PM

Filebeat installed on moma in order to ship some Apache logs to the logstash instance on banco.
Logstash configuration changed to put raw Apache logs in apache_logs-$date indexes.

Filebeat configuration on moma:/etc/filebeat/filebeat.yml

filebeat.prospectors:
- type: log
  paths:
    - /var/log/apache2/archive.softwareheritage.org_non-ssl_access.log
  fields: { apache_log_type: access_log }

output.logstash:
  hosts: ["logstash.internal.softwareheritage.org:5044"]

Logstash output.conf configuration

output {
    if [fields][apache_log_type] == "access_log" {
        # Apache access logs shipped by Filebeat (tagged via fields.apache_log_type)
        elasticsearch {
            hosts => [
                "esnode1.internal.softwareheritage.org:9200",
                "esnode2.internal.softwareheritage.org:9200"
            ]
            index => "apache_logs-%{+YYYY.MM.dd}"
        }
    } else if "swh-worker@" in [systemd_unit] {
        # Events coming from swh-worker@ systemd units
        elasticsearch {
            hosts => [
                "esnode1.internal.softwareheritage.org:9200",
                "esnode2.internal.softwareheritage.org:9200"
            ]
            index => "swh_workers-%{+YYYY.MM.dd}"
        }
    } else {
        # Everything else goes to the generic system log indexes
        elasticsearch {
            hosts => [
                "esnode1.internal.softwareheritage.org:9200",
                "esnode2.internal.softwareheritage.org:9200"
            ]
            index => "systemlogs-%{+YYYY.MM.dd}"
        }
    }
}
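
For anyone reading the output block without a Logstash instance at hand, the routing it performs can be sketched in Python. This is only an illustration of the three-way conditional, not something that runs on our hosts: the function name index_for is made up, the event is modelled as a plain dict, and the daily suffix is assumed to be derived from the event timestamp in UTC.

```python
from datetime import datetime, timezone


def index_for(event: dict, timestamp: datetime) -> str:
    """Return the index name the output conditional above would select.

    `timestamp` stands in for the event's @timestamp and must be
    timezone-aware; the daily suffix is computed in UTC (assumption).
    """
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    fields = event.get("fields") or {}
    if fields.get("apache_log_type") == "access_log":
        # Apache access logs tagged by Filebeat
        prefix = "apache_logs"
    elif "swh-worker@" in (event.get("systemd_unit") or ""):
        # Substring test on the unit name, like the `in` test in the config
        prefix = "swh_workers"
    else:
        prefix = "systemlogs"
    return f"{prefix}-{day}"
```

The order of the branches matters: an event carrying both the Apache tag and a worker unit name would land in apache_logs, exactly as in the configuration.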

Logstash filter configuration

filter {
  if [fields][apache_log_type] == "access_log" {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
  }
}
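
To make it clearer what the grok and date filters are expected to extract, here is a rough Python approximation. The regular expression is a simplified stand-in for COMBINEDAPACHELOG (the real grok pattern is more permissive), the field names mirror the ones I believe the legacy pattern emits, and parse_access_line is a hypothetical helper, so treat this as a sketch for checking sample lines rather than a replacement for the filter.

```python
import re
from datetime import datetime
from typing import Optional

# Simplified approximation of grok's COMBINEDAPACHELOG (assumption, not the real pattern)
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Split one combined-format access log line into named fields.

    Returns None when the line does not match, which is roughly the
    case where grok would tag the event with _grokparsefailure.
    """
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Counterpart of the date filter's "dd/MMM/yyyy:HH:mm:ss Z" pattern;
    # %b relies on an English/C locale for month abbreviations.
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event
```

Feeding it a few lines from the access log is a quick way to tell whether a parsing failure comes from the log format itself or from the Logstash side.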
ftigeot added a comment. Edited Jul 26 2018, 2:41 PM

Now, this configuration works perfectly on my workstation running logstash-6.3.2-1 but for some reason, the existing logstash instance running on banco.internal.softwareheritage.org is unable to filter Apache logs.

ftigeot changed the status of subtask T1160: Create a dedicated logstash VM from Open to Work in Progress. Jul 27 2018, 4:58 PM

With the new VM and its logstash-6.3.2 service, the contents of

/var/log/apache2/archive.softwareheritage.org_non-ssl_access.log

are now successfully stored in Elasticsearch indexes.