
Improve access_logs parsing
Open, Normal, Public

Description

Currently, the apache logs are sent to elasticsearch, with some parsing done by logstash [1].
This data can be used to create useful dashboards in grafana and to help diagnose issues (response time per time of request, response code, ...).

To do so, the logs and the parsing must be improved to:

  • Add the request duration to the log
  • Convert the response code to an integer
  • Convert the "bytes" to an integer
  • Add some tags like the application and the environment to help create usable filters
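
The expected result of these changes can be sketched in Python. This is only an illustration of the target event shape, not the logstash configuration itself. It assumes the request duration is appended to the combined log format as a trailing field in microseconds (Apache's %D), which the current logs do not contain. The field name duration_us, the tag names application and environment, and the sample values "webapp", "production" and 15234 are all hypothetical.

```python
import re

# Combined log format, with the request duration in microseconds (Apache's %D)
# ASSUMED to be appended as an optional last field. The current logs do not
# contain this field yet.
ACCESS_LOG_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<useragent>[^"]*)"'
    r'(?: (?P<duration_us>\d+))?$'
)


def parse_access_log(line, application, environment):
    """Parse one access log line into a dict with typed fields, or None."""
    match = ACCESS_LOG_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Numeric fields become integers so they can be aggregated in dashboards.
    event["response"] = int(event["response"])
    # Apache logs "-" when no body bytes were sent.
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    if event["duration_us"] is not None:
        event["duration_us"] = int(event["duration_us"])
    # Hypothetical tag names, meant to be used as dashboard filters.
    event["application"] = application
    event["environment"] = environment
    return event


# Log line taken from the document below, with a made-up duration appended.
line = (
    '::1 - - [17/Nov/2020:11:34:37 +0000] '
    '"GET /browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/ HTTP/1.1" '
    '200 8378 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)" '
    '15234'
)
event = parse_access_log(line, application="webapp", environment="production")
```

In logstash itself, the integer conversions would more likely be done with the mutate filter's convert option; the sketch only shows which fields change type and which ones are added.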

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/discover/doc/f44a4480-9653-11e8-980c-712cea9c3832/apache_logs-2020.11.17?id=UAH81XUBO-oK4hKuO702

{
  "_index": "apache_logs-2020.11.17",
  "_type": "_doc",
  "_id": "UAH81XUBO-oK4hKuO702",
  "_score": 1,
  "_source": {
    "@version": "1",
    "agent": {
      "id": "926d1a92-fb11-4e60-b29e-59550ea0ade8",
      "type": "filebeat",
      "hostname": "moma",
      "name": "moma",
      "version": "7.8.0",
      "ephemeral_id": "e5ad2d14-108b-471b-8a88-f4dddee584ad"
    },
    "ident": "-",
    "httpversion": "1.1",
    "referrer": "\"-\"",
    "host": {
      "name": "moma"
    },
    "log": {
      "file": {
        "path": "/var/log/apache2/archive.softwareheritage.org_non-ssl_access.log"
      },
      "offset": 13209487
    },
    "fields": {
      "apache_log_type": "access_log"
    },
    "ecs": {
      "version": "1.5.0"
    },
    "auth": "-",
    "verb": "GET",
    "input": {
      "type": "log"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "@timestamp": "2020-11-17T11:34:37.000Z",
    "timestamp": "17/Nov/2020:11:34:37 +0000",
    "request": "/browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/",
    "message": "::1 - - [17/Nov/2020:11:34:37 +0000] \"GET /browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/ HTTP/1.1\" 200 8378 \"-\" \"Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)\"",
    "response": "200",
    "bytes": "8378",
    "clientip": "::1"
  },
  "fields": {
    "@timestamp": [
      "2020-11-17T11:34:37.000Z"
    ]
  }
}

Event Timeline

vsellier triaged this task as Normal priority. Tue, Nov 17, 12:36 PM
vsellier created this task.