Page MenuHomeSoftware Heritage

Improve access_logs parsing
Closed, MigratedEdits Locked

Description

Actually the apache logs are sent to elasticsearch and there some parsing made by logstash [1].
These data can be used to extract create interesting dashboards on grafana and help to diagnose some issue (response time per time of request / response code / ... )

To do so, the logs and the parsing must be improved to :

  • Add the request duration on the log
  • Convert the response code to an integer
  • convert the "bytes" to an integer
  • Add some tags like the application and the environment to help create usable filters

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/discover/doc/f44a4480-9653-11e8-980c-712cea9c3832/apache_logs-2020.11.17?id=UAH81XUBO-oK4hKuO702

{
  "_index": "apache_logs-2020.11.17",
  "_type": "_doc",
  "_id": "UAH81XUBO-oK4hKuO702",
  "_score": 1,
  "_source": {
    "@version": "1",
    "agent": {
      "id": "926d1a92-fb11-4e60-b29e-59550ea0ade8",
      "type": "filebeat",
      "hostname": "moma",
      "name": "moma",
      "version": "7.8.0",
      "ephemeral_id": "e5ad2d14-108b-471b-8a88-f4dddee584ad"
    },
    "ident": "-",
    "httpversion": "1.1",
    "referrer": "\"-\"",
    "host": {
      "name": "moma"
    },
    "log": {
      "file": {
        "path": "/var/log/apache2/archive.softwareheritage.org_non-ssl_access.log"
      },
      "offset": 13209487
    },
    "fields": {
      "apache_log_type": "access_log"
    },
    "ecs": {
      "version": "1.5.0"
    },
    "auth": "-",
    "verb": "GET",
    "input": {
      "type": "log"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "@timestamp": "2020-11-17T11:34:37.000Z",
    "timestamp": "17/Nov/2020:11:34:37 +0000",
    "request": "/browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/",
    "message": "::1 - - [17/Nov/2020:11:34:37 +0000] \"GET /browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/ HTTP/1.1\" 200 8378 \"-\" \"Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)\"",
    "response": "200",
    "bytes": "8378",
    "clientip": "::1"
  },
  "fields": {
    "@timestamp": [
      "2020-11-17T11:34:37.000Z"
    ]
  }
}

Event Timeline

vsellier triaged this task as Normal priority.Nov 17 2020, 12:36 PM
vsellier created this task.
vsellier changed the task status from Open to Work in Progress.Jan 29 2021, 2:33 PM
vsellier moved this task from Backlog to in-progress on the System administration board.

Configuration deployed for the webapp on all servers, the logs have now the duration, which is parsed on the elasticseach entries :

for example :

::1 - - [02/Feb/2021:14:31:10 +0000] "GET /browse/revision/faa6da70918c0107747c1ab149d32cdfd503cb87/?branch=refs/heads/master&origin_url=https://github.com/Naima16/dbd.git&snapshot=3cd6c78ff19a58919102a5fe0af7aba3e5d16620 HTTP/1.1" 200 9052 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)" 191

-> http://kibana0.internal.softwareheritage.org:5601/app/kibana#/discover/doc/f44a4480-9653-11e8-980c-712cea9c3832/apache_logs-2021.02.02?id=TGYpY3cBAmYkSRAjZdsZ

{
  "_index": "apache_logs-2021.02.02",
  "_type": "_doc",
  "_id": "TGYpY3cBAmYkSRAjZdsZ",
...
    "response": 200,
...
    "duration_ms": 154
...
}

Let's add this on the deposit too now

The log parsing is ok.
An elasticsearch datasource was created on grafana so we can now create some graphs based on the logs on elasticsearch.
A simple dashboard to display some statistics based on the apache log was initiated[1], it appears the design is not as simple as in kibana and have some limitations but it still allows to have basic information centralized in grafana.

Now, I will check if migrating the old apache indexes can be worth it (converting the response/bytes fields to integer)

[1] https://grafana.softwareheritage.org/goto/pbF1qUYMz

The opened apache indexes are currently being migrated with the P940's script.

The script is manually launched by batch with a command like :

# curl -s -XGET $ES_SERVER/_cat/indices\?s=index\&h=index,status\&v | grep -e "apache.*open" | cut -f1 -d" " | grep "2020.09.[23]" | xargs -r -t -n1 ./convert-fields.sh

During the tests, we found the old closed indexes seems to not work correctly when they are reopened.

For example for the oldest index apache_logs-2018.07.31:

curl -s $ES_SERVER/apache_logs-2018.07.31/_stats\?pretty | jq '._all.primaries.docs'
{
  "count": 39984,
  "deleted": 0
}

A search returns nothing :

% curl $ES_SERVER/apache_logs-2018.07.31/_search\?pretty                              
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

A flush or a merge does not change anything.

The impacted indexes are from 2018-07-31 to 2020-05-11.
We have decided to ignore them from the migration and we'll probably remove them later.

vsellier closed this task as Resolved.EditedFeb 5 2021, 9:09 AM

It seems there were some huge queries the last few days [1], the script needed to be adapted to use Long instead of Integers :

apache_logs-2021.01.14:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
          "java.base/java.lang.Integer.parseInt(Integer.java:652)",
          "java.base/java.lang.Integer.parseInt(Integer.java:770)",
          "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
          "                                                                                                ^---- HERE"
        ],
        "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
        "lang" : "painless",
        "position" : {
          "offset" : 96,
          "start" : 0,
          "end" : 125
        }
      }
    ],
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
      "java.base/java.lang.Integer.parseInt(Integer.java:652)",
      "java.base/java.lang.Integer.parseInt(Integer.java:770)",
      "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
      "                                                                                                ^---- HERE"
    ],
    "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
    "lang" : "painless",
    "position" : {
      "offset" : 96,
      "start" : 0,
      "end" : 125
    },
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"4633815064\""
    }
  },
  "status" : 400
}



--------------------
apache_logs-2021.01.19

    },
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"8936007354\""
    }
  },


----------------------
apache_logs-2021.01.21

    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"19281658245\""
    }
  },
  "status" : 400

----------------------
apache_logs-2021.01.22

    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"18800549679\""
    }
  },
  "status" : 400

All the indices were correctly migrated except this point.

[1] example of such queries: https://grafana.softwareheritage.org/goto/xY-RtuYMz