A basic dashboard [1] was created in grafana based on the number of log lines.
It's too limited for now, as it's not possible to isolate the logs per environment: that information is not available yet.
It will be added in T3043.
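Once the environment information is available, a per-environment breakdown could be queried straight from elasticsearch (assuming the dashboard is backed by the elasticsearch datasource mentioned below). A minimal sketch, where the host, the index pattern and the environment field are all assumptions pending T3043:

curl -s -XGET 'http://localhost:9200/logstash-*/_search?size=0' \
  -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "log_lines_per_environment": {
      "terms": { "field": "environment" }
    }
  }
}'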
Feb 12 2021
Fix tests to increase code coverage
The not_found status is now only set when the repository is really not found.
In D5064#127855, @vlorentz wrote: It also misses a test where an origin is initially inserted with visit_types, then visit_types is added.
The disk was sent to the manufacturer this morning; we now have to wait to hear back from them.
Feb 11 2021
After puppet added the group to the nagios user [1], the icinga services needed to be restarted:
# clush -b -w @staging 'systemctl restart icinga2'
# clush -b -w @all 'systemctl restart icinga2'
I'm not sure I understand the real problem here.
As the indexer and indexer-storage are in the same source repository, the versions should match or increase in parallel. Sentry should be able to deal with it as with any other version upgrade.
T3041 needs to be done before this one (for the production environment)
D5063 is applied; the main webapp is now using swh-search by default.
This is (or should be ;) ) the state after the diff is applied:
The main webapp search can be switched from the SQL search to swh-search, as all the tests performed on staging and https://webapp1.internal.softwareheritage.org are ok.
Feb 10 2021
LGTM but it's biased as we worked on it together ;)
Feb 9 2021
LGTM
Feb 8 2021
The file /var/lib/puppet/state/agent_disabled.lock can be checked to detect whether puppet is disabled or not.
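For instance, a quick fleet-wide check with clush:

# clush -b -w @all 'test -f /var/lib/puppet/state/agent_disabled.lock && echo disabled || echo enabled'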
To be precise about these disk replacements: even if the disks are in error, there are still spares in the zfs pool:
- db1:
root@db1:~# zpool status data
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:16:01 with 0 errors on Sun Jan 10 00:40:03 2021
config:
Feb 5 2021
I started to throw some ideas into this document: https://hedgedoc.softwareheritage.org/Fi2pq7zkSw6aVAJwk9Xhqw
Nice, thanks for confirming this at the source.
It seems there were some huge queries during the last few days [1]; the script needed to be adapted to use Long instead of Integer:
apache_logs-2021.01.14:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
          "java.base/java.lang.Integer.parseInt(Integer.java:652)",
          "java.base/java.lang.Integer.parseInt(Integer.java:770)",
          "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
          "                    ^---- HERE"
        ],
        "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
        "lang" : "painless",
        "position" : { "offset" : 96, "start" : 0, "end" : 125 }
      }
    ],
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
      "java.base/java.lang.Integer.parseInt(Integer.java:652)",
      "java.base/java.lang.Integer.parseInt(Integer.java:770)",
      "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
      "                    ^---- HERE"
    ],
    "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
    "lang" : "painless",
    "position" : { "offset" : 96, "start" : 0, "end" : 125 },
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"4633815064\""
    }
  },
  "status" : 400
}
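For reference, a minimal sketch of that adaptation (endpoint and index name assumed; the actual script is the one from P940), replacing Integer.parseInt with Long.parseLong so that values such as 4633815064, which overflow an Integer, parse correctly:

curl -s -XPOST 'http://localhost:9200/apache_logs-2021.01.14/_update_by_query' \
  -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Long.parseLong(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Long.parseLong(ctx._source.response) : ctx._source.response;"
  }
}'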
Feb 4 2021
The open apache indexes are currently being migrated with the script from P940.
\o/ Thanks, we should be able to properly format the common file now.
The log parsing is ok.
An elasticsearch datasource was created in grafana, so we can now create graphs based on the logs stored in elasticsearch.
A simple dashboard displaying some statistics based on the apache logs was initiated [1]. The design is not as simple as in kibana and has some limitations, but it still allows basic information to be centralized in grafana.
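For reference, such a datasource can also be provisioned declaratively instead of through the UI. A minimal sketch, where the datasource name and the host are assumptions (the index pattern matches the apache logs above):

apiVersion: 1
datasources:
  - name: elasticsearch-logs      # assumed name
    type: elasticsearch
    access: proxy
    url: http://localhost:9200    # assumed host
    database: "apache_logs-*"
    jsonData:
      esVersion: 70
      timeField: "@timestamp"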
The question is not an abstract one: there are implementations of HyperLogLog that are monotonic; maybe the Redis one already is, we just need to know.
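For instance, with the Redis HLL commands, monotonic would mean that PFCOUNT never decreases as elements are added. A quick empirical probe (key name hypothetical, and of course not a proof):

for i in $(seq 1 100000); do
    redis-cli PFADD hll:probe "item$i" > /dev/null
    redis-cli PFCOUNT hll:probe
done | awk '$1 < prev { print "count decreased at iteration " NR } { prev = $1 }'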
So far so good, the SMART test is done and didn't find any errors:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         9         -
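For reference, this self-test log can be read with smartctl (device name assumed):

# smartctl -l selftest /dev/sda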
Feb 2 2021
- configure filebeat to check the right file based on the vhost
- add some additional fields to help the message filtering (a config sketch follows below)
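A minimal sketch of what that filebeat configuration could look like; the log path and the extra field values are assumptions:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/apache2/webapp1*access*.log   # hypothetical per-vhost log path
    fields:                                    # hypothetical extra fields for filtering
      vhost: webapp1.internal.softwareheritage.org
      environment: staging
    fields_under_root: true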