Software Heritage

Feb 15 2021

vsellier renamed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata from Provision enough space for the search ES cluster to ingest all intrinsic metadata to [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
Feb 15 2021, 10:02 AM · System administration, Archive search
vsellier renamed T3040: [production] Enable swh-search's journal-client for indexed objects from Enable swh-search's journal-client for indexed objects to [production] Enable swh-search's journal-client for indexed objects.
Feb 15 2021, 10:02 AM · System administration, Journal, Archive search

Feb 12 2021

vsellier added a comment to T3042: swh-search: add statsd/prometheus metrics.

A basic dashboard [1] was created on Grafana based on the number of log lines.
It is too limited, as the logs cannot be isolated per environment: that information is not available in them.
It will be added in T3043.

Feb 12 2021, 5:53 PM · System administration, Archive search
vsellier moved T3042: swh-search: add statsd/prometheus metrics from Backlog to Weekly backlog on the System administration board.
Feb 12 2021, 5:44 PM · System administration, Archive search
vsellier moved T3043: journalbeat:/filebeat Add an environment field on the logs from Backlog to Weekly backlog on the System administration board.
Feb 12 2021, 5:44 PM · System administration
vsellier triaged T3043: journalbeat:/filebeat Add an environment field on the logs as Normal priority.
Feb 12 2021, 5:44 PM · System administration
vsellier updated the diff for D5052: loader: mark visit as 'not_found' when relevant.

Fix tests to increase code coverage

Feb 12 2021, 4:35 PM
vsellier updated the diff for D5052: loader: mark visit as 'not_found' when relevant.

The not_found status is now only set when the repository genuinely does not exist.
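A minimal sketch of the intended distinction, assuming a hypothetical `RepositoryNotFound` exception raised by the fetch step and `full` as the status of a successful visit; `visit_status_for` and `fetch` are illustrative names, not the loader's actual code.

```python
from typing import Callable


class RepositoryNotFound(Exception):
    """Hypothetical error raised when the remote repository does not exist."""


def visit_status_for(fetch: Callable[[], None]) -> str:
    """Map the outcome of a (hypothetical) fetch step to a visit status.

    'not_found' is reserved for a repository that genuinely does not exist;
    any other error is reported as 'failed'.
    """
    try:
        fetch()
    except RepositoryNotFound:
        return "not_found"
    except Exception:
        return "failed"
    return "full"
```

With this shape, a network timeout yields `failed` rather than `not_found`.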

Feb 12 2021, 4:17 PM
vsellier triaged T3042: swh-search: add statsd/prometheus metrics as Normal priority.
Feb 12 2021, 12:19 PM · System administration, Archive search
vsellier added a comment to D5064: Enable to filter searched origins by visit types.

It is also missing a test where an origin is first inserted with visit_types and further visit_types are then added to it.
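A sketch of that missing test case, written against a hypothetical in-memory index (`InMemoryOriginIndex`, `origin_update` and `visit_types_of` are illustrative names, not swh-search's API) and assuming that successive updates merge visit types as a set union.

```python
from typing import Dict, List, Set


class InMemoryOriginIndex:
    """Hypothetical stand-in for the search backend, used only for this sketch."""

    def __init__(self) -> None:
        self._visit_types: Dict[str, Set[str]] = {}

    def origin_update(self, documents: List[dict]) -> None:
        for doc in documents:
            types = self._visit_types.setdefault(doc["url"], set())
            # Assumption: new visit types are merged with the existing ones.
            types.update(doc.get("visit_types", []))

    def visit_types_of(self, url: str) -> Set[str]:
        return set(self._visit_types.get(url, set()))


def test_visit_types_are_merged() -> None:
    index = InMemoryOriginIndex()
    url = "https://example.org/repo.git"
    # The origin is first inserted with one visit type...
    index.origin_update([{"url": url, "visit_types": ["git"]}])
    # ...then another visit type is added later on.
    index.origin_update([{"url": url, "visit_types": ["hg"]}])
    assert index.visit_types_of(url) == {"git", "hg"}
```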

Feb 12 2021, 11:43 AM
vsellier added a comment to T3033: Replace first disk on storage1.staging.

The disk was sent to the manufacturer this morning; we now have to wait to hear back from them.

Feb 12 2021, 10:20 AM · System administration
vsellier accepted D5066: Simplify max_date().

👍 Nice! Thanks

Feb 12 2021, 10:18 AM

Feb 11 2021

vsellier closed D5049: loader: Mark visit status as failed when relevant.
Feb 11 2021, 6:21 PM
vsellier committed rDLDHG1f96493ef318: loader: Mark visit status as failed when relevant (authored by vsellier).
loader: Mark visit status as failed when relevant
Feb 11 2021, 6:21 PM
vsellier closed T2566: Add an icinga check on whether the puppet agent is enabled as Resolved.

After puppet added the group to the nagios user [1], the icinga2 services had to be restarted:

# clush -b -w @staging 'systemctl restart icinga2'
# clush -b -w @all 'systemctl restart icinga2'
Feb 11 2021, 4:55 PM · System administration
vsellier committed rSPSITE08c1f568a165: icinga: convert puppet checks threshold to seconds (authored by vsellier).
icinga: convert puppet checks threshold to seconds
Feb 11 2021, 3:55 PM
vsellier closed D5065: icinga: grant access to private puppet directories.
Feb 11 2021, 3:35 PM
vsellier committed rSPSITE8bc7ef05cca4: icinga: grant access to private puppet directories (authored by vsellier).
icinga: grant access to private puppet directories
Feb 11 2021, 3:35 PM
vsellier claimed T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage.
Feb 11 2021, 3:34 PM · System administration, Sentry
vsellier placed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata up for grabs.
Feb 11 2021, 3:33 PM · System administration, Archive search
vsellier placed T3040: [production] Enable swh-search's journal-client for indexed objects up for grabs.
Feb 11 2021, 3:32 PM · System administration, Journal, Archive search
vsellier added a comment to T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage.

I'm not sure I understand the real problem here.
As the indexer and indexer-storage live in the same source repository, their versions should match or increase in parallel. Sentry should be able to deal with it like any other version upgrade.

Feb 11 2021, 3:23 PM · System administration, Sentry
vsellier changed the status of T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage from Open to Work in Progress.
Feb 11 2021, 3:05 PM · System administration, Sentry
vsellier updated the test plan for D5065: icinga: grant access to private puppet directories.
Feb 11 2021, 3:04 PM
vsellier requested review of D5065: icinga: grant access to private puppet directories.
Feb 11 2021, 3:03 PM
vsellier added a revision to T2566: Add an icinga check on whether the puppet agent is enabled: D5065: icinga: grant access to private puppet directories.
Feb 11 2021, 3:03 PM · System administration
vsellier added a comment to T3040: [production] Enable swh-search's journal-client for indexed objects.

T3041 needs to be done before this one (for the production environment)

Feb 11 2021, 2:21 PM · System administration, Journal, Archive search
vsellier added a project to T2182: Switch production swh-web to use swh-search instead of postgresql search.: System administration.
Feb 11 2021, 12:14 PM · System administration, Archive search, Storage manager
vsellier requested review of D5052: loader: mark visit as 'not_found' when relevant.
Feb 11 2021, 12:13 PM
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search., a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.
Feb 11 2021, 12:10 PM · Archive search, Storage manager
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search. as Resolved.

D5063 is applied; the main webapp now uses swh-search by default.

Feb 11 2021, 12:10 PM · System administration, Archive search, Storage manager
vsellier closed D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 12:00 PM
vsellier committed rSPSITE6f18a7797f1c: webapp: use swh-search as main search engine (authored by vsellier).
webapp: use swh-search as main search engine
Feb 11 2021, 12:00 PM
vsellier requested review of D5049: loader: Mark visit status as failed when relevant.
Feb 11 2021, 11:55 AM
vsellier updated the test plan for D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 11:50 AM
vsellier created P946 production deployment of swh-search.
Feb 11 2021, 11:29 AM
vsellier added a comment to D5063: webapp: use swh-search as main search engine in production.

This is (or should be ;) ) the state after the diff is applied:

Feb 11 2021, 11:28 AM
vsellier requested review of D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 11:27 AM
vsellier added a revision to T2182: Switch production swh-web to use swh-search instead of postgresql search.: D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 11:27 AM · System administration, Archive search, Storage manager
vsellier changed the status of T2182: Switch production swh-web to use swh-search instead of postgresql search., a subtask of T1910: Redesign origin search using a dedicated component (swh-search), from Open to Work in Progress.
Feb 11 2021, 9:24 AM · Archive search, Storage manager
vsellier changed the status of T2182: Switch production swh-web to use swh-search instead of postgresql search. from Open to Work in Progress.

The main webapp search can be switched from the SQL search to swh-search, as all the tests performed on staging and on https://webapp1.internal.softwareheritage.org are OK.

Feb 11 2021, 9:24 AM · System administration, Archive search, Storage manager

Feb 10 2021

vsellier accepted D5060: Install properly backfill configuration to getty and storage1.staging.
Feb 10 2021, 6:07 PM
vsellier accepted D5058: backfill: Dedicate storage configuration for the backfiller.
Feb 10 2021, 6:03 PM
vsellier accepted D5059: Move indexer_journal_client to scheduler instances.
Feb 10 2021, 5:30 PM
vsellier added a comment to D5059: Move indexer_journal_client to scheduler instances.

LGTM, but I'm biased as we worked on it together ;)

Feb 10 2021, 5:30 PM
vsellier accepted D5056: staging: Dedicate an indexer worker.
Feb 10 2021, 12:36 PM
vsellier accepted D5054: Enable the journal-writer for the swh-idx-storage in production.
Feb 10 2021, 12:06 PM
vsellier closed D5047: journal_client: Fix date computations for (un)eventful visits.
Feb 10 2021, 11:17 AM
vsellier committed rDSCHcf32e3764920: journal_client: Fix date computations for (un)eventful visits (authored by vsellier).
journal_client: Fix date computations for (un)eventful visits
Feb 10 2021, 11:17 AM
vsellier edited P945 origin_visit_stats scenario.
Feb 10 2021, 10:11 AM
vsellier created P945 origin_visit_stats scenario.
Feb 10 2021, 10:00 AM

Feb 9 2021

vsellier accepted D5053: staging: Activate swh-search-journal-client@indexed.

LGTM

Feb 9 2021, 6:44 PM
vsellier updated the title for P944 search_journal_client @origin offsets matching the index backup (csv) from search_journal_client @origin offsets matching the index backup to search_journal_client @origin offsets matching the index backup (csv).
Feb 9 2021, 6:34 PM
vsellier created P944 search_journal_client @origin offsets matching the index backup (csv).
Feb 9 2021, 6:33 PM
vsellier requested review of D5047: journal_client: Fix date computations for (un)eventful visits.
Feb 9 2021, 6:19 PM
vsellier added a revision to T3030: Improve loaders to deal with new visit status events: D5052: loader: mark visit as 'not_found' when relevant.
Feb 9 2021, 5:09 PM · Core Loader
vsellier committed rSPSITEfb01f43ad2cd: icinga: use the right parameter name for ignoremissing (authored by vsellier).
icinga: use the right parameter name for ignoremissing
Feb 9 2021, 4:43 PM
vsellier closed D5043: icinga: monitor puppet agent activation.
Feb 9 2021, 4:26 PM
vsellier committed rSPSITEe3c3a3dc844a: icinga: monitor puppet agent activation (authored by vsellier).
icinga: monitor puppet agent activation
Feb 9 2021, 4:26 PM
vsellier added a revision to T3030: Improve loaders to deal with new visit status events: D5049: loader: Mark visit status as failed when relevant.
Feb 9 2021, 3:57 PM · Core Loader
vsellier added a revision to T3000: Investigate scheduler journal client discrepancies: D5047: journal_client: Fix date computations for (un)eventful visits.
Feb 9 2021, 3:04 PM · Scheduling utilities
vsellier updated the test plan for D5043: icinga: monitor puppet agent activation.
Feb 9 2021, 9:40 AM
vsellier added a reviewer for D5043: icinga: monitor puppet agent activation: System administrators.
Feb 9 2021, 9:39 AM
vsellier updated subscribers of D5043: icinga: monitor puppet agent activation.
Feb 9 2021, 9:39 AM
vsellier added a revision to T2566: Add an icinga check on whether the puppet agent is enabled: D5043: icinga: monitor puppet agent activation.
Feb 9 2021, 9:37 AM · System administration
vsellier requested review of D5043: icinga: monitor puppet agent activation.
Feb 9 2021, 9:37 AM
vsellier claimed T2566: Add an icinga check on whether the puppet agent is enabled.
Feb 9 2021, 9:26 AM · System administration

Feb 8 2021

vsellier added a comment to T2566: Add an icinga check on whether the puppet agent is enabled.

The file /var/lib/puppet/state/agent_disabled.lock can be checked to detect whether the puppet agent is disabled.
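A minimal sketch of such a check in the Nagios/Icinga plugin style (exit code 0 = OK, 2 = CRITICAL), relying only on the presence of the lock file; `check_puppet_agent` is an illustrative name, and the check actually deployed in D5043 may differ.

```python
from pathlib import Path
from typing import Tuple

# Lock file path given above: it exists while the puppet agent is disabled.
LOCK_FILE = Path("/var/lib/puppet/state/agent_disabled.lock")

# Standard Nagios/Icinga plugin exit codes (WARNING=1 and UNKNOWN=3 unused here).
OK, CRITICAL = 0, 2


def check_puppet_agent(lock_file: Path = LOCK_FILE) -> Tuple[int, str]:
    """Return an (exit_code, message) pair in the Nagios plugin style."""
    if lock_file.exists():
        return CRITICAL, f"CRITICAL - puppet agent is disabled ({lock_file} exists)"
    return OK, "OK - puppet agent is enabled"
```

A wrapper script would print the message and pass the code to `sys.exit()`. Reporting CRITICAL rather than WARNING is an arbitrary choice in this sketch.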

Feb 8 2021, 3:15 PM · System administration
vsellier changed the status of T2566: Add an icinga check on whether the puppet agent is enabled from Open to Work in Progress.
Feb 8 2021, 2:20 PM · System administration
vsellier added a comment to T2939: Replace out of order disks on db1.staging and storage1.staging.

A clarification about these disk replacements: even though the disks are in error, there are still spares in the ZFS pools:

  • db1:
root@db1:~# zpool status data
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:16:01 with 0 errors on Sun Jan 10 00:40:03 2021
config:
Feb 8 2021, 12:57 PM · System administration
vsellier moved T3009: Manage backfiller configuration in puppet from Backlog to Weekly backlog on the System administration board.
Feb 8 2021, 12:51 PM · System administration
vsellier moved T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage from Backlog to Weekly backlog on the System administration board.
Feb 8 2021, 12:50 PM · System administration, Sentry
vsellier moved T2566: Add an icinga check on whether the puppet agent is enabled from Backlog to Weekly backlog on the System administration board.
Feb 8 2021, 12:50 PM · System administration
vsellier moved T2960: Add disk health monitoring from Backlog to Weekly backlog on the System administration board.
Feb 8 2021, 12:50 PM · System administration
vsellier changed the status of T3033: Replace first disk on storage1.staging, a subtask of T2939: Replace out of order disks on db1.staging and storage1.staging, from Open to Work in Progress.
Feb 8 2021, 12:50 PM · System administration
vsellier changed the status of T3033: Replace first disk on storage1.staging from Open to Work in Progress.
Feb 8 2021, 12:50 PM · System administration
vsellier triaged T3033: Replace first disk on storage1.staging as Normal priority.
Feb 8 2021, 12:49 PM · System administration

Feb 5 2021

vsellier added a comment to T2231: Continuous deployment.

I've started jotting down some ideas in this document: https://hedgedoc.softwareheritage.org/Fi2pq7zkSw6aVAJwk9Xhqw

Feb 5 2021, 5:48 PM · meta-task, Roadmap 2022, Staging environment, Roadmap 2020
vsellier updated the task description for T3030: Improve loaders to deal with new visit status events.
Feb 5 2021, 2:30 PM · Core Loader
vsellier updated the summary of D5024: package: Mark visit status as failed when relevant.
Feb 5 2021, 2:29 PM
vsellier added a comment to T2912: Next generation archive counters.

Nice, thanks for confirming this at the source.

Feb 5 2021, 10:03 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier committed rDSNIP931603f05078: add a script to convert es indexes fields (authored by vsellier).
add a script to convert es indexes fields
Feb 5 2021, 9:21 AM
vsellier committed rDSNIP7718a0ca86ea: Support unreadable messages (authored by vsellier).
Support unreadable messages
Feb 5 2021, 9:21 AM
vsellier edited P940 Elasticsearch field conversion from string to integer.
Feb 5 2021, 9:10 AM · System administration
vsellier closed T2787: Improve access_logs parsing as Resolved.

It seems there were some huge queries over the last few days [1]; the script had to be adapted to use Long instead of Integer, as the following error shows:

apache_logs-2021.01.14:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
          "java.base/java.lang.Integer.parseInt(Integer.java:652)",
          "java.base/java.lang.Integer.parseInt(Integer.java:770)",
          "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
          "                                                                                                ^---- HERE"
        ],
        "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
        "lang" : "painless",
        "position" : {
          "offset" : 96,
          "start" : 0,
          "end" : 125
        }
      }
    ],
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
      "java.base/java.lang.Integer.parseInt(Integer.java:652)",
      "java.base/java.lang.Integer.parseInt(Integer.java:770)",
      "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
      "                                                                                                ^---- HERE"
    ],
    "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
    "lang" : "painless",
    "position" : {
      "offset" : 96,
      "start" : 0,
      "end" : 125
    },
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"4633815064\""
    }
  },
  "status" : 400
}
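The value that broke the conversion, 4633815064 bytes (about 4.6 GB), is larger than Java's `Integer.MAX_VALUE` (2^31 - 1 = 2147483647) but fits in a `Long`, which is presumably why switching to `Long.parseLong` fixes it. A small Python sketch of the same bounds check; `java_type_for` is an illustrative helper, not part of the P940 script.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # Java Integer.MIN_VALUE / MAX_VALUE
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # Java Long.MIN_VALUE / MAX_VALUE


def java_type_for(value: str) -> str:
    """Return the narrowest Java integral type (Integer or Long) that can hold value."""
    n = int(value)
    if INT32_MIN <= n <= INT32_MAX:
        return "Integer"  # Integer.parseInt(value) would succeed
    if INT64_MIN <= n <= INT64_MAX:
        return "Long"     # only Long.parseLong(value) succeeds
    raise OverflowError(f"{value} does not fit in a Java long")
```

For example, `java_type_for("4633815064")` returns `"Long"`.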
Feb 5 2021, 9:09 AM · System administration, Metrics/monitoring

Feb 4 2021

vsellier added a comment to T2787: Improve access_logs parsing.

The open apache indexes are currently being migrated with the P940 script.

Feb 4 2021, 8:12 PM · System administration, Metrics/monitoring
vsellier created P940 Elasticsearch field conversion from string to integer.
Feb 4 2021, 7:46 PM · System administration
vsellier accepted D5007: Update decomissioning script with necessary instruction.
Feb 4 2021, 11:52 AM
vsellier accepted D5005: Decomission storage02.euwest.
Feb 4 2021, 11:48 AM
vsellier accepted D5006: hiera: Move cassandra configuration to its own yaml config file.

\o/ thanks!
We should now be able to properly format the common file.

Feb 4 2021, 11:46 AM
vsellier committed rCDFPd15519eb8450: Add a current status (authored by vsellier).
Add a current status
Feb 4 2021, 11:27 AM
vsellier committed rCDFP5e116e91517d: WIP - POC kubernetes (authored by vsellier).
WIP - POC kubernetes
Feb 4 2021, 11:23 AM
vsellier added a comment to T2787: Improve access_logs parsing.

The log parsing is OK.
An Elasticsearch datasource was created in Grafana, so we can now build graphs from the logs stored in Elasticsearch.
A simple dashboard displaying some statistics from the apache logs was started [1]. Designing it turns out to be less straightforward than in Kibana, and it has some limitations, but it still centralizes basic information in Grafana.

Feb 4 2021, 10:42 AM · System administration, Metrics/monitoring
vsellier added a comment to T2912: Next generation archive counters.

The question is not an abstract one: some HyperLogLog implementations are monotonic, and the Redis one may already be; we just need to find out.
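To make "monotonic" concrete: in HyperLogLog each register only ever keeps the maximum rank seen, so the raw estimate α·m²/Σ2^(-M_j) cannot decrease when items are added. Non-monotonic behavior can come from range corrections (such as switching to linear counting for small cardinalities). A minimal sketch keeping only the raw estimate, using the first 8 bytes of SHA-1 as an arbitrary 64-bit hash; this is not how Redis computes its estimate.

```python
import hashlib
import math


class RawHyperLogLog:
    """Minimal HyperLogLog using only the raw estimator (no range corrections)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (arbitrary choice).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        width = 64 - self.p
        index = h >> width                      # top p bits pick the register
        rest = h & ((1 << width) - 1)           # remaining bits give the rank
        rank = width - rest.bit_length() + 1    # 1-based position of the leftmost 1-bit
        # Registers only ever grow: this max() is what makes the estimate monotonic.
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        # math.fsum avoids accumulated rounding error over the 2^-M_j terms.
        return self.alpha * self.m * self.m / math.fsum(2.0 ** -r for r in self.registers)
```

Re-adding an item already seen leaves every register, and therefore the estimate, unchanged.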

Feb 4 2021, 9:48 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T2975: Disk replacement on esnode1 as Resolved.

So far so good: the SMART extended self-test completed without finding any errors:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         9         -
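For the disk health monitoring tracked in T2960, a sketch that scans a self-test log laid out like the one above and reports entries whose status is not "Completed without error". It assumes columns separated by two or more spaces, as in this output; `failed_self_tests` is an illustrative name.

```python
import re
from typing import List, Tuple


def failed_self_tests(log: str) -> List[Tuple[str, str, str]]:
    """Return (number, description, status) for each self-test that did not pass."""
    failures = []
    for line in log.splitlines():
        if not line.startswith("#"):
            continue  # skip the title and column header lines
        # Columns are separated by two or more spaces in this layout.
        columns = re.split(r"\s{2,}", line.strip())
        number, description, status = columns[0], columns[1], columns[2]
        if status != "Completed without error":
            failures.append((number, description, status))
    return failures
```

On the log above, it returns an empty list.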
Feb 4 2021, 9:15 AM · System administration

Feb 2 2021

vsellier committed rSPSITEf59776a25cd5: logstash: clearly identify the applications (authored by vsellier).
logstash: clearly identify the applications
Feb 2 2021, 8:21 PM
vsellier committed rSPSITEb84027e16513: logstash: Allow to perform numerical operations on bytes (authored by vsellier).
logstash: Allow to perform numerical operations on bytes
Feb 2 2021, 8:21 PM
vsellier closed D5000: deposit: add request duration on access logs.
Feb 2 2021, 8:21 PM
vsellier committed rSPSITE3c7194e85922: deposit: add request duration on access logs (authored by vsellier).
deposit: add request duration on access logs
Feb 2 2021, 8:21 PM
vsellier updated the diff for D5000: deposit: add request duration on access logs.

Configure filebeat to read the right file based on the vhost
Add some additional fields to help filter the messages

Feb 2 2021, 8:20 PM