
Mar 12 2021

vsellier updated the diff for D5236: Add redis to the base build image.

use redis-server package instead of a metapackage

Mar 12 2021, 2:55 PM
vsellier added a comment to D5235: Use the recommanded way to declare the repository signing keys.

...
Looks simpler to me but there might be a reason to not use apt-key.

Mar 12 2021, 2:32 PM
vsellier updated the diff for D5236: Add redis to the base build image.

update commit message

Mar 12 2021, 12:50 PM
vsellier retitled D5236: Add redis to the base build image from Add redis on the base build image to Add redis to the base build image.
Mar 12 2021, 12:49 PM
vsellier requested review of D5236: Add redis to the base build image.
Mar 12 2021, 12:49 PM
vsellier added a revision to T2912: Next generation archive counters: D5236: Add redis to the base build image.
Mar 12 2021, 12:49 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier updated the summary of D5235: Use the recommanded way to declare the repository signing keys.
Mar 12 2021, 12:43 PM
vsellier added a revision to T3120: Fix swh-jenkins/base-buster docker image build: D5235: Use the recommanded way to declare the repository signing keys.
Mar 12 2021, 12:43 PM · System administration
vsellier updated the diff for D5235: Use the recommanded way to declare the repository signing keys.

Add task link on the commit message

Mar 12 2021, 12:42 PM
vsellier requested review of D5235: Use the recommanded way to declare the repository signing keys.
Mar 12 2021, 12:40 PM
vsellier triaged T3120: Fix swh-jenkins/base-buster docker image build as Normal priority.
Mar 12 2021, 11:09 AM · System administration
vsellier changed the status of T3120: Fix swh-jenkins/base-buster docker image build from Open to Work in Progress.
Mar 12 2021, 11:08 AM · System administration
vsellier added a revision to T2912: Next generation archive counters: D5232: Implement counters pipeline.
Mar 12 2021, 10:08 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed D5229: swh-counters: Implement the cli skeleton.
Mar 12 2021, 10:05 AM · Counters
vsellier committed rDCNT826c30e4b74c: Implement the cli skeleton (authored by vsellier).
Implement the cli skeleton
Mar 12 2021, 10:05 AM
vsellier retitled D5229: swh-counters: Implement the cli skeleton from Command line skeleton to swh-counters: Implement the cli skeleton.
Mar 12 2021, 9:48 AM · Counters
vsellier updated the diff for D5229: swh-counters: Implement the cli skeleton.

Update the commit message

Mar 12 2021, 9:47 AM · Counters
vsellier updated the diff for D5229: swh-counters: Implement the cli skeleton.

Address review feedback

Mar 12 2021, 9:46 AM · Counters
vsellier added a comment to T3115: Upgrade zfs on all servers.
  • All workers and journal clients stopped before upgrading storage1 and db1
Mar 12 2021, 9:26 AM · System administration

Mar 11 2021

vsellier updated the task description for T3115: Upgrade zfs on all servers.
Mar 11 2021, 5:43 PM · System administration
vsellier added a comment to T3115: Upgrade zfs on all servers.

swh-search0

  • stopping writes
root@search0:~# systemctl stop swh-search-journal-client@objects
root@search0:~# systemctl stop swh-search-journal-client@indexed
root@search0:~# puppet agent --disable "zfs upgrade"
  • package upgrades
  • swh-search0 rebooted
  • all services are up and running
Mar 11 2021, 5:43 PM · System administration
vsellier updated the task description for T3115: Upgrade zfs on all servers.
Mar 11 2021, 5:25 PM · System administration
vsellier moved T3115: Upgrade zfs on all servers from Backlog to in-progress on the System administration board.
Mar 11 2021, 5:23 PM · System administration
vsellier changed the status of T3115: Upgrade zfs on all servers from Open to Work in Progress.
Mar 11 2021, 5:22 PM · System administration
vsellier updated the test plan for D5229: swh-counters: Implement the cli skeleton.
Mar 11 2021, 2:39 PM · Counters
vsellier updated the test plan for D5229: swh-counters: Implement the cli skeleton.
Mar 11 2021, 2:39 PM · Counters
vsellier updated the summary of D5229: swh-counters: Implement the cli skeleton.
Mar 11 2021, 12:22 PM · Counters
vsellier retitled D5229: swh-counters: Implement the cli skeleton from wip - Command line skeleton to Command line skeleton.
Mar 11 2021, 12:20 PM · Counters
vsellier updated the diff for D5229: swh-counters: Implement the cli skeleton.

Add tests

Mar 11 2021, 12:09 PM · Counters

Mar 10 2021

vsellier planned changes to D5229: swh-counters: Implement the cli skeleton.
Mar 10 2021, 4:43 PM · Counters
vsellier requested review of D5229: swh-counters: Implement the cli skeleton.
Mar 10 2021, 4:43 PM · Counters
vsellier added a revision to T2912: Next generation archive counters: D5229: swh-counters: Implement the cli skeleton.
Mar 10 2021, 4:30 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a comment to T3086: Prepare disk replacement on granet.

Mail sent to the DSI to request the installation of 2 of the new disks

Mar 10 2021, 11:46 AM · System administration
vsellier added a comment to T3086: Prepare disk replacement on granet.

Overview of the system:

  • 2 slots available (10 of 12 slots occupied)
  • system installed on 2 SSD disks (wwn-0x500a075122f366e4 and wwn-0x500a075122f357f1)
  • 2 zfs pools
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hdd   29.1T  22.8T  6.29T        -         -    20%    78%  1.00x    ONLINE  -
ssd   10.3T  7.91T  2.44T        -         -    24%    76%  1.00x    ONLINE  -
root@granet:~# zpool status -v hdd
  pool: hdd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 15:42:24 with 0 errors on Sun Feb 14 16:06:26 2021
config:
Mar 10 2021, 11:08 AM · System administration

Mar 8 2021

vsellier committed rCDFPbb5dfba12da1: update the documentation (authored by vsellier).
update the documentation
Mar 8 2021, 8:28 PM
vsellier committed rCDFPbe00c68e7c19: add deployments for graph and content replayers (authored by vsellier).
add deployments for graph and content replayers
Mar 8 2021, 8:28 PM
vsellier committed rCDFP301c573d2597: Avoid foreign key violations when the replayer receive unordered messages (authored by vsellier).
Avoid foreign key violations when the replayer receive unordered messages
Mar 8 2021, 8:28 PM

Mar 5 2021

vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.

Thanks for the feedback

Mar 5 2021, 6:18 PM · System administration, Archive search
vsellier updated the task description for T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
Mar 5 2021, 5:26 PM · System administration, Archive search
vsellier added a comment to T2912: Next generation archive counters.
Mar 5 2021, 12:12 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier committed rCJSWH5e7ce5fde3da: jobs/swh-packages: Add swh-counters module (authored by vsellier).
jobs/swh-packages: Add swh-counters module
Mar 5 2021, 12:00 PM
vsellier committed rDENV00dd72c2e514: .mrconfig: Add swh-counters module (authored by vsellier).
.mrconfig: Add swh-counters module
Mar 5 2021, 11:51 AM
vsellier committed rDCNT142fff84305b: Update template defauts (authored by vsellier).
Update template defauts
Mar 5 2021, 11:44 AM
vsellier committed rDCNTae8cfbfe71c1: import template from swh-py-template (init-py-repo) (authored by vsellier).
import template from swh-py-template (init-py-repo)
Mar 5 2021, 11:32 AM
vsellier created Counters.
Mar 5 2021, 11:16 AM
vsellier changed the status of T2912: Next generation archive counters from Open to Work in Progress.

Let's start the subject ;)

Mar 5 2021, 11:07 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier renamed T3086: Prepare disk replacement on granet from Prepare disk replacement of granet to Prepare disk replacement on granet.
Mar 5 2021, 10:59 AM · System administration
vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

I forgot one step: removing the previous alias origin -> origin_production, which is no longer needed:

vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v  
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153130652     26701625    273.4gb        137.3gb
Mar 5 2021, 10:45 AM · System administration, Journal, Archive search
vsellier closed T3076: [swh-search] Improve the index/mapping migration process as Resolved.

The new configuration is deployed; swh-search now uses the aliases, which should help with future upgrades.
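
To illustrate why aliases help here: clients keep addressing origin-read and origin-write, so a migration only has to repoint those aliases at a new index. Below is a minimal sketch of the request body for the Elasticsearch `_aliases` endpoint. The helper name `build_alias_swap` and the target index `origin-v2` are hypothetical, and this is not necessarily how swh-search performs the switch.

```python
import json


def build_alias_swap(old_index, new_index, aliases):
    """Build an `_aliases` request body that repoints each alias.

    Elasticsearch applies all actions of one `_aliases` request atomically,
    so an alias is never left without a target during the swap.
    """
    actions = []
    for alias in aliases:
        actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# "origin-v2" is a made-up name for the freshly built index.
body = build_alias_swap(
    "origin-production", "origin-v2", ["origin-read", "origin-write"]
)
print(json.dumps(body, indent=2))
```

The resulting document would be sent as the body of a POST to `/_aliases`; the sketch stops short of doing any network call.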

Mar 5 2021, 10:35 AM · System administration, Journal, Archive search
vsellier closed T3076: [swh-search] Improve the index/mapping migration process, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Mar 5 2021, 10:35 AM · Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1, a subtask of T3076: [swh-search] Improve the index/mapping migration process, as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1 as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

Deployment in production:

  • puppet stopped
  • configuration updated to declare the index; this must be done first so that swh-search initializes the aliases before the journal clients start (not guaranteed with a puppet apply)
  • package updated
  • gunicorn-swh-search service restarted:
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Starting gunicorn 19.9.0
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Listening at: unix:/run/gunicorn/swh-search/gunicorn.sock (1881743)
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Using worker: sync
Mar 05 09:08:46 search1 python3[1881748]: 2021-03-05 09:08:46 [1881748] gunicorn.error:INFO Booting worker with pid: 1881748
Mar 05 09:08:46 search1 python3[1881749]: 2021-03-05 09:08:46 [1881749] gunicorn.error:INFO Booting worker with pid: 1881749
Mar 05 09:08:46 search1 python3[1881750]: 2021-03-05 09:08:46 [1881750] gunicorn.error:INFO Booting worker with pid: 1881750
Mar 05 09:08:46 search1 python3[1881751]: 2021-03-05 09:08:46 [1881751] gunicorn.error:INFO Booting worker with pid: 1881751
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] elasticsearch:INFO HEAD http://search-esnode2.internal.softwareheritage.org:9200/origin-production [status:200 request:0.023s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_alias/origin-read [status:200 request:0.487s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode3.internal.softwareheritage.org:9200/origin-production/_alias/origin-write [status:200 request:0.152s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.009s]
vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v 
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153097672    144224208    288.1gb          149gb
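
The docs.deleted column is high here relative to docs.count. Elasticsearch writes an updated document as a new version and only marks the old one as deleted until segments are merged, so the share of deleted documents is a rough indicator of how much space a merge could reclaim. A small sketch for computing that share from the `_cat/indices?v` output follows; the helper names are illustrative, and it assumes open indices (closed ones have no document counts).

```python
def parse_cat_table(text):
    """Turn the whitespace-aligned `_cat` output into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def deleted_ratio(row):
    """Share of deleted documents among all documents still on disk."""
    live = int(row["docs.count"])
    deleted = int(row["docs.deleted"])
    return deleted / (live + deleted)


SAMPLE = """\
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153097672    144224208    288.1gb          149gb
"""

rows = parse_cat_table(SAMPLE)
for row in rows:
    print(row["index"], f"{deleted_ratio(row):.1%}")
```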
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier closed D5198: swh-search: add indexes configuration.
Mar 5 2021, 9:36 AM
vsellier committed rSPSITEb4d30523f9bd: swh-search: add indexes configuration (authored by vsellier).
swh-search: add indexes configuration
Mar 5 2021, 9:36 AM
vsellier committed rCDFP70e50a9dd525: wip - prometheus/grafana/web (authored by vsellier).
wip - prometheus/grafana/web
Mar 5 2021, 9:23 AM
vsellier requested review of D5198: swh-search: add indexes configuration.
Mar 5 2021, 9:19 AM
vsellier added a revision to T3083: Deploy swh-search v0.7.0/v0.7.1: D5198: swh-search: add indexes configuration.
Mar 5 2021, 9:19 AM · System administration, Journal, Archive search
vsellier committed rSENV8b9658d2e128: vagrant: Add search1 production server (authored by vsellier).
vagrant: Add search1 production server
Mar 5 2021, 9:16 AM

Mar 4 2021

vsellier triaged T3086: Prepare disk replacement on granet as Normal priority.
Mar 4 2021, 6:18 PM · System administration
vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

swh-search:v0.7.1 deployed in staging according to the defined plan.
The aliases are correctly created and used by the services:

vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/indices
green open  origin                      HthJj42xT5uO7w3Aoxzppw 80 0 929692 137147 4gb 4gb
green close origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg 80 0                      
green close origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ 80 0                      
vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/aliases
origin-read  origin - - - -
origin-write origin - - - -

Journal clients:

Mar 04 16:22:40 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.013s]
Mar 04 16:22:41 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.012s]

Search:

Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:0.005s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-read/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-write/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.006s]
Mar 04 16:19:27 search0 python3[3598042]: 2021-03-04 16:19:27 [3598042] elasticsearch:INFO GET http://search-esnode0.internal.staging.swh.network:9200/origin-read/_search?size=100 [status:200 request:0.076s]
Mar 4 2021, 5:24 PM · System administration, Journal, Archive search
vsellier closed D5196: Allow to instantiate the service with default indexes configuration.
Mar 4 2021, 4:02 PM
vsellier committed rDSEA84801b3036c2: Allow to instantiate the service with default indexes configuration (authored by vsellier).
Allow to instantiate the service with default indexes configuration
Mar 4 2021, 4:01 PM
vsellier renamed T3083: Deploy swh-search v0.7.0/v0.7.1 from Deploy swh-search v0.7.0 to Deploy swh-search v0.7.0/v0.7.1.
Mar 4 2021, 4:01 PM · System administration, Journal, Archive search
vsellier updated the diff for D5196: Allow to instantiate the service with default indexes configuration.

Remove the tests: the flask application is not reinitialized
between 2 unit tests, so testing the ElasticSearch class instantiation
with different configurations through flask does not work.

Mar 4 2021, 3:57 PM
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5196: Allow to instantiate the service with default indexes configuration.
Mar 4 2021, 3:21 PM · System administration, Journal, Archive search
vsellier added a comment to T3081: ZFS failures detected on belvedere.

The 4 files seem to be accessible without errors, which looks like good news ;):

root@belvedere:~# time cp /srv/softwareheritage/postgres/11/indexer/base/16406/774467031.317 /dev/null
Mar 4 2021, 12:36 PM · System administration
vsellier changed the status of T3083: Deploy swh-search v0.7.0/v0.7.1 from Open to Work in Progress.
Mar 4 2021, 12:09 PM · System administration, Journal, Archive search
vsellier added a comment to T3054: Scale out object storage design.

I have found some interesting pointers on the management of small files in HDFS (I came across them while looking for something unrelated). Is it something you have already identified and excluded from the scope because of some blockers?

Mar 4 2021, 11:59 AM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
vsellier added a comment to T3081: ZFS failures detected on belvedere.

In T2892, the indexes were recreated to solve the problem with the pk files. @olasd: is that enough to solve the issue?

Mar 4 2021, 10:34 AM · System administration
vsellier added a comment to T3081: ZFS failures detected on belvedere.

Isn't this around when we've restarted production after expanding the storage pool?

The loaders were restarted in late November, but perhaps more of them were launched at that time

Mar 4 2021, 9:45 AM · System administration
vsellier updated the task description for T3081: ZFS failures detected on belvedere.
Mar 4 2021, 9:36 AM · System administration

Mar 3 2021

vsellier closed D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:32 PM
vsellier committed rDSEA9e0db2bd4fd0: Ensure the elasticsearch indexes are initialized before the first request (authored by vsellier).
Ensure the elasticsearch indexes are initialized before the first request
Mar 3 2021, 6:32 PM
vsellier closed T3033: Replace first disk on storage1.staging as Resolved.
Mar 3 2021, 6:28 PM · System administration
vsellier closed T3033: Replace first disk on storage1.staging, a subtask of T2939: Replace out of order disks on db1.staging and storage1.staging, as Resolved.
Mar 3 2021, 6:28 PM · System administration
vsellier added a comment to T3033: Replace first disk on storage1.staging.

The disk was fully tested with read/write operations (interrupted on the 2nd pass)

Mar 3 2021, 6:28 PM · System administration
vsellier updated the summary of D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:10 PM
vsellier updated the diff for D5193: Ensure the elasticsearch indexes are initialized before the first request.
  • fix wrong error log level
  • fix typo on the commit message
Mar 3 2021, 6:10 PM
vsellier requested review of D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:05 PM
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:03 PM · System administration, Journal, Archive search
vsellier closed D5179: Use elasticsearch aliases to simplify maintenance operations.
Mar 3 2021, 3:52 PM
vsellier committed rDSEA7c795a603f7a: Use elasticsearch aliases to simplify maintenance operations (authored by vsellier).
Use elasticsearch aliases to simplify maintenance operations
Mar 3 2021, 3:52 PM
vsellier added a comment to T3081: ZFS failures detected on belvedere.

For the record, it seems the 4 impacted files are related to the primary key of the softwareheritage-indexer.content_mimetype table

Mar 3 2021, 3:50 PM · System administration
vsellier added a comment to P966 Snapshot branches queries performances.

Yep, it's weird, but after looking at the code of the function, I realized it seems to be a known problem:
https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/40-funcs.sql$665-667

Mar 3 2021, 3:40 PM
vsellier added a comment to P966 Snapshot branches queries performances.

limit 2:
https://explain.depesz.com/s/UW9Z

softwareheritage=> explain analyze with filtered_snapshot_branches as (
  select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
  from snapshot_branches
  inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
  where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
    and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
  and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
  and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 2;
                                                                                                    QUERY PLAN                                                                                                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.01..6764.11 rows=2 width=76) (actual time=172523.081..173555.673 rows=2 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using snapshot_id_idx on snapshot  (cost=0.57..2.59 rows=1 width=8) (actual time=0.028..0.036 rows=1 loops=1)
           Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
   ->  Gather Merge  (cost=1001.43..168852423.27 rows=58628 width=76) (actual time=172523.079..173555.661 rows=2 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Nested Loop  (cost=1.40..168844656.12 rows=24428 width=76) (actual time=126442.320..167761.276 rows=2 loops=3)
               ->  Parallel Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) (actual time=0.077..80926.811 rows=23123612 loops=3)
                     Index Cond: (name >= '\x'::bytea)
               ->  Index Only Scan using snapshot_branches_pkey on snapshot_branches  (cost=0.70..1.01 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=69370837)
                     Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
                     Heap Fetches: 5
 Planning Time: 0.993 ms
 Execution Time: 173555.864 ms
(16 rows)
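
A note on reading this plan: EXPLAIN ANALYZE reports rows and actual time as per-loop averages, so the total for a node is rows × loops. The sketch below applies that to two lines of the plan above (the regex and the helper name are illustrative, not part of any tool). The parallel scan on snapshot_branch returned about 69 million rows across the three processes, and each of them triggered one lookup in snapshot_branches_pkey, which is consistent with loops=69370837 on the inner node.

```python
import re

# Matches the "(actual time=START..END rows=N loops=M)" part of a plan node.
NODE_STATS = re.compile(
    r"actual time=(?P<start>\d+\.\d+)\.\.(?P<end>\d+\.\d+) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)"
)


def total_rows(plan_line):
    """Rows produced by a node over all its executions (rows x loops).

    The per-loop row count is a rounded average, so the product is an
    approximation of the real total.
    """
    match = NODE_STATS.search(plan_line)
    if match is None:
        raise ValueError("no 'actual' statistics on this line")
    return int(match["rows"]) * int(match["loops"])


outer = (
    "Parallel Index Scan using snapshot_branch_name_target_target_type_idx "
    "on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) "
    "(actual time=0.077..80926.811 rows=23123612 loops=3)"
)
inner = (
    "Index Only Scan using snapshot_branches_pkey on snapshot_branches  "
    "(cost=0.70..1.01 rows=1 width=8) "
    "(actual time=0.004..0.004 rows=0 loops=69370837)"
)

print(total_rows(outer))  # rows scanned on snapshot_branch before the limit was met
print(total_rows(inner))  # the average rounds to 0 rows per lookup
```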
Mar 3 2021, 3:18 PM
vsellier added a comment to P966 Snapshot branches queries performances.

It seems there are some differences in terms of indexes between the main and replica databases.
On the replica, only the primary keys are present on the snapshot_branches and the snapshot_branch tables. Perhaps the query optimizer is confused by something and is making a wrong choice somewhere.

Mar 3 2021, 3:08 PM
vsellier updated subscribers of T3081: ZFS failures detected on belvedere.

No problems are detected on the iDRAC; smartctl on the drives looks OK.

Mar 3 2021, 12:59 PM · System administration
vsellier closed T3061: swh-search: Deploy visit_types indexation in production, a subtask of T2869: web search: allow to filter by origin type, as Resolved.
Mar 3 2021, 12:21 PM · Web app
vsellier closed T3061: swh-search: Deploy visit_types indexation in production as Resolved.

The lag has recovered, so the index should now contain the visit_type for all origins

Mar 3 2021, 12:21 PM · System administration, Web app
vsellier added a comment to T3081: ZFS failures detected on belvedere.

(Not related, or not directly related, to the issue) While looking at potential disk I/O issues, I discovered a weird change in I/O behavior on belvedere after 2020-12-31:

Mar 3 2021, 11:56 AM · System administration
vsellier added a comment to T3081: ZFS failures detected on belvedere.

There are no errors in the PostgreSQL logs related to the files listed in the zfs status, but I'm not sure the indexer database is being read.

Mar 3 2021, 11:31 AM · System administration
vsellier added a comment to T3081: ZFS failures detected on belvedere.

It seems there have been recurring alerts in the system journal about several disks for some time:

Feb 24 01:33:36 belvedere.internal.softwareheritage.org kernel: sd 0:0:14:0: [sdi] tag#808 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:36 belvedere.internal.softwareheritage.org kernel: sd 0:0:14:0: [sdi] tag#808 Add. Sense: Defect list not found
Feb 24 01:33:39 belvedere.internal.softwareheritage.org kernel: sd 0:0:16:0: [sdk] tag#650 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:39 belvedere.internal.softwareheritage.org kernel: sd 0:0:16:0: [sdk] tag#650 Add. Sense: Defect list not found
Feb 24 01:33:41 belvedere.internal.softwareheritage.org kernel: sd 0:0:17:0: [sdl] tag#669 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:41 belvedere.internal.softwareheritage.org kernel: sd 0:0:17:0: [sdl] tag#669 Add. Sense: Defect list not found
Feb 24 01:33:43 belvedere.internal.softwareheritage.org kernel: sd 0:0:18:0: [sdm] tag#682 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:43 belvedere.internal.softwareheritage.org kernel: sd 0:0:18:0: [sdm] tag#682 Add. Sense: Defect list not found
Feb 24 01:33:44 belvedere.internal.softwareheritage.org kernel: sd 0:0:21:0: [sdo] tag#668 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:44 belvedere.internal.softwareheritage.org kernel: sd 0:0:21:0: [sdo] tag#668 Add. Sense: Defect list not found
Feb 24 01:33:44 belvedere.internal.softwareheritage.org kernel: sd 0:0:22:0: [sdp] tag#682 Sense Key : Recovered Error [current] [descriptor] 
Feb 24 01:33:44 belvedere.internal.softwareheritage.org kernel: sd 0:0:22:0: [sdp] tag#682 Add. Sense: Defect list not found
...
root@belvedere:/var/log# journalctl -k --since=yesterday | awk '{print $8}' | sort | uniq -c
    274 [sdi]
    274 [sdk]
    274 [sdl]
    274 [sdm]
    274 [sdo]
    274 [sdp]
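
The awk pipeline above relies on the device name being the eighth whitespace-separated field. A slightly more tolerant variant, sketched in Python, matches the bracketed device name anywhere in the line (the helper name and the regex are illustrative). Each event in the excerpt logs two lines, Sense Key and Add. Sense, so 274 lines would correspond to 137 events per disk, assuming every counted line follows that pattern.

```python
import re
from collections import Counter

# Bracketed SCSI disk name such as "[sdi]" in a kernel log line.
DEVICE = re.compile(r"\[(sd[a-z]+)\]")


def count_by_device(lines):
    """Count log lines per disk, ignoring lines without a device name."""
    counts = Counter()
    for line in lines:
        match = DEVICE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Feb 24 01:33:36 belvedere.internal.softwareheritage.org kernel: sd 0:0:14:0: [sdi] tag#808 Sense Key : Recovered Error [current] [descriptor]",
    "Feb 24 01:33:36 belvedere.internal.softwareheritage.org kernel: sd 0:0:14:0: [sdi] tag#808 Add. Sense: Defect list not found",
    "Feb 24 01:33:39 belvedere.internal.softwareheritage.org kernel: sd 0:0:16:0: [sdk] tag#650 Sense Key : Recovered Error [current] [descriptor]",
    "Feb 24 01:33:39 belvedere.internal.softwareheritage.org kernel: sd 0:0:16:0: [sdk] tag#650 Add. Sense: Defect list not found",
]

for device, lines_seen in sorted(count_by_device(sample).items()):
    print(f"{lines_seen:7d} [{device}]")
```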
Mar 3 2021, 11:16 AM · System administration
vsellier updated the task description for T3081: ZFS failures detected on belvedere.
Mar 3 2021, 11:02 AM · System administration
vsellier renamed T3081: ZFS failures detected on belvedere from ZFS failure detected on belvedere to ZFS failures detected on belvedere.
Mar 3 2021, 11:02 AM · System administration
vsellier changed the status of T3081: ZFS failures detected on belvedere from Open to Work in Progress.
Mar 3 2021, 11:01 AM · System administration
vsellier created P965 Belvedere zfs errors.
Mar 3 2021, 11:00 AM
vsellier updated the diff for D5179: Use elasticsearch aliases to simplify maintenance operations.

Configure the indexes with a Dict with an entry per index type
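
As a rough illustration of such a configuration, the sketch below keeps one entry per index type and merges user overrides onto defaults. The key names (`index`, `read_alias`, `write_alias`), the default values and the helper `resolve_indexes` are assumptions made for this example; they are not taken from the diff itself.

```python
# Hypothetical defaults: one entry per index type.
DEFAULT_INDEXES = {
    "origin": {
        "index": "origin",
        "read_alias": "origin-read",
        "write_alias": "origin-write",
    },
}


def resolve_indexes(config=None):
    """Merge per-index-type overrides onto a copy of the defaults."""
    resolved = {name: dict(values) for name, values in DEFAULT_INDEXES.items()}
    for name, overrides in (config or {}).items():
        resolved.setdefault(name, {}).update(overrides)
    return resolved


# Only the index name is overridden; the aliases fall back to the defaults.
production = resolve_indexes({"origin": {"index": "origin-production"}})
print(production["origin"])
```

Keeping the aliases as defaults means a deployment only has to name its concrete index, while services can still start with no configuration at all.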

Mar 3 2021, 10:30 AM

Mar 2 2021

vsellier accepted D5186: search.cli: Drop unused and untested rpc-serve cli entrypoint.

lgtm

Mar 2 2021, 6:44 PM
vsellier accepted D5185: api.wsgi: Drop unused wsgi module.

LGTM

Mar 2 2021, 6:33 PM