
staging: Increase the number of workers for storage and indexer storage
Closed, Public

Authored by vsellier on Aug 31 2022, 6:19 PM.

Details

Summary

The default configuration (4 workers) is far too low.
The storage and indexer storage services respectively receive around 32 and 16 concurrent requests at all times.

Related to T4477

A simple request that needs no database access takes a very long time:

vsellier@storage1 ~ % curl -w "@.curl-format.txt" -o /dev/null -s http://localhost:5002
time_namelookup: 0.000472
time_connect: 0.000620
time_appconnect: 0.000000
time_pretransfer: 0.000659
time_redirect: 0.000000
time_starttransfer: 4.734891
———
time_total: 4.734945

This seems to indicate that swh-storage is not responding.
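
To make the reading of the curl timings explicit, here is a minimal Python sketch that parses the output above and computes how long the request waited for the first response byte (time_starttransfer minus time_pretransfer). The helper names `parse_timings` and `server_wait` are hypothetical and only serve this illustration; the numbers are copied from the transcript.

```python
# Timings copied from the curl output above, in seconds.
CURL_OUTPUT = """\
time_namelookup: 0.000472
time_connect: 0.000620
time_appconnect: 0.000000
time_pretransfer: 0.000659
time_redirect: 0.000000
time_starttransfer: 4.734891
time_total: 4.734945
"""


def parse_timings(text):
    """Turn 'name: value' lines into a dict of floats, skipping other lines."""
    timings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator or blank line
        try:
            timings[key.strip()] = float(value)
        except ValueError:
            continue  # not a numeric timing line
    return timings


def server_wait(timings):
    """Time between the request being ready to send and the first byte received."""
    return timings["time_starttransfer"] - timings["time_pretransfer"]


timings = parse_timings(CURL_OUTPUT)
wait = server_wait(timings)
share = wait / timings["time_total"]
print(f"waiting for first byte: {wait:.6f}s ({share:.2%} of total)")
```

Name lookup and connection setup take less than a millisecond, so almost the whole 4.7 seconds is spent waiting for the backend to start answering.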

There are a lot of connections waiting for a response (handled by nginx):

root@storage1:/var/log/nginx# ss -tn | grep 5002 | grep ESTAB | wc -l
33
root@storage1:/etc# ss -tn | grep 5007 | grep ESTAB | wc -l
17

and only 4 workers are configured.
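
As a rough back-of-the-envelope check, the sketch below compares the established connection counts above with the number of workers. It assumes each gunicorn "sync" worker serves one request at a time, that every established connection carries one in-flight request, and that port 5002 belongs to swh-storage and port 5007 to swh-indexer-storage. The function name `waiting_requests` is hypothetical; the proposed worker counts are the ones from the Test Plan diff.

```python
def waiting_requests(established, workers):
    """Requests that cannot be served right now.

    Assumes one in-flight request per connection and one request
    at a time per sync worker.
    """
    return max(0, established - workers)


# Connection counts from the ss output above (assumed port mapping:
# 5002 -> swh-storage, 5007 -> swh-indexer-storage).
observed = {"swh-storage": 33, "swh-indexer-storage": 17}
# Worker counts proposed in this diff.
proposed = {"swh-storage": 40, "swh-indexer-storage": 24}

for name, established in observed.items():
    before = waiting_requests(established, 4)
    after = waiting_requests(established, proposed[name])
    print(f"{name}: {before} waiting with 4 workers, {after} with {proposed[name]}")
```

Under these assumptions most of the observed connections are queued behind the 4 workers, while the proposed values leave some headroom above the observed load.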

Test Plan
diff origin/production/storage1.internal.staging.swh.network current/storage1.internal.staging.swh.network
*******************************************
  File[/etc/gunicorn/instances/swh-indexer-storage.cfg] =>
   parameters =>
     content =>
      @@ -41,5 +41,5 @@
       # custom settings
       bind = "unix:/run/gunicorn/swh-indexer-storage/gunicorn.sock"
      -workers = 4
      +workers = 24
       worker_class = "sync"
       timeout = 3600
*******************************************
  File[/etc/gunicorn/instances/swh-storage.cfg] =>
   parameters =>
     content =>
      @@ -41,11 +41,11 @@
       # custom settings
       bind = "unix:/run/gunicorn/swh-storage/gunicorn.sock"
      -workers = 4
      +workers = 40
       worker_class = "sync"
       timeout = 3600
       graceful_timeout = 3600
       keepalive = 5
      -max_requests = 100
      -max_requests_jitter = 10
      +max_requests = 5000
      +max_requests_jitter = 500
       statsd_host = "127.0.0.1:8125"
       statsd_prefix = "swh-storage"
*******************************************
  Gunicorn::Instance[swh-indexer-storage] =>
   parameters =>
     settings =>
       workers =>
        - 4
        + 24
*******************************************
  Gunicorn::Instance[swh-storage] =>
   parameters =>
     settings =>
       max_requests =>
        - 100
        + 5000
       max_requests_jitter =>
        - 10
        + 500
       workers =>
        - 4
        + 40
*******************************************
*** End octocatalog-diff on storage1.internal.staging.swh.network

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vsellier created this revision.
This revision is now accepted and ready to land. Aug 31 2022, 6:22 PM

Increase the number of requests handled by a storage worker before it is restarted
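
For context on the max_requests change, here is a small sketch of how the restart threshold works, as far as I understand gunicorn's behaviour: each worker restarts after max_requests plus a random jitter between 0 and max_requests_jitter requests, so that workers do not all restart at once. The function names `restart_threshold` and `restart_window` are hypothetical, not part of gunicorn.

```python
import random


def restart_threshold(max_requests, jitter, rng=random):
    """Number of requests one worker handles before restarting (one random draw)."""
    return max_requests + rng.randint(0, jitter)


def restart_window(max_requests, jitter):
    """Smallest and largest possible restart threshold for a worker."""
    return max_requests, max_requests + jitter


# Values before and after this diff for swh-storage.
for label, max_requests, jitter in (("old", 100, 10), ("new", 5000, 500)):
    low, high = restart_window(max_requests, jitter)
    print(f"{label}: a worker restarts after {low} to {high} requests")
```

With the old settings a worker was recycled roughly every hundred requests; with the new ones it handles about fifty times as many requests between restarts.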