
postgresql: Increase some timeouts to get origin visits
Closed, Public

Authored by anlambert on Jul 13 2022, 4:57 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Maniphest Tasks
Restricted Maniphest Task
Commits
rDSTOfbe3803820a5: postgresql: Increase some timeouts to get origin visits
Summary

Even though the missing index to speed up origin visit queries has
been added to the replica database, the configured timeouts for
origin_visit_get_with_statuses and origin_visit_find_by_date
were still too low to avoid query timeouts in production.

After performing some tests locally, bumping them to 2000ms
makes the timeouts go away.

Related to T4386

Should fix SWH-STORAGE-1C0, SWH-STORAGE-1DM and SWH-STORAGE-1DJ.
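
For readers unfamiliar with per-method query timeouts, here is a minimal sketch of the idea, assuming the timeout is applied by issuing a PostgreSQL SET LOCAL statement_timeout before the query runs. The decorator name with_statement_timeout, the RecordingCursor stand-in and the SketchStorage class are hypothetical and are not taken from swh-storage; only the method name and the 2000ms value come from this revision.

```python
import functools


def with_statement_timeout(timeout_ms):
    """Hypothetical decorator: apply a PostgreSQL statement_timeout
    (in milliseconds) before running the wrapped method."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, cur, **kwargs):
            # SET LOCAL only lasts until the end of the current transaction.
            cur.execute(f"SET LOCAL statement_timeout TO {int(timeout_ms)}")
            return method(self, *args, cur=cur, **kwargs)

        return wrapper

    return decorator


class RecordingCursor:
    """Stand-in for a database cursor that records the SQL it receives."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


class SketchStorage:
    """Illustrative class, not the real storage implementation."""

    @with_statement_timeout(2000)
    def origin_visit_find_by_date(self, origin, visit_date, *, cur):
        # Placeholder for the real origin visit query.
        cur.execute("SELECT 1")
        return None
```

With a real cursor, a query running longer than the configured value would be cancelled by the server, which is the failure mode this revision addresses by raising the limit.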

Test Plan

I created an SSH tunnel to be able to talk to the production database server
from my machine, which is connected to the VPN.

(swh) ✔ ~/swh/swh-environment/swh-web [master ↑·1|✚ 1⚑ 143] 
16:26 $ ssh -L 8900:192.168.100.103:5433 somerset.internal.softwareheritage.org
Linux somerset 5.15.35-3-pve #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) x86_64

1 updates could not be installed automatically. For more details,
see /var/log/unattended-upgrades/unattended-upgrades.log

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Jul 13 14:25:59 2022 from 192.168.101.8
anlambert@somerset:~$
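
Before starting the storage server, it can help to confirm that the forwarded port is accepting connections. The following is a small sketch using only the Python standard library; the function name tunnel_is_open is hypothetical, and the default port 8900 matches the ssh -L option above. Note that it only shows that something is listening on the local end of the tunnel, not that the remote database is reachable.

```python
import socket


def tunnel_is_open(host="localhost", port=8900, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

Calling tunnel_is_open() while the SSH session is active should return True; once the session is closed, it should return False.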

Then I executed the storage RPC server locally with the following command
and configuration:

swh storage -C ~/.config/swh/storage.yml rpc-serve
storage.yml
storage:
  cls: local
  db: "host=localhost port=8900 dbname=softwareheritage user=guest"
  
  objstorage:
    cls: remote
    url: http://objstorage.internal.softwareheritage.org:5003 # does not work but fortunately not needed for my tests

I updated the swh-web configuration to use my local storage RPC server:

web.yml
storage:
  cls: remote
  url: http://localhost:5002/

...

Finally, I tweaked the timeout value for the origin_visit_get_with_statuses
and origin_visit_find_by_date methods until I found one that no longer
makes the queries time out. 2000ms seems to be enough.
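
The manual tuning described above can be sketched as a simple search over candidate values. This is an illustration only: smallest_passing_timeout and fake_query are hypothetical helpers, and the 1500ms query duration is a made-up figure for the simulation, not a measurement from production.

```python
def smallest_passing_timeout(run_with_timeout, candidates_ms):
    """Return the smallest candidate (in ms) for which run_with_timeout
    completes without raising TimeoutError, or None if none does."""
    for timeout_ms in sorted(candidates_ms):
        try:
            run_with_timeout(timeout_ms)
        except TimeoutError:
            continue
        return timeout_ms
    return None


def fake_query(timeout_ms, duration_ms=1500):
    # Simulated query with a fixed, made-up duration; no database involved.
    if duration_ms > timeout_ms:
        raise TimeoutError(f"query exceeded {timeout_ms}ms")
```

In practice, run_with_timeout would call the storage method against the tunnelled database with the given timeout, and several runs per candidate would give a more reliable picture than a single attempt.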

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8124 (id=29351)

Rebasing onto cfc867999d...

Current branch diff-target is up to date.
Changes applied before test
commit fbe3803820a5c348861057e40e4cd3a5460843a3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Jul 13 16:44:13 2022 +0200

    postgresql: Increase some timeouts to get origin visits
    
    Even if missing index to speedup origin visit queries has
    been added to replica database, the configured timeouts for
    origin_visit_get_with_statuses and origin_visit_find_by_date
    were still too low to avoid query timeouts in production.
    
    After performing some tests locally, bumping them to 2000ms
    makes the timeouts go away.
    
    Related to T4386

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1634/ for more details.

This revision is now accepted and ready to land. Jul 13 2022, 5:18 PM