Use a temporary table to update scheduler metrics
ClosedPublic

Authored by olasd on Dec 9 2021, 2:58 PM.

Details

Summary

When using `insert into <...> select <...>`, PostgreSQL does not use a
parallel plan for the query. In some circumstances (notably in our large
production database), this makes updating the scheduler metrics take a
very long time.

Parallel querying is allowed for `create table <...> as select <...>`,
and using it brings the runtime of this query back down (15 minutes
instead of multiple hours). To use it, the function has to be written
in plpgsql instead of plain sql.
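
For illustration, the pattern described above could look roughly like the sketch below. This is not the actual scheduler function: `big_source_table`, `metrics_summary`, `tmp_metrics_summary` and `refresh_metrics_summary` are hypothetical names chosen for the example. The switch to plpgsql is presumably needed because a plain sql function parses its whole body before executing any of it, so a later statement could not reference a table created earlier in the same body.

```sql
-- Hypothetical schema, for illustration only (not the scheduler's real tables).
create table if not exists big_source_table (
  id     bigint primary key,
  bucket text not null
);

create table if not exists metrics_summary (
  bucket      text primary key,
  item_count  bigint not null,
  last_update timestamptz not null
);

create or replace function refresh_metrics_summary(ts timestamptz default now())
  returns void
  language plpgsql
as $$
begin
  -- Step 1: compute the aggregates with CREATE TABLE ... AS SELECT,
  -- for which the planner may pick a parallel plan.
  create temporary table tmp_metrics_summary as
    select bucket, count(*) as item_count, ts as last_update
      from big_source_table
     group by bucket;

  -- Step 2: copy the (small) aggregated result into the real table.
  -- This INSERT ... SELECT only reads the temporary table, so the lack
  -- of parallelism no longer matters.
  insert into metrics_summary (bucket, item_count, last_update)
    select bucket, item_count, last_update
      from tmp_metrics_summary
  on conflict (bucket) do update
    set item_count  = excluded.item_count,
        last_update = excluded.last_update;

  -- Step 3: drop the temporary table so the function can be called
  -- again within the same session or transaction.
  drop table tmp_metrics_summary;
end;
$$;
```

Running `explain` on the `select` from step 1 on its own is one way to check whether the planner actually chooses a parallel plan; that depends on table size and on settings such as `max_parallel_workers_per_gather`.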

Related to T3785

Test Plan

Poked at the function in production until the runtime came back to a
more sensible value. No existing tests are regressing.

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25539
Build 39925: Phabricator diff pipeline on jenkins
Build 39924: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6812 (id=24686)

Rebasing onto a8edbdbb00...

First, rewinding head to replay your work on top of it...
Applying: Use a temporary table to update scheduler metrics
Changes applied before test
commit 1d6de138f281801ff1e81c95ffce895389fe6a4b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Dec 9 14:54:09 2021 +0100

    Use a temporary table to update scheduler metrics
    

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/503/ for more details.

olasd requested review of this revision. Dec 9 2021, 3:01 PM
ardumont added a subscriber: ardumont.

awesome, thanks.

This revision is now accepted and ready to land. Dec 9 2021, 3:10 PM

Build is green

Patch application report for D6812 (id=24687)

Rebasing onto a8edbdbb00...

Current branch diff-target is up to date.
Changes applied before test
commit e051b320e4050bdc75502bf23de8b5d53d368809
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Dec 9 14:54:09 2021 +0100

    Use a temporary table to update scheduler metrics
    

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/504/ for more details.