
Use a temporary table to update scheduler metrics
Closed, Public

Authored by olasd on Dec 9 2021, 2:58 PM.

Details

Summary

When running `insert into <...> select <...>`, PostgreSQL does not
generate a parallel plan for the query. Under some circumstances (in our
large production database), this makes updating the scheduler metrics
take multiple hours.

Parallel plans are allowed for `create table <...> as select <...>`,
and switching to it brings the runtime of this query back down to about
15 minutes. To use that, we have to turn the function into plpgsql
instead of plain sql.
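
A minimal sketch of the pattern described above, not the actual function from this diff. The tables `demo_origins` and `demo_metrics`, the function `demo_update_metrics` and the temporary table `tmp_demo_metrics` are all hypothetical names made up for illustration. PL/pgSQL is a natural fit here because the PostgreSQL documentation notes that the whole body of a plain SQL function is parsed before any of it runs, so a later statement cannot refer to a table created earlier in the same body.

```sql
-- Hypothetical schema, for illustration only (not the swh-scheduler tables).
create table if not exists demo_origins (
  lister_id  int     not null,
  visit_type text    not null,
  enabled    boolean not null
);

create table if not exists demo_metrics (
  lister_id     int         not null,
  visit_type    text        not null,
  last_update   timestamptz not null,
  origins_known bigint      not null,
  primary key (lister_id, visit_type)
);

create or replace function demo_update_metrics(ts timestamptz default now())
  returns setof demo_metrics
  language plpgsql
as $$
begin
  -- Step 1: run the expensive aggregation through CREATE TABLE ... AS,
  -- for which the planner may consider a parallel plan for the SELECT.
  create temporary table tmp_demo_metrics
    on commit drop
    as
    select lister_id, visit_type, count(*) as origins_known
    from demo_origins
    group by lister_id, visit_type;

  -- Step 2: copy the small aggregated result into the real table.
  -- This INSERT gets no parallel plan, but it only reads the temp table.
  return query
    insert into demo_metrics (lister_id, visit_type, last_update, origins_known)
      select lister_id, visit_type, ts, origins_known
      from tmp_demo_metrics
    on conflict (lister_id, visit_type) do update
      set last_update   = excluded.last_update,
          origins_known = excluded.origins_known
    returning *;
end;
$$;
```

Calling `select * from demo_update_metrics();` would then return the refreshed rows. Because the temporary table is only dropped at commit, this sketch assumes the function is called at most once per transaction.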

Related to T3785

Test Plan

Poked at the function in production until the runtime came back to a
more sensible value. No existing tests are regressing.
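
One way to check the planner's behaviour without waiting for a full run is to compare `explain` output for the two statement shapes. The sketch below uses hypothetical tables (`demo_visits`, `demo_visit_counts`, `tmp_demo_visit_counts`), not the production schema. Whether a `Gather` node actually shows up for the second statement depends on table size and on settings such as `max_parallel_workers_per_gather`, so no expected output is shown.

```sql
-- Hypothetical tables, for illustration only.
create table if not exists demo_visits (origin_id bigint, visit_type text);
create table if not exists demo_visit_counts (visit_type text, visits bigint);

-- Data-modifying statement: per the PostgreSQL documentation on parallel
-- query, no parallel plan is generated for a query that writes data.
explain
  insert into demo_visit_counts (visit_type, visits)
    select visit_type, count(*) from demo_visits group by visit_type;

-- CREATE TABLE ... AS is a documented exception: the SELECT part may use a
-- parallel plan. Plain EXPLAIN (without ANALYZE) does not create the table.
explain
  create temporary table tmp_demo_visit_counts as
    select visit_type, count(*) as visits from demo_visits group by visit_type;
```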

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6812 (id=24686)

Rebasing onto a8edbdbb00...

First, rewinding head to replay your work on top of it...
Applying: Use a temporary table to update scheduler metrics
Changes applied before test
commit 1d6de138f281801ff1e81c95ffce895389fe6a4b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Dec 9 14:54:09 2021 +0100

    Use a temporary table to update scheduler metrics
    
    When using ``insert into <...> select <...>``, PostgreSQL disables
    parallel querying. Under some circumstances (in our large production
    database), this makes updating the scheduler metrics take a (very) long
    time.
    
    Parallel querying is allowed for ``create table <...> as select <...>``,
    and doing so restores the small(er) runtimes for this query (15 minutes
    instead of multiple hours). To use that, we have to turn the function
    into plpgsql instead of plain sql.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/503/ for more details.

olasd requested review of this revision. Dec 9 2021, 3:01 PM
ardumont added a subscriber: ardumont.

awesome, thanks.

This revision is now accepted and ready to land. Dec 9 2021, 3:10 PM

Build is green

Patch application report for D6812 (id=24687)

Rebasing onto a8edbdbb00...

Current branch diff-target is up to date.
Changes applied before test
commit e051b320e4050bdc75502bf23de8b5d53d368809
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Dec 9 14:54:09 2021 +0100

    Use a temporary table to update scheduler metrics
    
    When using ``insert into <...> select <...>``, PostgreSQL disables
    parallel querying. Under some circumstances (in our large production
    database), this makes updating the scheduler metrics take a (very) long
    time.
    
    Parallel querying is allowed for ``create table <...> as select <...>``,
    and doing so restores the small(er) runtimes for this query (15 minutes
    instead of multiple hours). To use that, we have to turn the function
    into plpgsql instead of plain sql.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/504/ for more details.