Page MenuHomeSoftware Heritage

exclude temporary schemas from the statistics
ClosedPublic

Authored by vsellier on Dec 1 2020, 9:21 AM.

Details

Summary

The requests gathering performance statistics of postgres are getting
slow when the number of opened connection increase.
This is due to the cache of temporary tables cleaned only when the
connections are closed. Filtering on the public schema exclude the
temporary statistics.

When there is a lot of connections, one request one pg_statio_user_tables
can take ~40s and return ~1000 lines:

softwareheritage=#           SELECT
          current_database()::text AS datname,
          COALESCE(schemaname::text, 'null') AS schemaname,
          COALESCE(relname::text, 'null') AS relname,
          COALESCE(heap_blks_read::float, 0) AS heap_blks_read,
          COALESCE(heap_blks_hit::float, 0) AS heap_blks_hit,
          COALESCE(idx_blks_read::float, 0) AS idx_blks_read,
          COALESCE(idx_blks_hit::float, 0) AS idx_blks_hit
          FROM pg_statio_user_tables FULL JOIN (VALUES(0)) filler(i) ON TRUE;
SELECT
          current_database()::text AS datname,
          COALESCE(schemaname::text, 'null') AS schemaname,
          COALESCE(relname::text, 'null') AS relname,
          COALESCE(heap_blks_read::float, 0) AS heap_blks_read,
          COALESCE(heap_blks_hit::float, 0) AS heap_blks_hit,
          COALESCE(idx_blks_read::float, 0) AS idx_blks_read,
          COALESCE(idx_blks_hit::float, 0) AS idx_blks_hit
          FROM pg_statio_user_tables FULL JOIN (VALUES(0)) filler(i) ON TRUE;
...
 1561783992 | pg_temp_76  | tmp_content                 |              1 |             1 |               |              |               0 |              0 |              0 |             0
 1561784918 | pg_temp_187 | tmp_content                 |             23 |            23 |               |              |               0 |              0 |              0 |             0
 1561786123 | pg_temp_187 | tmp_directory               |              1 |             8 |               |              |               0 |              0 |              0 |             0
 1561786130 | pg_temp_187 | tmp_directory_entry_file    |              1 |             2 |               |              |               0 |              0 |              0 |             0
 1561786137 | pg_temp_187 | tmp_directory_entry_dir     |              1 |             2 |               |              |               0 |              0 |              0 |             0
 1561786144 | pg_temp_187 | tmp_directory_entry_rev     |              0 |             0 |               |              |               0 |              0 |              0 |             0
 1561786153 | pg_temp_187 | tmp_revision                |              2 |             6 |               |              |               2 |              4 |              4 |            12
 1898621340 | public      | metadata_authority          |           1497 |      97622321 |          2185 |    122252751 |               0 |              0 |              0 |             0
 1898621347 | public      | origin_metadata             |           3256 |            33 |             2 |            4 |               0 |              0 |              0 |             0
 1898621354 | public      | origin_metadata_translation |              0 |             0 |             0 |            0 |               0 |              0 |              0 |             0
 2119436598 | public      | metadata_fetcher            |            255 |      97623588 |          1660 |     24591505 |               0 |              0 |              0 |             0
...
(954 rows)
Time: 39803.734 ms (00:39.804)

It can also reduce the number of timeseries creation in prometheus.

Related to T2828

Test Plan
softwareheritage=#           SELECT
          current_database()::text AS datname,
          COALESCE(schemaname::text, 'null') AS schemaname,
          COALESCE(relname::text, 'null') AS relname,
          COALESCE(heap_blks_read::float, 0) AS heap_blks_read,
          COALESCE(heap_blks_hit::float, 0) AS heap_blks_hit,
          COALESCE(idx_blks_read::float, 0) AS idx_blks_read,
          COALESCE(idx_blks_hit::float, 0) AS idx_blks_hit
          FROM pg_statio_user_tables FULL JOIN (VALUES(0)) filler(i) ON TRUE;
SELECT
          current_database()::text AS datname,
          COALESCE(schemaname::text, 'null') AS schemaname,
          COALESCE(relname::text, 'null') AS relname,
          COALESCE(heap_blks_read::float, 0) AS heap_blks_read,
          COALESCE(heap_blks_hit::float, 0) AS heap_blks_hit,
          COALESCE(idx_blks_read::float, 0) AS idx_blks_read,
          COALESCE(idx_blks_hit::float, 0) AS idx_blks_hit
          FROM pg_statio_user_tables FULL JOIN (VALUES(0)) filler(i) ON TRUE;
...
(954 rows)
Time: 39803.734 ms (00:39.804)
softwareheritage-# ;
SELECT
          current_database()::text AS datname,
          COALESCE(schemaname::text, 'null') AS schemaname,
          COALESCE(relname::text, 'null') AS relname,
          COALESCE(heap_blks_read::float, 0) AS heap_blks_read,
          COALESCE(heap_blks_hit::float, 0) AS heap_blks_hit,
          COALESCE(idx_blks_read::float, 0) AS idx_blks_read,
          COALESCE(idx_blks_hit::float, 0) AS idx_blks_hit
          FROM pg_statio_user_tables FULL JOIN (VALUES(0)) filler(i) ON TRUE
          WHERE schemaname='public'
...
(25 rows)
Time: 235.463 ms

Diff Detail

Repository
rSPSITE puppet-swh-site
Branch
staging
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17633
Build 27260: arc lint + arc unit

Event Timeline

ardumont added a subscriber: ardumont.

Thanks.

Let's see where this goes ;)

This revision is now accepted and ready to land.Dec 1 2020, 10:04 AM

This is the complete result of the statistics with the temporary tables : P886

olasd requested changes to this revision.Dec 1 2020, 10:07 AM
olasd added a subscriber: olasd.

I'd prefer this to filter *out* pg_temp schemas, rather than only filtering in the public schema. We don't use other schemas in our software heritage databases, but other software based on postgres could.

This revision now requires changes to proceed.Dec 1 2020, 10:07 AM

I was also somewhat concerned that this filtering would, in fact, show a partial picture of what's happening on the database, as we do a lot of processing / I/Os in temp tables. But in practice the aggregate value of i/os on temp tables is orders of magnitude lower than that of any actual table, so I guess this is fine...

Exclude only pg_temp schemas

In D4635#115756, @olasd wrote:

I was also somewhat concerned that this filtering would, in fact, show a partial picture of what's happening on the database, as we do a lot of processing / I/Os in temp tables. But in practice the aggregate value of i/os on temp tables is orders of magnitude lower than that of any actual table, so I guess this is fine...

sure, it makes sense. I choose to filter on public for performance concerns but the results are still returned in < 500ms which should be acceptable.

In D4635#115756, @olasd wrote:

I was also somewhat concerned that this filtering would, in fact, show a partial picture of what's happening on the database, as we do a lot of processing / I/Os in temp tables. But in practice the aggregate value of i/os on temp tables is orders of magnitude lower than that of any actual table, so I guess this is fine...

I agree, I'm not also very confortable with that but my feeling is it can be a compromise between accuracy and performance.

I was wondering if it could be possible to group the data of the temporary tables but we could have a counter that show decreasing counters as the connection are recycled...

This revision is now accepted and ready to land.Dec 1 2020, 11:14 AM