Page MenuHomeSoftware Heritage

Use a temporary table to add relations
ClosedPublic

Authored by olasd on Sep 7 2021, 3:12 PM.

Details

Summary

This avoids passing a (potentially quite large) list of rows as argument
to a function call.

Depends on D6090

Test Plan

all tox tests pass

Diff Detail

Repository
rDPROV Provenance database
Branch
diff/22237
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23413
Build 36533: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 36532: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6204 (id=22456)

Could not rebase; Attempt merge onto 3e009a2f77...

Updating 3e009a2..7b6d504
Fast-forward
 swh/provenance/postgresql/provenance.py | 150 ++-----
 swh/provenance/sql/30-schema.sql        |  63 ++-
 swh/provenance/sql/40-funcs.sql         | 683 +++++++++++++++++++++++++-------
 swh/provenance/tests/conftest.py        |   2 +-
 4 files changed, 610 insertions(+), 288 deletions(-)
Changes applied before test
commit 7b6d50424f8e13487d6835cf680a6e7df9ab3e01
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Sep 7 15:10:50 2021 +0200

    Use a temporary table to add relations
    
    This avoids passing a (potentially quite large) list of rows as argument
    to a function call.

commit 9076e8a1dd3468dc400007824be4ca51c4f5c131
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Aug 19 14:40:48 2021 +0200

    Add SQL enum for relation get filter options

commit ec1827e6eb00e24ded1a043a48b5610949e5251c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Aug 16 11:10:06 2021 +0200

    Improve denormalized PostgreSQL schema to properly check for duplicates

commit e529372cf218ff2cfb9b88950a6973e2fac6b6bf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Aug 26 11:35:48 2021 +0200

    Move PosgreSQL's backend method `relation_add` to SQL function

commit b523dfd8c1f803e8dbe9d2414c1f3cd08b0cf2ce
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Aug 13 15:35:10 2021 +0200

    Refactor `relation_add` prepearing to move SQL queries into functions
    
    The four flavor's cases are made explicit to simplify the reading and the
    definition of future SQL functions.

commit acd76e442fbefb5b3055e06daf8a87afcb53ee48
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Aug 18 11:37:30 2021 +0200

    Move PosgreSQL's backend method `relation_get` to SQL function

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/318/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/318/console

Harbormaster returned this revision to the author for changes because remote builds failed.Sep 7 2021, 3:14 PM
Harbormaster failed remote builds in B23407: Diff 22456!
swh/provenance/sql/40-funcs.sql
7

Just a question to understand how SQL manages this. This table is only visible by the current worker (the one that opened the translation), right? So two parallel processes will have their own tables, even though the name is the same?

Build is green

Patch application report for D6204 (id=22462)

Rebasing onto 9076e8a1dd...

Current branch diff-target is up to date.
Changes applied before test
commit 759d39a44c45c9351bbb14081d46fd87a4cd37d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Sep 7 15:10:50 2021 +0200

    Use a temporary table to add relations
    
    This avoids passing a (potentially quite large) list of rows as argument
    to a function call.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/319/ for more details.

olasd requested review of this revision.Sep 7 2021, 3:23 PM
This revision is now accepted and ready to land.Sep 7 2021, 3:24 PM
swh/provenance/sql/40-funcs.sql
7

That is correct: by default, temporary tables are only visible in a single "session" (aka connection). Concurrent connections create their temporary tables in different table spaces, that don't see one another.

Creating (and dropping, with "on commit drop") the table in a transaction means the table is only ever visible in the current transaction.

This revision was automatically updated to reflect the committed changes.