Experiment with an inline storage for snapshot branches in postgresql
Closed, MigratedEdits Locked
Actions

Assigned To

Authored By

	olasd
	Nov 6 2022, 6:47 PM

Description

Our PostgreSQL storage currently stores snapshots in three tables:

snapshot (id, internal id)
snapshot_branch (internal id, name, target, target type)
snapshot_branches (snapshot internal id, snapshot_branch internal id)

For large snapshots, this makes paginated queries (depending on the branch name) very inefficient: PostgreSQL has to fetch all branches and sort them out before outputting results.

We should experiment with a model equivalent to that of Cassandra, with one snapshot table and an inline snapshot_branches table that we can index using (snapshot id, branch name).

Event Timeline

Experiments in terms of space:

09:12 guest@softwareheritage => select snapshot_id, sb.name, sb.target_type, sb.target into temp table snapshot_branches_inline from snapshot_branches sbs inner join snapshot_branch sb on sbs.branch_id = sb.object_id; 
SELECT 5580580124
Durée : 22446790,638 ms (06:14:06,791)
15:27 guest@softwareheritage => \dt+ snapshot_branches_inline
                                                 Liste des relations
   Schéma   │           Nom            │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │ Description 
────────────┼──────────────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────
 pg_temp_36 │ snapshot_branches_inline │ table │ guest        │ temporaire  │ heap            │ 476 GB │ 
(1 ligne)

00:05 guest@softwareheritage => \dt+ snapshot_branches
                                                           Liste des relations
 Schéma │        Nom        │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │                 Description                 
────────┼───────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────────────────────────────────────
 public │ snapshot_branches │ table │ swhstorage   │ permanent   │ heap            │ 231 GB │ Mapping between snapshot and their branches
(1 ligne)

00:05 guest@softwareheritage => \dt+ snapshot_branch
                                                                Liste des relations
 Schéma │       Nom       │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │                       Description                       
────────┼─────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────────────────────────────────────────────────
 public │ snapshot_branch │ table │ swhstorage   │ permanent   │ heap            │ 52 GB  │ Associates branches with objects in Heritage Merkle DAG
(1 ligne)

Trying to add some indexes now to be able to usefully compare storage sizes.

This task has been migrated to GitLab.

Experiment with an inline storage for snapshot branches in postgresqlClosed, MigratedEdits LockedActions

Description

Event Timeline

Experiment with an inline storage for snapshot branches in postgresql
Closed, MigratedEdits Locked
Actions