Page MenuHomeSoftware Heritage

Experiment with an inline storage for snapshot branches in postgresql
Closed, MigratedEdits Locked

Description

Our PostgreSQL storage currently stores snapshots in three tables:

  • snapshot (id, internal id)
  • snapshot_branch (internal id, name, target, target type)
  • snapshot_branches (snapshot internal id, snapshot_branch internal id)

For large snapshots, this makes paginated queries (depending on the branch name) very inefficient: PostgreSQL has to fetch all branches and sort them out before outputting results.

We should experiment with a model equivalent to that of Cassandra, with one snapshot table and an inline snapshot_branches table that we can index using (snapshot id, branch name).

Event Timeline

olasd triaged this task as Normal priority.Nov 6 2022, 6:47 PM
olasd created this task.

Experiments in terms of space:

09:12 guest@softwareheritage => select snapshot_id, sb.name, sb.target_type, sb.target into temp table snapshot_branches_inline from snapshot_branches sbs inner join snapshot_branch sb on sbs.branch_id = sb.object_id; 
SELECT 5580580124
Durée : 22446790,638 ms (06:14:06,791)
15:27 guest@softwareheritage => \dt+ snapshot_branches_inline
                                                 Liste des relations
   Schéma   │           Nom            │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │ Description 
────────────┼──────────────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────
 pg_temp_36 │ snapshot_branches_inline │ table │ guest        │ temporaire  │ heap            │ 476 GB │ 
(1 ligne)

00:05 guest@softwareheritage => \dt+ snapshot_branches
                                                           Liste des relations
 Schéma │        Nom        │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │                 Description                 
────────┼───────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────────────────────────────────────
 public │ snapshot_branches │ table │ swhstorage   │ permanent   │ heap            │ 231 GB │ Mapping between snapshot and their branches
(1 ligne)

00:05 guest@softwareheritage => \dt+ snapshot_branch
                                                                Liste des relations
 Schéma │       Nom       │ Type  │ Propriétaire │ Persistence │ Méthode d'accès │ Taille │                       Description                       
────────┼─────────────────┼───────┼──────────────┼─────────────┼─────────────────┼────────┼─────────────────────────────────────────────────────────
 public │ snapshot_branch │ table │ swhstorage   │ permanent   │ heap            │ 52 GB  │ Associates branches with objects in Heritage Merkle DAG
(1 ligne)

Trying to add some indexes now to be able to usefully compare storage sizes.