
origin_head: Do not fetch complete snapshots for non-FTP visits
Closed, Public

Authored by vlorentz on Nov 21 2022, 1:46 PM.

Details

Summary

Some snapshots are very large. Rather than fetching a whole snapshot only to
discard most of its branches, this commit first fetches a limited number of
branches (to check that the snapshot exists, and so that small snapshots need
fewer queries), then requests specific branches as needed (usually only 2).

This should improve performance and reduce timeout exceptions from the
storage.
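
The approach described above can be sketched roughly as follows. This is
not the code from this diff: FakeStorage, get_branch and find_head are
hypothetical names, and the in-memory storage only imitates a paginated
"get branches" call so that the number of queries can be counted. It assumes
branches are returned sorted by name and that HEAD is usually an alias to
another branch.

```python
from typing import Dict, Optional, Tuple

# (target_type, target): an "alias" targets a branch name, a "revision" an id.
Branch = Tuple[str, bytes]


class FakeStorage:
    """Hypothetical in-memory stand-in for the storage; counts queries."""

    def __init__(self, branches: Dict[bytes, Branch]):
        self._branches = dict(sorted(branches.items()))
        self.queries = 0

    def get_branches(
        self, branches_from: bytes = b"", branches_count: int = 1000
    ) -> Dict[bytes, Branch]:
        """Return up to branches_count branches named >= branches_from."""
        self.queries += 1
        names = [n for n in self._branches if n >= branches_from]
        return {n: self._branches[n] for n in names[:branches_count]}


def get_branch(
    storage: FakeStorage,
    first_page: Dict[bytes, Branch],
    page_size: int,
    name: bytes,
) -> Optional[Branch]:
    """Look up one branch, reusing the first page when possible."""
    if name in first_page:
        return first_page[name]
    if len(first_page) < page_size:
        # The first page already held the whole snapshot: no such branch.
        return None
    # Large snapshot: ask for this specific branch only.
    return storage.get_branches(branches_from=name, branches_count=1).get(name)


def find_head(storage: FakeStorage, page_size: int = 100) -> Optional[bytes]:
    """Return the revision id HEAD points to, or None if it cannot be found."""
    first_page = storage.get_branches(branches_count=page_size)
    if not first_page:
        return None  # empty (or missing) snapshot
    head = get_branch(storage, first_page, page_size, b"HEAD")
    if head is None:
        return None
    if head[0] == "alias":
        # Follow the alias once, e.g. HEAD -> refs/heads/master.
        head = get_branch(storage, first_page, page_size, head[1])
    if head is not None and head[0] == "revision":
        return head[1]
    return None
```

With a snapshot of 10,000 branches where HEAD is an alias to
refs/heads/master, this sketch issues two queries (the first page, then the
aliased branch); a two-branch snapshot needs only the first one.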

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32865
Build 51502: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 51501: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8861 (id=31937)

Rebasing onto b7f04dd9d4...

Current branch diff-target is up to date.
Changes applied before test
commit 03b4bb002c87e1b124edfb5e12ad09f04f3d99dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 13:38:26 2022 +0100

    origin_head: Do not fetch complete snapshots for non-FTP visits
    
    Some snapshots are really large. Rather than fetching them entirely only to
    discard most of the branches, this commit only fetches some branches (to
    check existence + to use less queries on small snapshots), then requests
    specific branches as needed (usually only 2).
    
    This should improve performance and reduce timeout exceptions from the
    storage.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/525/ for more details.

This revision is now accepted and ready to land. Nov 21 2022, 2:47 PM