Software Heritage

ExtrinsicMetadataIndexer: Add support for metadata with origin in context
Closed (Public)

Authored by vlorentz on Nov 21 2022, 2:59 PM.

Details

Summary

Raw extrinsic metadata (REMD) objects from deposits target a directory and
carry the origin in their context, so this workaround makes deposits easy to
index without significantly changing swh-search.

Resolves T4694
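The summary can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `MetadataRecord` type (not the real `swh.model` REMD class) and a hypothetical `index_key` helper. Records that target an origin are indexed under that origin. Records that target a directory are indexed under the origin found in their context, when there is one, and anything else is skipped. This is only an illustration of the idea, not the indexer's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, simplified stand-in for a raw extrinsic metadata (REMD) object."""

    target_type: str  # e.g. "origin" or "directory"
    target: str  # origin URL or SWHID of the object the metadata describes
    origin: Optional[str] = None  # origin URL from the record's context, if any


def index_key(record: MetadataRecord) -> Optional[str]:
    """Return the origin URL to index a record under, or None to skip it.

    Hypothetical helper: deposit metadata targets a directory but carries its
    origin in the context, so it can still be indexed per origin without
    changing the search backend.
    """
    if record.target_type == "origin":
        # Metadata directly about an origin: index it under that origin.
        return record.target
    if record.target_type == "directory" and record.origin is not None:
        # Deposit-style metadata: use the origin given in the context.
        return record.origin
    # No origin to attach the metadata to.
    return None


deposit_md = MetadataRecord(
    target_type="directory",
    target="swh:1:dir:" + "0" * 40,
    origin="https://example.org/deposit/1",
)
print(index_key(deposit_md))  # https://example.org/deposit/1
```

Keying everything by origin URL is what lets the search side stay unchanged: from its point of view, deposit metadata looks like any other per-origin metadata.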

Diff Detail

Repository: rDCIDX Metadata indexer
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8863 (id=31947)

Could not rebase; Attempt merge onto b7f04dd9d4...

Updating b7f04dd..221d48e
Fast-forward
 swh/indexer/metadata.py               |  12 ++-
 swh/indexer/origin_head.py            |  72 +++++++++++----
 swh/indexer/tests/test_metadata.py    |  13 ++-
 swh/indexer/tests/test_origin_head.py | 160 +++++++++++++++++++++++++++++-----
 4 files changed, 210 insertions(+), 47 deletions(-)
Changes applied before test
commit 221d48e242520344c6435c07d444f961637497fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 14:58:48 2022 +0100

    ExtrinsicMetadataIndexer: Add support for metadata with origin in context
    
    REMD from deposits target a directory, with an origin in its context,
    so this workaround allows indexing deposits easily, without significantly
    changing swh-search.

commit 03b4bb002c87e1b124edfb5e12ad09f04f3d99dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 13:38:26 2022 +0100

    origin_head: Do not fetch complete snapshots for non-FTP visits
    
    Some snapshots are really large. Rather than fetching them entirely only to
    discard most of the branches, this commit only fetches some branches (to
    check existence + to use less queries on small snapshots), then requests
    specific branches as needed (usually only 2).
    
    This should improve performance and reduce timeout exceptions from the
    storage.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/526/ for more details.
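The strategy from the second commit ("origin_head: Do not fetch complete snapshots for non-FTP visits") can be sketched as follows. This sketch makes several assumptions. `FakeSnapshotStorage` is a hypothetical in-memory storage, and its `first_branches` and `branch` methods are made up for this example rather than taken from swh-storage. `resolve_head` and `first_page_size` are also illustrative names. In this model, the first query fetches a bounded page of branches. That page confirms the snapshot exists, and for small snapshots it answers every later lookup. For large snapshots, only the branches actually needed (typically `HEAD` and the branch it aliases) are requested individually.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Branch:
    target_type: str  # "alias" or "revision" (simplified)
    target: bytes  # branch name for aliases, object id otherwise


class FakeSnapshotStorage:
    """Hypothetical in-memory storage that counts queries."""

    def __init__(self, snapshots: Dict[bytes, Dict[bytes, Branch]]):
        self._snapshots = snapshots
        self.queries = 0

    def first_branches(
        self, snapshot_id: bytes, count: int
    ) -> Optional[Tuple[Dict[bytes, Branch], bool]]:
        """Return up to `count` branches (by name) and whether more exist; None if unknown."""
        self.queries += 1
        branches = self._snapshots.get(snapshot_id)
        if branches is None:
            return None
        names = sorted(branches)
        page = {name: branches[name] for name in names[:count]}
        return page, len(names) > count

    def branch(self, snapshot_id: bytes, name: bytes) -> Optional[Branch]:
        """Fetch a single branch by name."""
        self.queries += 1
        return self._snapshots[snapshot_id].get(name)


def resolve_head(
    storage: FakeSnapshotStorage, snapshot_id: bytes, first_page_size: int = 100
) -> Optional[bytes]:
    """Return the object HEAD ultimately points to, without loading the whole snapshot."""
    result = storage.first_branches(snapshot_id, first_page_size)
    if result is None:
        return None  # the snapshot does not exist
    page, truncated = result

    def lookup(name: bytes) -> Optional[Branch]:
        if name in page:
            return page[name]
        if not truncated:
            return None  # whole snapshot already fetched: the branch is absent
        return storage.branch(snapshot_id, name)  # targeted query on large snapshots

    head = lookup(b"HEAD")
    seen: Set[bytes] = set()
    while head is not None and head.target_type == "alias":
        if head.target in seen:
            return None  # alias cycle
        seen.add(head.target)
        head = lookup(head.target)
    return head.target if head is not None else None
```

The page size is a trade-off. A larger first page answers more snapshots in a single query but transfers more data for large ones. A smaller page leans more on targeted branch lookups.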

This revision is now accepted and ready to land. (Nov 22 2022, 5:07 PM)