Raw extrinsic metadata (REMD) objects from deposits target a directory and
carry an origin in their context, so this workaround allows indexing deposits
easily, without significantly changing swh-search.
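A minimal sketch of one plausible reading of this workaround, not the code from the diff: when a metadata object targets a directory but names an origin in its context, index it under that origin. The `Remd` class and `origin_to_index` function are hypothetical stand-ins invented for illustration; they are not the swh-model or swh-indexer API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Remd:
    """Hypothetical, simplified stand-in for a raw extrinsic metadata object."""

    target_type: str  # "origin" or "directory" (assumed simplification)
    target: str  # origin URL or directory identifier
    origin: Optional[str] = None  # origin found in the object's context, if any


def origin_to_index(remd: Remd) -> Optional[str]:
    """Return the origin URL to index this metadata under, or None to skip it."""
    if remd.target_type == "origin":
        # Metadata attached directly to an origin: index it as-is.
        return remd.target
    if remd.target_type == "directory" and remd.origin is not None:
        # Deposit case: the target is a directory, but the context names an
        # origin, so the metadata is treated as if it targeted that origin.
        return remd.origin
    # Anything else is out of scope for this sketch.
    return None
```

Under these assumptions, a deposit's metadata ends up attached to an origin, which is why the search side would need little or no change.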
Resolves T4694
Differential D8863
ExtrinsicMetadataIndexer: Add support for metadata with origin in context
Authored by vlorentz on Nov 21 2022, 2:59 PM.
Event Timeline
Build is green
Patch application report for D8863 (id=31947)
Could not rebase; Attempt merge onto b7f04dd9d4...
Updating b7f04dd..221d48e
Fast-forward
 swh/indexer/metadata.py               |  12 ++-
 swh/indexer/origin_head.py            |  72 +++++++++++----
 swh/indexer/tests/test_metadata.py    |  13 ++-
 swh/indexer/tests/test_origin_head.py | 160 +++++++++++++++++++++++++++++-----
 4 files changed, 210 insertions(+), 47 deletions(-)
Changes applied before test
commit 221d48e242520344c6435c07d444f961637497fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 21 14:58:48 2022 +0100
ExtrinsicMetadataIndexer: Add support for metadata with origin in context
REMD from deposits target a directory, with an origin in its context,
so this workaround allows indexing deposits easily, without significantly
changing swh-search.
commit 03b4bb002c87e1b124edfb5e12ad09f04f3d99dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 21 13:38:26 2022 +0100
origin_head: Do not fetch complete snapshots for non-FTP visits
Some snapshots are really large. Rather than fetching them entirely only to
discard most of the branches, this commit only fetches some branches (to
check existence + to use less queries on small snapshots), then requests
specific branches as needed (usually only 2).
This should improve performance and reduce timeout exceptions from the
storage.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/526/ for more details.
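The branch-fetching strategy described in the origin_head commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the diff: `FakeStorage`, its two methods, `find_head`, the candidate branch names, and the page size are all hypothetical and do not mirror the swh-storage API.

```python
from typing import Dict, Optional, Sequence


class FakeStorage:
    """Hypothetical in-memory snapshot store that counts queries."""

    def __init__(self, branches: Dict[bytes, bytes]):
        # Branches are kept sorted by name, as a paginated listing would be.
        self._branches = dict(sorted(branches.items()))
        self.queries = 0

    def first_branches(self, count: int) -> Dict[bytes, bytes]:
        """Return at most `count` branches, in name order (one query)."""
        self.queries += 1
        return dict(list(self._branches.items())[:count])

    def get_branch(self, name: bytes) -> Optional[bytes]:
        """Return the target of one named branch, or None (one query)."""
        self.queries += 1
        return self._branches.get(name)


def find_head(
    storage: FakeStorage,
    candidates: Sequence[bytes] = (b"HEAD", b"refs/heads/main", b"refs/heads/master"),
    page_size: int = 100,
) -> Optional[bytes]:
    """Find a head branch target without loading the whole snapshot."""
    # One cheap query: checks that the snapshot has branches at all, and
    # fully covers small snapshots.
    page = storage.first_branches(page_size)
    if not page:
        return None
    complete = len(page) < page_size  # the page holds every branch
    for name in candidates:
        if name in page:
            return page[name]
        if not complete:
            # Large snapshot: request only this specific branch.
            target = storage.get_branch(name)
            if target is not None:
                return target
    return None
```

With this shape, a small snapshot costs a single query, and a large one costs the first page plus at most one query per candidate name, instead of paging through every branch.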