REMD (raw extrinsic metadata) objects from deposits target a directory and
carry an origin in their context, so this workaround allows indexing deposits
easily, without significantly changing swh-search.
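
To make the workaround concrete, here is a minimal sketch of the idea, not the code from this diff. The `Remd` class, its fields, and `origin_to_index` are hypothetical stand-ins invented for illustration; the real swh-model and swh-indexer objects differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Remd:
    """Hypothetical, simplified stand-in for a raw extrinsic metadata object."""

    target_type: str  # "origin" or "directory" in this sketch
    target: str  # origin URL, or a directory identifier
    origin: Optional[str] = None  # origin URL carried in the context, if any


def origin_to_index(remd: Remd) -> Optional[str]:
    """Return the origin URL the metadata should be indexed under, if any."""
    if remd.target_type == "origin":
        # Metadata attached directly to an origin: use the target itself.
        return remd.target
    if remd.target_type == "directory" and remd.origin is not None:
        # Deposit case: the target is a directory, but the context names
        # an origin, so index the metadata as if it targeted that origin.
        return remd.origin
    # No origin available: nothing to index in this sketch.
    return None
```

Under these assumptions, a deposit whose metadata targets a directory is still indexed per origin, so the search side keeps seeing origin-keyed documents.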
Resolves T4694
Differential D8863
ExtrinsicMetadataIndexer: Add support for metadata with origin in context
Authored by vlorentz on Nov 21 2022, 2:59 PM.
Tags: None
Subscribers: None
Event Timeline

Build is green

Patch application report for D8863 (id=31947)

Could not rebase; Attempt merge onto b7f04dd9d4...

Updating b7f04dd..221d48e
Fast-forward
 swh/indexer/metadata.py               |  12 ++-
 swh/indexer/origin_head.py            |  72 +++++++++++----
 swh/indexer/tests/test_metadata.py    |  13 ++-
 swh/indexer/tests/test_origin_head.py | 160 +++++++++++++++++++++++++++++-----
 4 files changed, 210 insertions(+), 47 deletions(-)

Changes applied before test

commit 221d48e242520344c6435c07d444f961637497fe
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 14:58:48 2022 +0100

    ExtrinsicMetadataIndexer: Add support for metadata with origin in context

    REMD from deposits target a directory, with an origin in its context,
    so this workaround allows indexing deposits easily, without significantly
    changing swh-search.

commit 03b4bb002c87e1b124edfb5e12ad09f04f3d99dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 13:38:26 2022 +0100

    origin_head: Do not fetch complete snapshots for non-FTP visits

    Some snapshots are really large. Rather than fetching them entirely only
    to discard most of the branches, this commit only fetches some branches
    (to check existence + to use less queries on small snapshots), then
    requests specific branches as needed (usually only 2).

    This should improve performance and reduce timeout exceptions from
    the storage.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/526/ for more details.
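
The fetching strategy described in the origin_head commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from swh/indexer/origin_head.py: `FakeStorage`, its `first_branches` and `branch` methods, the string encoding of branch targets, and the `page_size` default are all hypothetical.

```python
from typing import Dict, Optional


class FakeStorage:
    """Hypothetical in-memory storage that counts the queries it serves."""

    def __init__(self, snapshots: Dict[str, Dict[str, str]]) -> None:
        self.snapshots = snapshots
        self.queries = 0

    def first_branches(self, snapshot_id: str, count: int) -> Optional[Dict[str, str]]:
        """Return at most `count` branches (sorted by name), or None if missing."""
        self.queries += 1
        branches = self.snapshots.get(snapshot_id)
        if branches is None:
            return None
        return {name: branches[name] for name in sorted(branches)[:count]}

    def branch(self, snapshot_id: str, name: str) -> Optional[str]:
        """Return the target of a single named branch, or None."""
        self.queries += 1
        return self.snapshots[snapshot_id].get(name)


def resolve_head(storage: FakeStorage, snapshot_id: str, page_size: int = 10) -> Optional[str]:
    """Resolve HEAD without fetching the whole snapshot."""
    # One small query checks that the snapshot exists and, for small
    # snapshots, already returns every branch.
    page = storage.first_branches(snapshot_id, page_size)
    if page is None:
        return None

    def lookup(name: str) -> Optional[str]:
        if name in page:
            return page[name]
        if len(page) < page_size:
            # The first page held the complete snapshot: the branch is absent.
            return None
        # Large snapshot: request only the branch that is needed.
        return storage.branch(snapshot_id, name)

    target = lookup("HEAD")
    if target is not None and target.startswith("alias:"):
        # Follow a single level of aliasing (a simplification of this sketch).
        target = lookup(target[len("alias:"):])
    return target
```

With this layout, a small snapshot costs a single query, while a large one costs the initial page plus one query per branch that was not in it, instead of paging through every branch.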