Make the Mercurial loader incremental

Description

Before this change, if a previous snapshot of a given origin existed
and new changes were detected, we would start from scratch.

This change leverages the recently added database mapping from external
ids (like Mercurial's node ids) to internal SWH ids: now that an SWH id
can be found from a Mercurial node id, we can compute what has changed
since the latest snapshot.

For revisions, the logic is simple: look at the heads we've saved and
ask Mercurial for all the revisions that are not ancestors of these
heads (themselves excluded). This is not as "clever" as the full
Mercurial discovery algorithm, but is much simpler and good enough for
the kinds of scales we're operating at on a single repository.
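
A minimal sketch of the revision selection described above, written against
a plain in-memory DAG rather than the real Mercurial API. The names
`ancestors_inclusive`, `new_revisions` and the `parents` mapping are
hypothetical and are not taken from the loader's code; in revset terms the
idea is roughly `not ::(saved heads)`.

```python
from typing import Dict, Iterable, List, Set


def ancestors_inclusive(
    parents: Dict[str, List[str]], heads: Iterable[str]
) -> Set[str]:
    """Return the given heads plus all of their ancestors."""
    seen: Set[str] = set()
    # Heads that are unknown to the repository are ignored (assumption).
    stack = [head for head in heads if head in parents]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return seen


def new_revisions(
    parents: Dict[str, List[str]], saved_heads: Iterable[str]
) -> List[str]:
    """Return the nodes that are neither saved heads nor their ancestors."""
    known = ancestors_inclusive(parents, saved_heads)
    return [node for node in parents if node not in known]


# Toy history: a <- b <- c, b <- d, and e merges c and d.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}

# The previous snapshot only knew head "c", so "d" and "e" must be loaded.
to_load = new_revisions(history, ["c"])
```

With no saved heads, every revision is selected, which matches the old
from-scratch behaviour.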

For tags, the previous logic assumed that all possible target revisions
were loaded in the same run. Here, we look at the difference between the
tags Mercurial reports and the ones from the previous snapshot; any new
tag will have its corresponding release either in cache (because it was
processed in the same run) or fetched from the database using the
aforementioned mapping.
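
The tag handling can be sketched as follows, under stated assumptions:
`resolve_new_tags`, `cache` and `lookup_extid` are hypothetical names, the
ids are placeholder strings, and a tag whose target changed is treated as
new. The real loader's data structures and storage calls may differ.

```python
from typing import Callable, Dict, Optional


def resolve_new_tags(
    hg_tags: Dict[str, str],
    snapshot_tags: Dict[str, str],
    cache: Dict[str, str],
    lookup_extid: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each new tag name to the SWH id of its target.

    hg_tags and snapshot_tags map tag names to Mercurial node ids.
    cache holds node id -> SWH id for objects loaded during this run.
    lookup_extid stands in for the database mapping of external ids.
    """
    resolved: Dict[str, str] = {}
    for name, node in hg_tags.items():
        if snapshot_tags.get(name) == node:
            continue  # already present in the previous snapshot
        # Prefer the in-run cache, then fall back to the database mapping.
        swh_id = cache.get(node)
        if swh_id is None:
            swh_id = lookup_extid(node)
        if swh_id is None:
            raise LookupError(f"no SWH id known for node {node!r}")
        resolved[name] = swh_id
    return resolved


# Toy data: "v1" is unchanged, "v2" was loaded in this run, "v3" in an earlier one.
database = {"n3": "swh-id-for-n3"}
new_tags = resolve_new_tags(
    hg_tags={"v1": "n1", "v2": "n2", "v3": "n3"},
    snapshot_tags={"v1": "n1"},
    cache={"n2": "swh-id-for-n2"},
    lookup_extid=database.get,
)
```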

Details

Provenance
Alphare authored on May 5 2021, 5:23 PM
Alphare pushed on May 18 2021, 11:16 AM
Differential Revision
D5687: Make the Mercurial loader incremental
Parents
rDLDHG773d872a8164: Move `os.environ` manipulation to pre_cleanup
Branches
Unknown
Tags
Unknown
Build Status
Buildable 21521
Build 33434: test-and-build