When a repository has new commits, only load the new ones.
Details
- Reviewers: douardda, acezar
- Group Reviewers: Reviewers
- Maniphest Tasks: T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders
Diff Detail
- Repository: rDLDHG Mercurial loader
- Branch: D4643
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 17674
  - Build 27328: Phabricator diff pipeline on jenkins
  - Build 27327: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4649 (id=16488)
Could not rebase; Attempt merge onto 4bf91cff72...
Updating 4bf91cf..a110ee9
Fast-forward
 swh/loader/mercurial/from_disk.py            | 88 +++++++++++++++++++++++++---
 swh/loader/mercurial/tests/test_from_disk.py | 80 +++++++++++++++++++++++++
 2 files changed, 161 insertions(+), 7 deletions(-)
Changes applied before test
commit a110ee9e907e8356a5583b5a874d0285f6312539
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Dec 2 14:47:18 2020 +0100

    HgLoaderFromDisk: Only load new commits

    When a repository as new commits only load the new ones.

commit c0ac9e2e4bbbd6bf0a6c09517a532708db824df1
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Nov 30 14:29:47 2020 +0100

    Make loading an unchanged repository uneventful

    Summary: By looking at the previous snapshot heads, loading of an unchanged
    repository will be uneventful.

    Reviewers: #reviewers

    Differential Revision: https://forge.softwareheritage.org/D4643
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/133/ for more details.
swh/loader/mercurial/from_disk.py | ||
---|---|---|
368 | I don't really understand why this is needed. Isn't it just a matter of properly sorting the revisions to add, so that parent revisions are always handled before their children, i.e. making sure the ingested revision graph is traversed "properly"? If this is needed, then there is a recursion store_revisions -> get_revisions_parents -> store_revision [...], and there is no guarantee that this recursion will stay shallow enough for Python to handle (-> RecursionError). To prevent that, the revision graph must be traversed properly; this special case (and the possible recursion it brings) is then no longer needed. |
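To illustrate the parents-first traversal suggested above, here is a minimal sketch using Kahn's algorithm. It is not taken from the loader: `parents_first` and the `parents_of` mapping (node id to tuple of parent ids) are hypothetical names, and parents outside the mapping are assumed to be already known, e.g. loaded during a previous visit.

```python
from collections import deque
from typing import Dict, List, Tuple


def parents_first(parents_of: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Order nodes so that every parent comes before its children.

    Iterative (Kahn's algorithm), so the depth of the history cannot
    trigger a RecursionError.
    """
    # Number of parents of each node that still have to be emitted.
    # Parents that are not keys of ``parents_of`` are treated as already known.
    pending = {
        node: sum(1 for parent in parents if parent in parents_of)
        for node, parents in parents_of.items()
    }
    children: Dict[str, List[str]] = {node: [] for node in parents_of}
    for node, parents in parents_of.items():
        for parent in parents:
            if parent in parents_of:
                children[parent].append(node)

    # Start from the nodes whose parents are all known already.
    ready = deque(sorted(node for node, count in pending.items() if count == 0))
    ordered: List[str] = []
    while ready:
        node = ready.popleft()
        ordered.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(ordered) != len(parents_of):
        raise ValueError("the revision graph contains a cycle")
    return ordered


graph = {"c": ("a", "b"), "b": ("a",), "a": ()}
print(parents_first(graph))  # ['a', 'b', 'c']
```

With revisions stored in this order, the id of every parent is available by the time a child is processed, so no recursive lookup is required.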
Followup
swh/loader/mercurial/from_disk.py | ||
---|---|---|
368 | It was there to resolve missing parents until they can be fetched from the storage. Since I already know some of them from the latest snapshot, I have pre-populated self._revision_nodeid_to_swhid instead. |
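The pre-population described above could look roughly like the sketch below. It only models the idea: `build_nodeid_cache`, `lookup_revision_id` and `storage_lookup` are hypothetical stand-ins, not the loader's API, and the snapshot heads are assumed to be available as (hg node id, SWH id) pairs.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def build_nodeid_cache(
    snapshot_heads: Iterable[Tuple[bytes, bytes]]
) -> Dict[bytes, bytes]:
    """Seed the node id -> SWH id cache from the latest snapshot heads."""
    return dict(snapshot_heads)


def lookup_revision_id(
    cache: Dict[bytes, bytes],
    storage_lookup: Callable[[bytes], Optional[bytes]],
    nodeid: bytes,
) -> Optional[bytes]:
    """Return the SWH id for ``nodeid``, or None when it is unknown.

    The cache is consulted first; ``storage_lookup`` (hypothetical) is
    only called on a miss and its answer is remembered.
    """
    swhid = cache.get(nodeid)
    if swhid is None:
        swhid = storage_lookup(nodeid)
        if swhid is not None:
            cache[nodeid] = swhid
    return swhid


cache = build_nodeid_cache([(b"head-node", b"head-swhid")])
# The head is resolved from the cache, without touching the storage.
print(lookup_revision_id(cache, lambda nodeid: None, b"head-node"))
```

Because the heads of the previous snapshot are already in the cache, the parents of the first new revisions resolve without any extra lookup.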
Build was aborted
Patch application report for D4649 (id=16566)
Could not rebase; Attempt merge onto 4bf91cff72...
Updating 4bf91cf..c89f1b7
Fast-forward
 swh/loader/mercurial/from_disk.py            | 92 +++++++++++++++++++++++++---
 swh/loader/mercurial/hgutil.py               |  4 +-
 swh/loader/mercurial/tests/test_from_disk.py | 80 ++++++++++++++++++++++++
 3 files changed, 168 insertions(+), 8 deletions(-)
Changes applied before test
commit c89f1b729ea862082a51e73eefaa3023ef92cf62
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Dec 2 14:47:18 2020 +0100

    HgLoaderFromDisk: Only load new commits

    When a repository as new commits only load the new ones.

commit 7af11ca8dcaaec1b65250bfbdb9a9d43d3c588cf
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Nov 30 14:29:47 2020 +0100

    HgLoaderFromDisk: uneventful load when unchanged

    By looking at the previous snapshot heads, loading of an unchanged
    repository will be uneventful.
Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/135/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/135/console
swh/loader/mercurial/from_disk.py | ||
---|---|---|
368 | This still looks wrong to me. @marmoute, can you have a look please? In which situation can this sha1_git be None? I think this would happen in the situation where:
Am I right? If so, we cannot implement this without a "working" get_revision_id_from_hg_nodeid(). |
swh/loader/mercurial/from_disk.py | ||
---|---|---|
368 | Example of get_revision_id_from_hg_nodeid returning None: the repository is usable but has some corruption that prevents loading part of its revisions. This is the case for pypy, for example. It is not an issue, since the rest of the repository is fine to work with as long as you don't use the corrupted part. But when a revision is corrupted, all its descendants will fail to get their parent id, as the parent can be loaded neither from the storage nor from the cache. I'm preparing a diff that will avoid failing the entire load when there is some corruption, by not loading the corrupted revisions. It will imply that get_revision_id_from_hg_nodeid can return None or raise a specific exception. |
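A rough sketch of how a load could tolerate such corruption is given below. It is only an assumption about the behaviour, not the announced diff: `load_skipping_corrupted` and `is_corrupted` are hypothetical, the SHA-1 over node id and parent ids is a placeholder for the real identifier computation, and skipping the descendants of a corrupted revision is one possible policy among others.

```python
import hashlib
from typing import Callable, Dict, Iterable, Mapping, Set, Tuple


def load_skipping_corrupted(
    ordered: Iterable[bytes],
    parents_of: Mapping[bytes, Tuple[bytes, ...]],
    known: Mapping[bytes, bytes],
    is_corrupted: Callable[[bytes], bool],
) -> Tuple[Dict[bytes, bytes], Set[bytes]]:
    """Load revisions in parents-first order, skipping what cannot be loaded.

    A revision is skipped when it is corrupted or when the id of one of
    its parents is unknown (None), which is the case for every
    descendant of a skipped revision.
    """
    nodeid_to_id = dict(known)
    skipped: Set[bytes] = set()
    for node in ordered:
        parent_ids = [nodeid_to_id.get(parent) for parent in parents_of[node]]
        if is_corrupted(node) or any(pid is None for pid in parent_ids):
            skipped.add(node)
            continue
        # Placeholder identifier: NOT how SWH revision ids are computed.
        nodeid_to_id[node] = hashlib.sha1(node + b"".join(parent_ids)).digest()
    return nodeid_to_id, skipped


parents_of = {b"a": (), b"b": (b"a",), b"c": (b"b",), b"d": (b"a",)}
ids, skipped = load_skipping_corrupted(
    [b"a", b"b", b"c", b"d"], parents_of, {}, lambda node: node == b"b"
)
print(sorted(skipped))  # [b'b', b'c']
```

Here `b` is corrupted, so `c` cannot resolve its parent and is skipped as well, while `a` and `d` load normally.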