
loader: Fix repo sub-tree loading when using dump loaders
Closed, Public

Authored by anlambert on Feb 11 2022, 4:49 PM.

Details

Summary

When dumping a repository sub-tree, svnrdump filters out the repository
paths outside of the sub-tree but still dumps every commit of the root
repository. The produced dump can therefore contain empty commits, namely
those that only modify paths outside of the sub-tree.

Ensure that loading a repository sub-tree from a remote dump behaves the
same way as the SvnLoader class, which communicates directly with the
remote repository. With these changes, no empty commits are archived and
the root directory is correct when loading a sub-tree with
SvnLoaderFromDumpArchive or SvnLoaderFromRemoteDump.
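
To illustrate the idea of skipping empty commits, here is a minimal sketch. It is not the loader's actual code: `DumpedRevision` and `non_empty_revisions` are hypothetical names, and the sketch assumes that a revision whose changes all fall outside the sub-tree shows up with an empty list of changed paths.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class DumpedRevision:
    """Hypothetical, simplified view of one revision read from a dump."""

    rev: int
    # Paths modified by the revision, after paths outside the sub-tree
    # have been filtered out.
    changed_paths: List[str] = field(default_factory=list)


def non_empty_revisions(
    revisions: Iterable[DumpedRevision],
) -> Iterator[DumpedRevision]:
    """Yield only the revisions that still modify something in the sub-tree."""
    for revision in revisions:
        if revision.changed_paths:
            yield revision


if __name__ == "__main__":
    dumped = [
        DumpedRevision(1, ["trunk/project/README"]),
        DumpedRevision(2),  # only touched paths outside the sub-tree
        DumpedRevision(3, ["trunk/project/setup.py"]),
    ]
    for revision in non_empty_revisions(dumped):
        print(revision.rev, revision.changed_paths)
```

Under that assumption, revision 2 is dropped and only revisions 1 and 3 would be considered for archiving.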

Related to T3896

Depends on D7137

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7163 (id=25963)

Could not rebase; Attempt merge onto e7c3fa08d9...

Updating e7c3fa0..4c50584
Fast-forward
 swh/loader/svn/loader.py               |  57 ++++++++---------
 swh/loader/svn/svn.py                  |  38 +++++++++---
 swh/loader/svn/tests/test_externals.py |   5 ++
 swh/loader/svn/tests/test_loader.py    | 109 +++++++++++++++++++++++++++++++--
 4 files changed, 165 insertions(+), 44 deletions(-)
Changes applied before test
commit 4c505845c2198f475bbc1ea75dc49a914282b3ce
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Feb 10 14:27:48 2022 +0100

    loader: Fix repo sub-tree loading when using dump loaders
    
    When dumping a repository sub-tree, svnrdump filters out the repository
    paths outside of the sub-tree but still dumps every commit of the root
    repository. The produced dump can therefore contain empty commits, namely
    those that only modify paths outside of the sub-tree.
    
    Ensure that loading a repository sub-tree from a remote dump behaves the
    same way as the SvnLoader class, which communicates directly with the
    remote repository. With these changes, no empty commits are archived and
    the root directory is correct when loading a sub-tree with
    SvnLoaderFromDumpArchive or SvnLoaderFromRemoteDump.
    
    Related to T3896

commit 10b8ce86411351b4af6a5759e6b05b582dcf2ce4
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Feb 9 16:05:35 2022 +0100

    loader: Simplify the handling of parent revision
    
    In the current loader implementation, a revision different from the
    first one has a single parent revision corresponding to the previously
    processed one.
    
    In order to simplify the handling of the parent revision, remove
    the use of the revision_parents dict and simply store the previously
    processed revision id in a parents tuple variable while iterating on
    the revisions log. In case of incremental loading, that tuple will be
    initialized from the latest revision loaded into the archive.
    
    This change is required to allow the loading of svn subprojects.
    
    Related to T3896

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/303/ for more details.
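
The parent handling described in the second commit message above can be sketched roughly as follows. This is an illustration only: `chain_parents` and `latest_archived` are hypothetical names, and revision ids are represented as plain `bytes` for simplicity.

```python
from typing import Iterable, Iterator, Optional, Tuple


def chain_parents(
    revision_ids: Iterable[bytes],
    latest_archived: Optional[bytes] = None,
) -> Iterator[Tuple[bytes, Tuple[bytes, ...]]]:
    """Pair each revision id with its parents tuple.

    The first revision has no parent, unless an incremental load starts
    from the latest revision already archived. Every later revision has
    the previously processed revision as its single parent.
    """
    parents: Tuple[bytes, ...] = (
        (latest_archived,) if latest_archived is not None else ()
    )
    for revision_id in revision_ids:
        yield revision_id, parents
        # The revision just processed becomes the parent of the next one.
        parents = (revision_id,)


if __name__ == "__main__":
    for rev_id, parents in chain_parents([b"r1", b"r2", b"r3"]):
        print(rev_id, parents)
```

Keeping a single tuple variable avoids maintaining a mapping from every revision to its parents, which is the simplification the commit message describes.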

Simplify test implementation

Build is green

Patch application report for D7163 (id=25964)

Could not rebase; Attempt merge onto e7c3fa08d9...

Updating e7c3fa0..8639d84
Fast-forward
 swh/loader/svn/loader.py               |  57 ++++++++---------
 swh/loader/svn/svn.py                  |  38 +++++++++---
 swh/loader/svn/tests/test_externals.py |   5 ++
 swh/loader/svn/tests/test_loader.py    | 109 +++++++++++++++++++++++++++++++--
 4 files changed, 165 insertions(+), 44 deletions(-)
Changes applied before test
commit 8639d8440fd9a76b7cedae8a981f63371e49b561
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Feb 10 14:27:48 2022 +0100

    loader: Fix repo sub-tree loading when using dump loaders
    


See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/304/ for more details.

ardumont added inline comments.
swh/loader/svn/svn.py
87

Add a comment explaining what these instructions are for (determining the root directory from the origin URL, or something along those lines).

This revision is now accepted and ready to land. Feb 11 2022, 5:11 PM

Add comment about why we need to compute root repository path
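
As a rough illustration of what computing the root repository path can involve, the sketch below derives the sub-tree path from an origin URL and a repository root URL. It is an assumption-laden example rather than the code under review: `subtree_path` is a hypothetical helper, and it assumes the repository root URL is already known and is a prefix of the origin URL.

```python
def subtree_path(origin_url: str, repos_root_url: str) -> str:
    """Return the path of the loaded sub-tree relative to the repository root.

    An empty string means the origin URL points at the repository root.
    """
    origin = origin_url.rstrip("/")
    root = repos_root_url.rstrip("/")
    if origin == root:
        return ""
    if not origin.startswith(root + "/"):
        raise ValueError(f"{origin_url} is not inside {repos_root_url}")
    # Drop the root URL and the slash that separates it from the sub-tree.
    return origin[len(root) + 1:]


if __name__ == "__main__":
    print(
        subtree_path(
            "https://svn.example.org/repo/trunk/project",
            "https://svn.example.org/repo",
        )
    )
```

Knowing this relative path makes it possible to tell which dumped paths belong to the sub-tree and which directory should be treated as the root of the loaded tree.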

Build is green

Patch application report for D7163 (id=25968)

Could not rebase; Attempt merge onto e7c3fa08d9...

Updating e7c3fa0..eecddf1
Fast-forward
 swh/loader/svn/loader.py               |  57 ++++++++---------
 swh/loader/svn/svn.py                  |  40 +++++++++---
 swh/loader/svn/tests/test_externals.py |   5 ++
 swh/loader/svn/tests/test_loader.py    | 109 +++++++++++++++++++++++++++++++--
 4 files changed, 167 insertions(+), 44 deletions(-)
Changes applied before test
commit eecddf13d5bb6dd5fb5701851ff282c776a4f6f1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Feb 10 14:27:48 2022 +0100

    loader: Fix repo sub-tree loading when using dump loaders
    


See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/305/ for more details.