
loader: Apply SvnLoaderFromRemoteDump optimization only for stale repos
Closed, Public

Authored by anlambert on Nov 19 2021, 10:53 AM.

Details

Summary

If new revisions have been issued in a Subversion repository since
the last loading, do not check for altered history in the prepare method
of SvnLoaderFromRemoteDump; proceed directly to incremental loading.

Depends on D6661
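
The decision described in the summary can be sketched roughly as follows. This is an illustration only, not the code from this diff: `LastVisit`, `should_check_altered_history` and their parameters are hypothetical names, and the handling of a first visit and of a head revision lower than the last loaded one are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastVisit:
    """Hypothetical record of what the previous loading archived."""

    last_loaded_revision: int


def should_check_altered_history(
    last_visit: Optional[LastVisit], remote_head_revision: int
) -> bool:
    """Return True only when the repository looks stale since the last loading."""
    if last_visit is None:
        # Assumption: on a first visit there is no previous state to compare with.
        return False
    # New revisions were issued: skip the check and load incrementally.
    # Assumption: a head at or below the last loaded revision still gets checked.
    return remote_head_revision <= last_visit.last_loaded_revision


stale = should_check_altered_history(LastVisit(10), remote_head_revision=10)
updated = should_check_altered_history(LastVisit(10), remote_head_revision=12)
```

Here `stale` is True (no new revisions, so the altered-history check still runs) and `updated` is False (new revisions exist, so loading resumes incrementally).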

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6662 (id=24203)

Could not rebase; Attempt merge onto 0237d07b17...

Updating 0237d07..0e725ed
Fast-forward
 swh/loader/svn/loader.py            |  11 ++--
 swh/loader/svn/svn.py               |  11 +++-
 swh/loader/svn/tests/test_loader.py | 106 +++++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 8 deletions(-)
Changes applied before test
commit 0e725edbc31ad1c73e87ed3fd2d0dcad51f59d82
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 18 23:26:10 2021 +0100

    loader: Apply SvnLoaderFromRemoteDump optimization only for stale repos
    
    If new revisions have been issued in a subversion repository since
    the last loading, do not check altered history in prepare method of
    SvnLoaderFromRemoteDump and proceed to incremental loading.

commit ed0728c8b02e893fe54c07e8260414e9b3479eb7
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 18 22:44:52 2021 +0100

    svn: Ensure proper incremental loading of a repository
    
    When performing an incremental loading of a subversion repository, i.e. only
    load new revisions issued since the last loading, we need to replay the whole
    set of path modifications since the first revision in order to restore possible
    file states induced by setting svn properties on those files.
    
    For instance if a file got the svn:eol-style property set in a revision lesser
    than the last one loaded into the archive, we will miss that information if
    we do not start replaying the paths modifications since the first revision.
    
    So ensure to replay path modifications since first revision and start yielding
    new data to archive once we reached the revision to resume the loading from.

commit c9aaa314df27b0ce62df1ddf9008ec7dca97ad82
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 18 19:58:20 2021 +0100

    loader: Ensure to reload from first revision when history got altered
    
    Previously, the reloading was performed starting revision 2.

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/198/ for more details.
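
The replay strategy from the second commit above (replay path modifications from the first revision, yield data only once the resume revision is reached) can be sketched as below. This is a simplified model under stated assumptions, not the loader's implementation: the revision tuples, `replay_from_first` and the property-only state are hypothetical and stand in for the real svn replay machinery.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical model: a revision is (number, [(path, property name, value), ...]).
PropChange = Tuple[str, str, str]
Revision = Tuple[int, List[PropChange]]
State = Dict[str, Dict[str, str]]


def replay_from_first(
    revisions: Iterable[Revision], resume_from: int
) -> Iterator[Tuple[int, State]]:
    """Replay every revision to rebuild state, but yield only from resume_from on."""
    state: State = {}
    for number, changes in revisions:
        for path, prop, value in changes:
            state.setdefault(path, {})[prop] = value
        if number >= resume_from:
            # Copy the accumulated state so later revisions do not mutate it.
            yield number, {path: dict(props) for path, props in state.items()}


history: List[Revision] = [
    (1, [("README", "svn:eol-style", "native")]),
    (2, []),
    (3, [("run.sh", "svn:executable", "*")]),
]
new_data = list(replay_from_first(history, resume_from=3))
```

With `resume_from=3`, only revision 3 is yielded, yet its state still carries the `svn:eol-style` property set on `README` in revision 1, which is the information that would be lost if the replay started at the resume revision.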

This revision is now accepted and ready to land. Nov 19 2021, 11:56 AM