loader: Optimize SvnLoaderFromRemoteDump use on stale repository
Closed, Public

Authored by anlambert on Nov 18 2021, 5:29 PM.

Details

Summary

When a stale Subversion repository is loaded repeatedly with
SvnLoaderFromRemoteDump, first check whether the last revision loaded into
the archive differs from the latest revision on the remote Subversion server.

If they are identical, skip the dump, mount and load phases to save disk
space and processing time on the Celery worker executing the loading task.
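
The early-exit idea described above can be sketched as follows. This is a minimal illustration, not the code from this diff: the class `StaleCheckLoader`, its two callables, the `phases_run` list, and the returned status strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StaleCheckLoader:
    """Hypothetical stand-in for a dump-based SVN loader (not the real class)."""

    # Assumed helper: returns the head revision number of the remote server.
    get_remote_head: Callable[[], int]
    # Assumed helper: returns the last revision number already archived,
    # or None if the repository was never loaded.
    get_last_archived: Callable[[], Optional[int]]
    # Records which expensive phases were executed, for illustration only.
    phases_run: List[str] = field(default_factory=list)

    def load(self) -> str:
        remote = self.get_remote_head()
        archived = self.get_last_archived()
        # Stale repository: nothing new on the remote, so skip all the
        # expensive phases and report that nothing changed.
        if archived is not None and archived == remote:
            return "uneventful"
        # Otherwise run the usual phases (placeholders here).
        for phase in ("dump", "mount", "load"):
            self.phases_run.append(phase)
        return "eventful"
```

With this shape, a repeated visit of an unchanged repository costs one cheap revision comparison instead of a full dump, which is where the disk space and worker time would be saved.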

Related to T3719

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6658 (id=24197)

Rebasing onto 631a2b9d7f...

Current branch diff-target is up to date.
Changes applied before test
commit 035431d539774ed54e872182f22155a62e5dc8c7
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 18 17:23:36 2021 +0100

    loader: Optimize SvnLoaderFromRemoteDump use on stale repository
    
    When trying to load multiple times a stale subversion repository with
    SvnLoaderFromRemoteDump, first check if the last loaded revision in the
    archive is different from the one on the remote subversion server.
    
    If they are identical skip the dump, mount and load phases in order
    to gain some disk space and processing time on the celery worker
    executing the loading task.
    
    Related to T3719

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/194/ for more details.

ardumont added a subscriber: ardumont.

Awesome!

Thanks.

This revision is now accepted and ready to land. Nov 18 2021, 5:33 PM

Update: Skip redundant post_load check when revisions are identified as identical in prepare
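
One way to picture this update is a flag set during `prepare` that `post_load` consults before doing any work. The sketch below is an assumption about the general shape only: `FlaggedLoader`, `skip_post_load`, and `checks_run` are hypothetical names, and the actual post-load verification is replaced by a counter.

```python
from typing import Optional


class FlaggedLoader:
    """Hypothetical loader skeleton; only the method names come from the note."""

    def __init__(self, remote_head: int, last_archived: Optional[int]) -> None:
        self.remote_head = remote_head
        self.last_archived = last_archived
        self.skip_post_load = False  # assumed flag name
        self.checks_run = 0          # counts post-load checks, for illustration

    def prepare(self) -> None:
        # Remember the comparison result so later phases can reuse it.
        if (
            self.last_archived is not None
            and self.last_archived == self.remote_head
        ):
            self.skip_post_load = True

    def post_load(self) -> None:
        # The revisions were already found identical in prepare, so
        # checking again would be redundant.
        if self.skip_post_load:
            return
        self.checks_run += 1  # placeholder for the real verification
```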

Build is green

Patch application report for D6658 (id=24199)

Rebasing onto 631a2b9d7f...

Current branch diff-target is up to date.
Changes applied before test
commit 0237d07b1747bf136a864444bfc38aa7330c3bee
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 18 17:23:36 2021 +0100

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/195/ for more details.