
replay: Use remote repository base URL to export external located in it
Closed · Public

Authored by anlambert on Feb 7 2022, 3:31 PM.

Details

Summary

Some externals might be located in the same repository we are
currently loading. In that case, replace the external base URL,
which corresponds to the origin URL, with the remote repository URL.

When we use SvnLoaderFromRemoteDump, that remote repository URL
corresponds to a local repository mounted from a dump file, so we
can export the external path much faster, without any costly
network requests.

Related to T611
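
The URL substitution described in the summary can be sketched as follows. This is only an illustration under stated assumptions, not the code from the diff: the function name rewrite_external_url, its parameters, and the example URLs are all hypothetical.

```python
def rewrite_external_url(external_url: str, origin_url: str, remote_url: str) -> str:
    """Re-root an external URL on remote_url when it lives under origin_url.

    Hypothetical helper: the names and signature are assumptions made for
    illustration, not the loader's actual API.
    """
    origin = origin_url.rstrip("/")
    external = external_url.rstrip("/")
    if external == origin:
        # The external targets the repository root itself.
        return remote_url.rstrip("/")
    if external.startswith(origin + "/"):
        # Keep the path relative to the repository root, swap the base URL.
        return remote_url.rstrip("/") + external[len(origin):]
    # The external lives in another repository: leave its URL untouched.
    return external_url


# Example with made-up URLs: the remote URL points to a locally mounted dump.
print(
    rewrite_external_url(
        "https://svn.example.org/project/trunk/lib",
        "https://svn.example.org/project",
        "file:///tmp/mounted-dump/project",
    )
)
```

With a file:// remote URL, the export reads from the local mount instead of going through the network, which is where the speed-up described above would come from.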

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
self-external-export-optimisation
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 26582
Build 41577: Phabricator diff pipeline on jenkins
Build 41576: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7099 (id=25753)

Rebasing onto 20c1445fb1...

Current branch diff-target is up to date.
Changes applied before test
commit 157b81dc7175fe3b8a34bc004ed2e1904fa7a274
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Feb 7 15:20:17 2022 +0100

    replay: Use remote repository base URL to export external located in it
    
    Some externals might be located in the same repository we are
    currently loading. In that case, replace the external base URL
    which corresponds to the origin URL by the remote repository URL.
    
    When we use SvnLoaderFromRemoteDump, that remote repository URL
    corresponds to a local repository mounted from a dump file so we
    can export the external path in a much faster way without any costly
    network requests.
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/286/ for more details.

vlorentz added inline comments.
swh/loader/svn/replay.py
665

Looks like a false positive if external_url is "https://example.org/foo/" and self.svnrepo.origin_url is "https://example.org/foobar/"

swh/loader/svn/replay.py
665

ah right, I must add the trailing slash when checking prefix in that case, thanks

Update: Add trailing slash when testing URL prefix.
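
To illustrate the kind of false positive discussed above, here is a small sketch comparing a plain prefix test with one that normalizes trailing slashes. The helper names are hypothetical, and the URLs are a variant of the reviewer's example in which the origin URL is the shorter one and has no trailing slash; the actual check in replay.py may differ.

```python
def is_in_repository_naive(external_url: str, origin_url: str) -> bool:
    """Plain prefix test: also matches sibling repositories sharing a name prefix."""
    return external_url.startswith(origin_url)


def is_in_repository(external_url: str, origin_url: str) -> bool:
    """Prefix test with both URLs normalized to end with exactly one slash."""
    external = external_url.rstrip("/") + "/"
    origin = origin_url.rstrip("/") + "/"
    return external.startswith(origin)


origin_url = "https://example.org/foo"
sibling_external = "https://example.org/foobar/trunk"  # different repository
inner_external = "https://example.org/foo/trunk"  # same repository

# The naive test wrongly reports the sibling repository as a match.
print(is_in_repository_naive(sibling_external, origin_url))
# The normalized test only accepts URLs located under the origin URL.
print(is_in_repository(sibling_external, origin_url))
print(is_in_repository(inner_external, origin_url))
```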

Build is green

Patch application report for D7099 (id=25765)

Rebasing onto 20c1445fb1...

Current branch diff-target is up to date.
Changes applied before test
commit 18b5d31df29bed6c0840328a54a3d06adc62346d
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Feb 7 15:20:17 2022 +0100

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/287/ for more details.

swh/loader/svn/replay.py
665

could you add a regression test for this?

swh/loader/svn/replay.py
665

I will try, I should be able to test the exported URL by mocking the svnrepo.client object.
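
A test along those lines might look like the sketch below, which replaces the whole svnrepo object with a mock and inspects the URL passed to its client's export call. The export_external function is a hypothetical stand-in written for this example, the URLs and paths are made up, and the export(url, to=...) call shape is an assumption; the real test added to the diff is not shown here.

```python
from unittest.mock import MagicMock


def export_external(svnrepo, external_url: str, dest_path: str) -> None:
    """Hypothetical stand-in for the loader's export step (not the real code)."""
    origin = svnrepo.origin_url.rstrip("/")
    url = external_url
    if external_url.startswith(origin + "/"):
        # External located in the repository being loaded: use the remote URL.
        url = svnrepo.remote_url.rstrip("/") + external_url[len(origin):]
    # Assumed call shape; the mock accepts any arguments.
    svnrepo.client.export(url, to=dest_path)


def test_export_uses_remote_url_for_internal_external():
    svnrepo = MagicMock()
    svnrepo.origin_url = "https://svn.example.org/project"
    svnrepo.remote_url = "file:///tmp/mounted-dump/project"

    export_external(svnrepo, "https://svn.example.org/project/trunk/lib", "/tmp/out")

    # The mocked client records the call, so no network access happens.
    svnrepo.client.export.assert_called_once_with(
        "file:///tmp/mounted-dump/project/trunk/lib", to="/tmp/out"
    )


def test_export_keeps_url_for_foreign_external():
    svnrepo = MagicMock()
    svnrepo.origin_url = "https://svn.example.org/project"
    svnrepo.remote_url = "file:///tmp/mounted-dump/project"

    export_external(svnrepo, "https://svn.example.org/projects/lib", "/tmp/out")

    svnrepo.client.export.assert_called_once_with(
        "https://svn.example.org/projects/lib", to="/tmp/out"
    )
```

The second case doubles as a regression check for the prefix false positive raised earlier in this thread.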

Update: Rebase and improve debug logging

Build is green

Patch application report for D7099 (id=25814)

Rebasing onto d929eebd39...

Current branch diff-target is up to date.
Changes applied before test
commit 1aec51eba758c517f2436ec8c59a3dee37465b10
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Feb 7 15:20:17 2022 +0100

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/289/ for more details.

swh/loader/svn/replay.py
665

I managed to write a test checking export URLs are the expected ones.

Build is green

Patch application report for D7099 (id=25825)

Rebasing onto d929eebd39...

Current branch diff-target is up to date.
Changes applied before test
commit 0ea0d1282177a5ca7878ea61c07ffeb0d4676ec1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Feb 7 15:20:17 2022 +0100

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/290/ for more details.

This revision is now accepted and ready to land. (Feb 9 2022, 2:03 PM)