Software Heritage

ra: Put externals in cache to avoid exporting them again
Closed · Public

Authored by anlambert on Jan 12 2022, 2:57 PM.

Details

Summary

Some Subversion repositories define the same external on several different paths.

In order to avoid exporting it multiple times, which consumes network bandwidth
and slows down the loading, save the exported external in a temporary directory
on the local filesystem and reuse that copy each time the same external is set
on another path.

Also ensure that all the temporary directories created for externals are deleted
at the end of the loading process.
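A minimal Python sketch of the caching idea, assuming an external is identified by its URL and optional revision. The ExternalCache class, its methods and the export callable are hypothetical stand-ins chosen for illustration; they are not the code in swh/loader/svn/ra.py.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cache key: an external is identified by (url, revision).
ExternalKey = Tuple[str, Optional[int]]


class ExternalCache:
    """Export each distinct external once, then copy it wherever it is set."""

    def __init__(self, export: Callable[[str, Optional[int], str], None]):
        # export(url, revision, dest_dir) stands in for the real svn export.
        self._export = export
        self._cache: Dict[ExternalKey, str] = {}
        self._temp_dirs: List[str] = []

    def materialize(self, url: str, revision: Optional[int], dest: str) -> None:
        key = (url, revision)
        if key not in self._cache:
            # First occurrence: export once into a fresh temporary directory.
            tmp = tempfile.mkdtemp(prefix="swh.svn.external.")
            self._temp_dirs.append(tmp)
            export_path = os.path.join(tmp, "export")
            self._export(url, revision, export_path)
            self._cache[key] = export_path
        # Later occurrences only cost a local copy; dest must not exist yet.
        # symlinks=True copies symlinks as symlinks instead of following them.
        shutil.copytree(self._cache[key], dest, symlinks=True)

    def cleanup(self) -> None:
        # Remove every temporary directory created for externals.
        for tmp in self._temp_dirs:
            shutil.rmtree(tmp, ignore_errors=True)
        self._temp_dirs.clear()
        self._cache.clear()
```

Keying the cache on (URL, revision) means a second path referencing the same external triggers a filesystem copy rather than a network export; cleanup() would be called once, for instance from a finally block at the end of the load.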

Related to T611

Depends on D6895

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
svn-loader-external-cache
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25973
Build 40595: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 40594: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6925 (id=25099)

Could not rebase; Attempt merge onto 93b4f2fdd8...

Updating 93b4f2f..2d3cfc8
Fast-forward
 swh/loader/svn/ra.py                | 243 +++++++++++++-
 swh/loader/svn/svn.py               |  33 +-
 swh/loader/svn/tests/test_loader.py | 635 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 197 +++++++++++
 swh/loader/svn/utils.py             | 118 +++++++
 5 files changed, 1117 insertions(+), 109 deletions(-)
Changes applied before test
commit 2d3cfc82e06fb03e9ace811658dc9ae6b6662be4
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 4db03a5e5cd623f9178e23f093cb54b4e78903e3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611
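The add/update/remove bookkeeping and the 404 handling described in this commit can be illustrated with a small reconciliation function. It is a sketch under assumptions: the externals of one directory are modeled as a dict mapping a relative path to (URL, revision), and reconcile_externals, ExternalNotFound and the remove/export callables are hypothetical names, not the loader's API.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical model: relative path -> (url, revision) for one svn:externals value.
Externals = Dict[str, Tuple[str, Optional[int]]]


class ExternalNotFound(Exception):
    """Hypothetical error raised by the export callable for a 404 external."""


def reconcile_externals(
    old: Externals,
    new: Externals,
    remove: Callable[[str], None],
    export: Callable[[str, str, Optional[int]], None],
) -> Set[str]:
    """Bring the exported tree from `old` to `new`; return paths skipped as 404."""
    skipped: Set[str] = set()
    # Externals that were removed or whose target changed are deleted first.
    for path, target in old.items():
        if new.get(path) != target:
            remove(path)
    # New or changed externals are (re-)exported; missing ones are ignored
    # and processing continues with the next external.
    for path, (url, revision) in new.items():
        if old.get(path) == (url, revision):
            continue
        try:
            export(path, url, revision)
        except ExternalNotFound:
            skipped.add(path)
    return skipped
```

Unchanged externals are left alone, so only the definitions that actually differ between two replayed revisions cause filesystem or network work.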

commit 2e7a0c23200404d815601dbb83bdb5bcc3c6c4be
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/231/ for more details.
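For reference, svn:externals lines come in two layouts: the pre-1.5 format "LOCALPATH [-r REV] URL" and the 1.5+ format "[-r REV] URL[@PEG] LOCALPATH". A simplified parser returning the three values listed in the utils commit might look like the following. parse_external_definition here is an illustrative sketch, not the function added by the commit: it only handles numeric revisions, uses shlex.split to approximate Subversion's quoting rules, treats a peg revision as the revision when no -r is given, and resolves only the ^/ (repository root relative) URL form, leaving ../, // and / relative URLs untouched.

```python
import shlex
from typing import List, Optional, Tuple


def _looks_like_url(token: str) -> bool:
    # Absolute URLs, plus the relative URL forms svn:externals accepts.
    return "://" in token or token.startswith(("^/", "../", "//", "/"))


def parse_external_definition(
    line: str, repo_root_url: str
) -> Tuple[str, str, Optional[int]]:
    """Return (local_path, url, revision) for one svn:externals line."""
    tokens = shlex.split(line)
    revision: Optional[int] = None
    rest: List[str] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "-r":  # "-r 148"
            revision = int(tokens[i + 1])
            i += 2
            continue
        if token.startswith("-r"):  # "-r148"
            revision = int(token[2:])
        else:
            rest.append(token)
        i += 1
    if len(rest) != 2:
        raise ValueError(f"unsupported external definition: {line!r}")
    # The 1.5+ format puts the URL first, the old format puts the path first.
    if _looks_like_url(rest[0]):
        url, path = rest
    else:
        path, url = rest
    # A numeric peg revision (URL@REV) is used as revision when -r is absent.
    base, sep, peg = url.rpartition("@")
    if sep and peg.isdigit():
        url = base
        if revision is None:
            revision = int(peg)
    # Only the repository-root-relative form is resolved in this sketch.
    if url.startswith("^/"):
        url = repo_root_url.rstrip("/") + "/" + url[2:]
    return path, url, revision
```

The "@" check only strips a purely numeric suffix, so URLs carrying credentials such as http://user@host/repo are left intact.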

vlorentz added inline comments.
swh/loader/svn/ra.py
612–617

Does the loader ever mutate files? If not, you can use os.link instead of shutil.copy, and pass copy_function=os.link to shutil.copytree, to create hard links instead of actual copies, which should save some time and space.

Actually, forget it. It's a premature optimization, and it may cause corruption issues if not handled carefully.
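The corruption risk mentioned here comes from hard links sharing a single inode. The standalone snippet below (illustrative only; per the comment above, the hard-link approach was not pursued) uses shutil.copytree with copy_function=os.link and shows that an in-place write through the linked tree also changes the cached tree. It assumes a POSIX filesystem that supports hard links.

```python
import os
import shutil
import tempfile

# Build a tiny "cached external" tree.
root = tempfile.mkdtemp()
cached = os.path.join(root, "cached")
os.makedirs(cached)
with open(os.path.join(cached, "file.txt"), "w") as f:
    f.write("original\n")

# copy_function=os.link creates hard links instead of copying file data.
linked = os.path.join(root, "linked")
shutil.copytree(cached, linked, copy_function=os.link)

# Both names refer to the same inode...
same_inode = (
    os.stat(os.path.join(cached, "file.txt")).st_ino
    == os.stat(os.path.join(linked, "file.txt")).st_ino
)

# ...so truncating and rewriting through one path is visible through the other.
with open(os.path.join(linked, "file.txt"), "w") as f:
    f.write("modified\n")
with open(os.path.join(cached, "file.txt")) as f:
    cached_content = f.read()

shutil.rmtree(root)
```

A tool that instead writes a new file and renames it over the old one would break the link rather than corrupt the cache, so the outcome depends on how files get modified, which is why the approach needs careful handling.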

This revision is now accepted and ready to land.
Jan 13 2022, 12:36 PM

Build is green

Patch application report for D6925 (id=25169)

Could not rebase; Attempt merge onto 93b4f2fdd8...

Updating 93b4f2f..d19f08e
Fast-forward
 swh/loader/svn/ra.py                | 259 ++++++++++++-
 swh/loader/svn/svn.py               |  50 ++-
 swh/loader/svn/tests/test_loader.py | 705 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 225 +++++++++++-
 swh/loader/svn/utils.py             | 126 ++++++-
 5 files changed, 1252 insertions(+), 113 deletions(-)
Changes applied before test
commit d19f08e3fd5f9d803300f6be698dedf30bd5f527
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 85aa87be50fea493c437b44cbcc544f285912e5d
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit 2f5fd60ab91f5af90c3333f369cb7b72b28b3fbe
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/235/ for more details.

Build is green

Patch application report for D6925 (id=25173)

Could not rebase; Attempt merge onto 93b4f2fdd8...

Updating 93b4f2f..f3d3eaf
Fast-forward
 swh/loader/svn/ra.py                | 259 ++++++++++++-
 swh/loader/svn/svn.py               |  49 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 225 +++++++++++-
 swh/loader/svn/utils.py             | 126 ++++++-
 5 files changed, 1253 insertions(+), 114 deletions(-)
Changes applied before test
commit f3d3eafe017a971952fb1359f19ee1f269edaf4f
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 8c7046e0ab03ae4b8f93134ff0d85d80c4352c69
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit 2f5fd60ab91f5af90c3333f369cb7b72b28b3fbe
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/238/ for more details.

Build is green

Patch application report for D6925 (id=25177)

Could not rebase; Attempt merge onto 93b4f2fdd8...

Updating 93b4f2f..fe9fc90
Fast-forward
 swh/loader/svn/ra.py                | 259 ++++++++++++-
 swh/loader/svn/svn.py               |  52 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 225 +++++++++++-
 swh/loader/svn/utils.py             | 126 ++++++-
 5 files changed, 1256 insertions(+), 114 deletions(-)
Changes applied before test
commit fe9fc903e547ed6f00a8ba2cc4642faa54427989
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit e76adb16d51b726eb0a3c5b266579eeba566f48d
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit 2f5fd60ab91f5af90c3333f369cb7b72b28b3fbe
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/241/ for more details.

Update: Preserve symlinks when copying an external tree.
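By default, shutil.copytree follows symbolic links and copies the files they point to (and reports dangling links as errors), so a copied external would presumably no longer match the tree produced by its original export. The standalone snippet below, assuming a POSIX system where os.symlink is permitted, contrasts the two behaviors.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "external")
os.makedirs(src)
with open(os.path.join(src, "target.txt"), "w") as f:
    f.write("data\n")
# A relative symlink, as it could appear in an exported external.
os.symlink("target.txt", os.path.join(src, "link.txt"))

# Default behavior: the link is followed and replaced by a regular file.
followed = os.path.join(root, "followed")
shutil.copytree(src, followed)

# symlinks=True recreates the symlink itself in the destination.
preserved = os.path.join(root, "preserved")
shutil.copytree(src, preserved, symlinks=True)

followed_is_link = os.path.islink(os.path.join(followed, "link.txt"))
preserved_is_link = os.path.islink(os.path.join(preserved, "link.txt"))
preserved_target = os.readlink(os.path.join(preserved, "link.txt"))

shutil.rmtree(root)
```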

Build is green

Patch application report for D6925 (id=25230)

Could not rebase; Attempt merge onto cb4bf60c0e...

Merge made by the 'recursive' strategy.
 swh/loader/svn/ra.py                | 261 ++++++++++++-
 swh/loader/svn/svn.py               |  52 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 261 ++++++++++++-
 swh/loader/svn/utils.py             | 128 ++++++-
 5 files changed, 1296 insertions(+), 114 deletions(-)
Changes applied before test
commit 7f1d0a11d831abfb4984dbaa2422e13bf56f264d
Merge: cb4bf60 76a33cb
Author: Jenkins user <jenkins@localhost>
Date:   Tue Jan 18 10:01:02 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 76a33cb8f2bcc33129c8c8df160ae21d46767284
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit aba4e6e29600cc6b4e4a4c686ff4946b5ce9b077
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit 30b3c8427391edc0f88dce202dcb8abb07bf548b
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/247/ for more details.

Build is green

Patch application report for D6925 (id=25239)

Could not rebase; Attempt merge onto cb4bf60c0e...

Updating cb4bf60..aea0e4e
Fast-forward
 swh/loader/svn/ra.py                | 261 ++++++++++++-
 swh/loader/svn/svn.py               |  52 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 261 ++++++++++++-
 swh/loader/svn/utils.py             | 128 ++++++-
 5 files changed, 1296 insertions(+), 114 deletions(-)
Changes applied before test
commit aea0e4e4216c2a8d43dbeed65208df232723cfd0
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 13eb16e499e79b7a5914af4961cdc5afeea9eada
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit f1913512a5faa0c99d23607b9d63fc6003c729fb
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/251/ for more details.

Build is green

Patch application report for D6925 (id=25242)

Could not rebase; Attempt merge onto cb4bf60c0e...

Updating cb4bf60..cf19926
Fast-forward
 swh/loader/svn/ra.py                | 260 ++++++++++++-
 swh/loader/svn/svn.py               |  52 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 261 ++++++++++++-
 swh/loader/svn/utils.py             | 128 ++++++-
 5 files changed, 1295 insertions(+), 114 deletions(-)
Changes applied before test
commit cf199264266935238d7fb93c5d5f754edce7c589
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 9dd1921a9935d118bd97645b2f6844b14db536db
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit f1913512a5faa0c99d23607b9d63fc6003c729fb
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/254/ for more details.

Build is green

Patch application report for D6925 (id=25250)

Could not rebase; Attempt merge onto cb4bf60c0e...

Updating cb4bf60..a820d7e
Fast-forward
 swh/loader/svn/ra.py                | 260 ++++++++++++-
 swh/loader/svn/svn.py               |  52 ++-
 swh/loader/svn/tests/test_loader.py | 708 +++++++++++++++++++++++++++++++-----
 swh/loader/svn/tests/test_utils.py  | 261 ++++++++++++-
 swh/loader/svn/utils.py             | 128 ++++++-
 5 files changed, 1295 insertions(+), 114 deletions(-)
Changes applied before test
commit a820d7eab8d56c1b793b4144c6e2f16bf1d78ff1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 20:09:56 2022 +0100

    ra: Put externals in cache to avoid exporting them again
    
    Some subversion repositories can define same external on different paths.
    
    In order to avoid exporting it multiple times, which consumes network bandwith
    and slows down the loading, save the exported external in a temporary directory
    on the local filesystem and reuse that copy when the external is set on a path.
    
    Also ensure all the temporary directories created for externals will be deleted
    at the end of the loading process.
    
    Related to T611

commit 473fe145f4b7cd6cd1e1c0f9ec72cad04c38c4a8
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 8 11:43:38 2021 +0100

    ra: Add support for subversion external definitions
    
    Subversion external definitions set on directories through the use of the
    svn:externals property are now handled by the loader.
    
    As with a svn export operation, externals will be attempted to be exported
    in the paths they are defined. If an external is no longer valid (404),
    the error will be ignored and the next one will be processed.
    
    The implementation takes care of keeping the reconstructed repository
    filesystem for a revision in sync with a svn export operation while
    externals are added, updated or removed across revisions replay.
    
    Related to T611

commit f1913512a5faa0c99d23607b9d63fc6003c729fb
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 14 17:39:07 2021 +0100

    utils: Add a function to parse a subversion external definition
    
    Add a function to parse an external definition according to official
    specifications in order to extract or compute:
    
      - the relative path where the external should be exported
    
      - the URL of the external
    
      - the optional revision of the external to export
    
    Related to T611

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/258/ for more details.