Page MenuHomeSoftware Heritage

ra: Fix export of non link file with svn:special property set
ClosedPublic

Authored by ardumont on Nov 3 2021, 5:32 PM.

Details

Summary

When a file with the svn:special property set is not a svn link,
the svn export operation will extract a truncated version of that
file if it contains a null byte (see create_special_file_from_stream
implementation in libsvn_subr/subst.c).

So ensure to produce the same file as the svn export operation
otherwise the reconstructed file system will differ.

Related to T3695

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
fix-svn-special-path-export-for-non-link-file
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 24863
Build 38833: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 38832: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6601 (id=23988)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 19d13db456aa957a6d52b1e0ee067851702dda07
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/153/ for more details.

Update: Improve commit messages in test

Build is green

Patch application report for D6601 (id=23989)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 760b72edd3d71b426af9e29d6ba1d42e1ac228c8
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/154/ for more details.

will extract a truncated version of that file if it contains a null byte

​* sigh *

swh/loader/svn/tests/test_loader.py
1081–1083

could you write a short docstring explaining why null bytes are an issue?

This revision is now accepted and ready to land.Nov 4 2021, 10:28 AM
vlorentz added inline comments.
swh/loader/svn/ra.py
128–129

hmm, wait. Why is this a global variable?

This revision now requires changes to proceed.Nov 4 2021, 10:29 AM
swh/loader/svn/ra.py
128–129

Because FileEditor objects are created when processing each revision so we cannot store those info in the class.

swh/loader/svn/tests/test_loader.py
1081–1083

sure

swh/loader/svn/ra.py
128–129

Thinking it back, I could use a class variable.

Build is green

Patch application report for D6601 (id=23994)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 391b15bab527025d468b9b706b67fa0d39cf48e6
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/157/ for more details.

vlorentz added inline comments.
swh/loader/svn/ra.py
128–129

that's almost like a global variable...

I'll see what I can do

This revision is now accepted and ready to land.Nov 8 2021, 2:02 PM

I'm getting this when running the test on my machine, any idea why?

_______________________________________________________________ test_loader_svn_special_property_on_binary_file_with_null_byte _______________________________________________________________

swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f56aff864c0>, tmp_path = PosixPath('/tmp/pytest-of-dev/pytest-172/test_loader_svn_special_proper0')

    def test_loader_svn_special_property_on_binary_file_with_null_byte(
        swh_storage, tmp_path
    ):
        """When a file has the svn:special property set but is not a svn link,
        it will be truncated when performing an export operation if it contains
        a null byte. Indeed, subversion will treat the file content as text but
        it might be a binary file containing null bytes."""
    
        # create a repository
        repo_path = os.path.join(tmp_path, "tmprepo")
        repos.create(repo_path)
        repo_url = f"file://{repo_path}"
    
        data = (
            b"!<symlink>\xff\xfea\x00p\x00t\x00-\x00c\x00y\x00g\x00.\x00s\x00h\x00\x00\x00"
        )
    
        # first commit
        add_commit(
            repo_url,
            "Add a non svn link binary file and set the svn:special property on it",
            [
                CommitChange(
                    change_type=CommitChangeType.AddOrUpdate,
                    path="binary_file",
                    properties={"svn:special": "*"},
                    data=data,
                ),
            ],
        )
    
        # second commit
        add_commit(
            repo_url,
            "Remove the svn:special property on the previously added file",
            [
                CommitChange(
                    change_type=CommitChangeType.AddOrUpdate,
                    path="binary_file",
                    properties={"svn:special": None},
                ),
            ],
        )
    
        # instantiate a svn loader checking after each processed revision that
        # the repository filesystem it reconstructed does not differ from a subversion
        # export of that revision
        loader = SvnLoader(
            swh_storage, repo_url, destination_path=tmp_path, check_revision=1
        )
    
        assert loader.load() == {"status": "eventful"}
>       assert loader.visit_status() == "full"
E       AssertionError: assert 'partial' == 'full'
E         - full
E         + partial

.tox/py3/lib/python3.9/site-packages/swh/loader/svn/tests/test_loader.py:1133: AssertionError
------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------
ERROR    swh.loader.svn.SvnLoader:loader.py:466 Hash tree computation divergence detected (6cfb52777db664b3c9b1258bab2f35f7d159acea != 5d221b973aa9e2d31f600cc6119e9a01d23ba605), stopping!
Traceback (most recent call last):
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 459, in fetch_data
    data = next(self.swh_revision_gen)
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 381, in process_svn_revisions
    self._check_revision_divergence(count, rev, dir_id)
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 334, in _check_revision_divergence
    raise ValueError(err)
ValueError: Hash tree computation divergence detected (6cfb52777db664b3c9b1258bab2f35f7d159acea != 5d221b973aa9e2d31f600cc6119e9a01d23ba605), stopping!

I'm getting this when running the test on my machine, any idea why?
...

No idea.
As another datapoint, everything is fine on my end (before and after an mr update):

391b15b  8⚑  %  pytest -x -s swh/loader/svn/tests/test_loader.py 2>&1 | tail -1 | tr -d '='
 26 passed, 67 warnings in 7.08s
391b15b  8⚑  %  python --version
Python 3.9.2
391b15b  8⚑  %  svn --version | head -2
svn, version 1.14.1 (r1886195)
   compiled Feb 18 2021, 02:52:03 on x86_64-pc-linux-gnu

nevermind, I had an uncommitted change somewhere

ardumont added a reviewer: anlambert.

Build is green

Patch application report for D6601 (id=24043)

Rebasing onto 51f4d54bdf...

Current branch diff-target is up to date.
Changes applied before test
commit 58a07c67d5901b9c0c9fed428b1b3308ad4df291
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/164/ for more details.