Page MenuHomeSoftware Heritage

ra: Fix export of non link file with svn:special property set
ClosedPublic

Authored by ardumont on Nov 3 2021, 5:32 PM.

Details

Summary

When a file with the svn:special property set is not a svn link,
the svn export operation will extract a truncated version of that
file if it contains a null byte (see create_special_file_from_stream
implementation in libsvn_subr/subst.c).

So ensure to produce the same file as the svn export operation
otherwise the reconstructed file system will differ.

Related to T3695

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6601 (id=23988)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 19d13db456aa957a6d52b1e0ee067851702dda07
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/153/ for more details.

Update: Improve commit messages in test

Build is green

Patch application report for D6601 (id=23989)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 760b72edd3d71b426af9e29d6ba1d42e1ac228c8
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/154/ for more details.

will extract a truncated version of that file if it contains a null byte

​* sigh *

swh/loader/svn/tests/test_loader.py
1095–1097

could you write a short docstring explaining why null bytes are an issue?

This revision is now accepted and ready to land.Nov 4 2021, 10:28 AM
vlorentz added inline comments.
swh/loader/svn/ra.py
128–129

hmm, wait. Why is this a global variable?

This revision now requires changes to proceed.Nov 4 2021, 10:29 AM
swh/loader/svn/ra.py
128–129

Because FileEditor objects are created when processing each revision so we cannot store those info in the class.

swh/loader/svn/tests/test_loader.py
1095–1097

sure

swh/loader/svn/ra.py
128–129

Thinking it back, I could use a class variable.

Build is green

Patch application report for D6601 (id=23994)

Rebasing onto b326099c73...

Current branch diff-target is up to date.
Changes applied before test
commit 391b15bab527025d468b9b706b67fa0d39cf48e6
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/157/ for more details.

vlorentz added inline comments.
swh/loader/svn/ra.py
128–129

that's almost like a global variable...

I'll see what I can do

This revision is now accepted and ready to land.Nov 8 2021, 2:02 PM

I'm getting this when running the test on my machine, any idea why?

_______________________________________________________________ test_loader_svn_special_property_on_binary_file_with_null_byte _______________________________________________________________

swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f56aff864c0>, tmp_path = PosixPath('/tmp/pytest-of-dev/pytest-172/test_loader_svn_special_proper0')

    def test_loader_svn_special_property_on_binary_file_with_null_byte(
        swh_storage, tmp_path
    ):
        """When a file has the svn:special property set but is not a svn link,
        it will be truncated when performing an export operation if it contains
        a null byte. Indeed, subversion will treat the file content as text but
        it might be a binary file containing null bytes."""
    
        # create a repository
        repo_path = os.path.join(tmp_path, "tmprepo")
        repos.create(repo_path)
        repo_url = f"file://{repo_path}"
    
        data = (
            b"!<symlink>\xff\xfea\x00p\x00t\x00-\x00c\x00y\x00g\x00.\x00s\x00h\x00\x00\x00"
        )
    
        # first commit
        add_commit(
            repo_url,
            "Add a non svn link binary file and set the svn:special property on it",
            [
                CommitChange(
                    change_type=CommitChangeType.AddOrUpdate,
                    path="binary_file",
                    properties={"svn:special": "*"},
                    data=data,
                ),
            ],
        )
    
        # second commit
        add_commit(
            repo_url,
            "Remove the svn:special property on the previously added file",
            [
                CommitChange(
                    change_type=CommitChangeType.AddOrUpdate,
                    path="binary_file",
                    properties={"svn:special": None},
                ),
            ],
        )
    
        # instantiate a svn loader checking after each processed revision that
        # the repository filesystem it reconstructed does not differ from a subversion
        # export of that revision
        loader = SvnLoader(
            swh_storage, repo_url, destination_path=tmp_path, check_revision=1
        )
    
        assert loader.load() == {"status": "eventful"}
>       assert loader.visit_status() == "full"
E       AssertionError: assert 'partial' == 'full'
E         - full
E         + partial

.tox/py3/lib/python3.9/site-packages/swh/loader/svn/tests/test_loader.py:1133: AssertionError
------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------
ERROR    swh.loader.svn.SvnLoader:loader.py:466 Hash tree computation divergence detected (6cfb52777db664b3c9b1258bab2f35f7d159acea != 5d221b973aa9e2d31f600cc6119e9a01d23ba605), stopping!
Traceback (most recent call last):
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 459, in fetch_data
    data = next(self.swh_revision_gen)
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 381, in process_svn_revisions
    self._check_revision_divergence(count, rev, dir_id)
  File "/home/dev/swh-environment/swh-loader-svn/.tox/py3/lib/python3.9/site-packages/swh/loader/svn/loader.py", line 334, in _check_revision_divergence
    raise ValueError(err)
ValueError: Hash tree computation divergence detected (6cfb52777db664b3c9b1258bab2f35f7d159acea != 5d221b973aa9e2d31f600cc6119e9a01d23ba605), stopping!

I'm getting this when running the test on my machine, any idea why?
...

No idea.
As another datapoint, everything is fine on my end (before and after an mr update):

391b15b  8⚑  %  pytest -x -s swh/loader/svn/tests/test_loader.py 2>&1 | tail -1 | tr -d '='
 26 passed, 67 warnings in 7.08s
391b15b  8⚑  %  python --version
Python 3.9.2
391b15b  8⚑  %  svn --version | head -2
svn, version 1.14.1 (r1886195)
   compiled Feb 18 2021, 02:52:03 on x86_64-pc-linux-gnu

nevermind, I had an uncommitted change somewhere

ardumont added a reviewer: anlambert.

Build is green

Patch application report for D6601 (id=24043)

Rebasing onto 51f4d54bdf...

Current branch diff-target is up to date.
Changes applied before test
commit 58a07c67d5901b9c0c9fed428b1b3308ad4df291
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 3 15:42:37 2021 +0100

    ra: Fix export of non link file with svn:special property set
    
    When a file with the svn:special property set is not a svn link,
    the svn export operation will extract a truncated version of that
    file if it contains a null byte (see create_special_file_from_stream
    implementation in libsvn_subr/subst.c).
    
    So ensure to produce the same file as the svn export operation
    otherwise the reconstructed file system will differ.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/164/ for more details.