ra: Fix parsing of svn link with multiple lines
Closed, Public

Authored by anlambert on Nov 29 2021, 1:27 PM.

Details

Summary

Some svn links might contain multiple lines if a user added an
end-of-line character to the first line.

When exporting a link, Subversion only considers the content of the
first line to create the symbolic link, so we need to apply the same
processing; otherwise the link we create will differ from the
exported one.

Related to T3695
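
The processing described in the summary can be sketched as follows. This is a minimal illustration, not the loader's actual code: the helper name `parse_link_content` is hypothetical, and it assumes the svn special file content has the form `link <target>`, with anything after the first newline ignored.

```python
from typing import Tuple


def parse_link_content(data: bytes) -> Tuple[bytes, bytes]:
    """Split the raw content of an svn special file into (type, target).

    Hypothetical helper: only the first line is considered, mirroring the
    behaviour the summary attributes to svn export.
    """
    # Keep only what precedes the first b"\n"; a trailing b"\r" is preserved.
    first_line = data.split(b"\n")[0]
    # The first space separates the file type from the target; the target
    # itself may contain further spaces.
    filetype, _, target = first_line.partition(b" ")
    return filetype, target


print(parse_link_content(b"link hello-world\r\n"))
print(parse_link_content(b"link COPYRIGHT\nignored second line\n"))
```

Note that a carriage return before the newline is kept as part of the target, since only the line feed is treated as the line separator here.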

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6704 (id=24354)

Rebasing onto c2c27f1de8...

Current branch diff-target is up to date.
Changes applied before test
commit d0b14d9d08c895a60048bc1518d6a92d368839dc
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Nov 29 13:22:10 2021 +0100

    ra: Fix parsing of svn link with multiple lines
    
    Some svn links might contain multiple lines if a user added an
    end of line character in the first line.
    
    When exporting a link, subversion will only consider the first
    line content to create a symbolic link so we need to apply the
    same processing or the link we create will differ from the
    exported one.
    
    Related to T3695

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/206/ for more details.

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/loader/svn/tests/test_loader.py, line 1766

so the content of the link becomes "hello-world\r" [1], right?

I guess bad input, bad output ;)

[1]

ipython
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.27.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: data = b"link hello-world\r\n"

In [2]: data
Out[2]: b'link hello-world\r\n'

In [3]: first_line = data.split(b"\n")[0]

In [4]: first_line
Out[4]: b'link hello-world\r'

In [6]: split_byte=b" "

In [7]: filetype, *src = first_line.split(split_byte)

In [8]: filetype
Out[8]: b'link'

In [9]: src
Out[9]: [b'hello-world\r']

Your test shows it's consistent with how svn does it, so that's fine.

This revision is now accepted and ready to land.
Nov 29 2021, 2:23 PM

swh/loader/svn/tests/test_loader.py, line 1766

Yes, the link will be broken on Unix-based systems, but not on Windows I guess. See below the AUTHORS link after an svn export:

(swh) anlambert@carnavalet:/tmp/cartoreso-sf-64/trunk$ ls -l
total 920
-rwxr-xr-x 1 anlambert anlambert 732490 avril  9  2007 ant.jar
lrwxrwxrwx 1 anlambert anlambert     10 nov.  29 12:00 AUTHORS -> 'COPYRIGHT'$'\r'
-rwxr-xr-x 1 anlambert anlambert    790 avril  9  2007 build.bat
-rwxr-xr-x 1 anlambert anlambert   1597 juin  24  2007 build-binaries.sh
-rw-r--r-- 1 anlambert anlambert    434 avril  9  2007 build.properties
-rwxr-xr-x 1 anlambert anlambert    836 juin  24  2007 build.sh
-rw-r--r-- 1 anlambert anlambert   4400 mai   31  2007 build.xml
-rwxr-xr-x 1 anlambert anlambert    582 janv.  9  2008 cartoreso.bat
-rwxr-xr-x 1 anlambert anlambert   1345 nov.   6  2007 cartoreso.properties.sample
-rwxr-xr-x 1 anlambert anlambert    988 janv.  9  2008 cartoreso-server.sh
-rwxr-xr-x 1 anlambert anlambert    977 juin  24  2007 cartoreso.sh
-rw-r--r-- 1 anlambert anlambert    485 janv.  9  2008 COPYRIGHT
drwxr-xr-x 9 anlambert anlambert   4096 nov.  29 12:00 doc
drwxr-xr-x 2 anlambert anlambert   4096 nov.  29 12:00 images
-rw-r--r-- 1 anlambert anlambert     68 janv.  9  2008 INSTALL
drwxr-xr-x 2 anlambert anlambert   4096 nov.  29 12:00 jpcap.jni
-rw-r--r-- 1 anlambert anlambert   1467 janv.  9  2008 LICENSE
-rw-r--r-- 1 anlambert anlambert  18679 nov.   6  2007 LICENSE.GPLv2
drwxr-xr-x 2 anlambert anlambert   4096 nov.  29 12:00 nmap
drwxr-xr-x 2 anlambert anlambert   4096 nov.  29 12:00 patch
-rw-r--r-- 1 anlambert anlambert 105884 avril  9  2007 services.xml
drwxr-xr-x 4 anlambert anlambert   4096 nov.  29 12:00 src
drwxr-xr-x 2 anlambert anlambert   4096 nov.  29 12:00 www
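
The dangling AUTHORS link in the listing above can be reproduced with a short sketch. It assumes a POSIX system where symlink creation is permitted; the function name `demo_carriage_return_link` and the temporary directory layout are illustrative only.

```python
import os
import tempfile
from typing import Tuple


def demo_carriage_return_link() -> Tuple[bool, bool, str]:
    """Create a symlink whose target ends with a carriage return and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        # The real file has no trailing carriage return in its name.
        with open(os.path.join(tmp, "COPYRIGHT"), "w") as f:
            f.write("placeholder\n")
        link = os.path.join(tmp, "AUTHORS")
        # The target keeps the carriage return, as in the exported link.
        os.symlink("COPYRIGHT\r", link)
        is_link = os.path.islink(link)    # the link itself exists
        resolves = os.path.exists(link)   # follows the link to its target
        target = os.readlink(link)
    return is_link, resolves, target


is_link, resolves, target = demo_carriage_return_link()
print(is_link, resolves, repr(target), len(target))
```

On a POSIX system this should report a link that exists but does not resolve, with a 10-character target, which matches the size of 10 shown for AUTHORS in the listing.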