Software Heritage

SVN loader: Normalize line endings when svn:eol-style property is set
Closed (Public)

Authored by anlambert on Sep 10 2018, 5:44 PM.

Details

Summary

This diff tries to fix the loading of Subversion repositories in which
content has been stored with the wrong end-of-line style.

For instance, there are corrupted svn dump files in which a file's content
has CRLF line endings while its svn:eol-style property is set to native.
According to the Subversion documentation, this should not be possible,
as Subversion should have stored such content internally with LF line
endings.

Similarly, some dump files store content with mixed LF and CRLF line
endings while svn:eol-style is set to native, which also violates the
Subversion specification.
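To make the two corner cases concrete, here is a minimal sketch of a check that flags content violating svn:eol-style=native. The helpers `classify_line_endings` and `violates_native_eol_style` are hypothetical (not part of swh.loader.svn), and treating a lone CR as a line ending is an assumption of this sketch.

```python
# Hypothetical helpers, not part of swh.loader.svn: classify the line
# endings of a file's content to spot the corrupted-dump corner cases.
import re
from typing import Optional

# CRLF is tried first so a CRLF pair is not seen as a lone CR plus a lone LF.
# Assumption: a lone CR is also treated as a line ending.
_EOL_RE = re.compile(rb"\r\n|\r|\n")


def classify_line_endings(data: bytes) -> str:
    """Return 'none', 'LF', 'CRLF', 'CR' or 'mixed' for a byte buffer."""
    styles = {m.group(0) for m in _EOL_RE.finditer(data)}
    if not styles:
        return "none"
    if len(styles) > 1:
        return "mixed"
    return {b"\n": "LF", b"\r\n": "CRLF", b"\r": "CR"}[styles.pop()]


def violates_native_eol_style(data: bytes, eol_style: Optional[str]) -> bool:
    """True if content whose svn:eol-style is 'native' is not stored as pure LF."""
    if eol_style != "native":
        return False
    return classify_line_endings(data) not in ("none", "LF")
```

For example, `violates_native_eol_style(b"a\r\nb\r\n", "native")` is `True` (first case above) and `classify_line_endings(b"a\nb\r\nc")` is `"mixed"` (second case).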

Related T570

Test Plan

To do: apart from generating an svn dump file containing the corner cases
described above, I do not see any other way to test this fix.
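Generating such a dump by hand is feasible because the dump format is mostly text. The sketch below builds the bytes of a single "add file" node record whose text deliberately contradicts its svn:eol-style property. It assumes dump format version 2 with a `K`/`V`/`PROPS-END` property block; the function names are hypothetical, checksum headers (e.g. `Text-content-md5`) are omitted, and a complete dump would also need the `SVN-fs-dump-format-version` header and revision records. It has not been validated with `svnadmin load`.

```python
# Hypothetical sketch: build one node record of an svn dump (format v2)
# whose stored text is not normalized despite svn:eol-style=native.
from typing import Dict


def serialize_props(props: Dict[str, str]) -> bytes:
    """Serialize properties as a property block: K/V pairs then PROPS-END."""
    block = b""
    for key, value in props.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        block += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    return block + b"PROPS-END\n"


def file_node_record(path: str, text: bytes, props: Dict[str, str]) -> bytes:
    """Build an 'add file' node record; text is stored verbatim (no EOL fix)."""
    prop_block = serialize_props(props)
    headers = (
        b"Node-path: %s\n" % path.encode("utf-8")
        + b"Node-kind: file\n"
        + b"Node-action: add\n"
        + b"Prop-content-length: %d\n" % len(prop_block)
        + b"Text-content-length: %d\n" % len(text)
        + b"Content-length: %d\n" % (len(prop_block) + len(text))
        + b"\n"  # blank line ends the header section
    )
    # Trailing blank line separates this record from the next one.
    return headers + prop_block + text + b"\n"


# CRLF content stored under svn:eol-style=native: the first corner case.
record = file_node_record("trunk/crlf.txt", b"a\r\nb\r\n", {"svn:eol-style": "native"})
```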

Diff Detail

Repository: rDLDSVN Subversion (SVN) loader
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

> To do: apart from generating an svn dump file containing the corner cases described above, I do not see any other way to test this fix.

Heh, not so easy indeed.
Possibly take a known dump (the smallest possible) that exhibits the error,
commit it to the repository, and draft the tests around that data?

swh/loader/svn/ra.py, line 231

That's interesting; this could be used for the svn symlink use case as well.

Today, the on-disk representation of svn symlinks is altered (before/after patch application and hash computation; I don't remember the exact combination, but you get the idea).
This is because, in svn, a symlink is stored as a regular file with a specific syntax.

Having a Content.from_svn_symlink (specific to this loader) would be preferable.
That would avoid those strange code instructions \m/
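The "specific syntax" is that Subversion versions a symlink as a regular file carrying the `svn:special` property (value `*`) whose content is `link ` followed by the target. A minimal sketch of the parsing step a constructor like the proposed `Content.from_svn_symlink` might perform is below; `Content.from_svn_symlink` does not exist yet, and `svn_special_to_symlink_target` is a hypothetical name used only for illustration.

```python
# Hypothetical helper illustrating what a Content.from_svn_symlink-style
# constructor could do: recover the symlink target from svn's file form.
from typing import Dict, Optional

SVN_LINK_PREFIX = b"link "


def svn_special_to_symlink_target(data: bytes, props: Dict[str, str]) -> Optional[bytes]:
    """Return the symlink target if this svn file is a symlink, else None."""
    if props.get("svn:special") != "*" or not data.startswith(SVN_LINK_PREFIX):
        return None
    return data[len(SVN_LINK_PREFIX):]
```

Building the content object directly from the returned target would avoid rewriting the working-copy file on disk before and after hashing.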

Nice.

I've added some comments as well.

swh/loader/svn/ra.py, line 22

This should probably be hardcoded to a "native to Software Heritage" line ending style, rather than depend on the platform on which the loader runs.

(I propose b'\n', obviously)
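The reason a fixed target matters: Software Heritage identifies contents by, among other hashes, their Git blob hash (sha1_git), so the chosen line ending is baked into the identifier. The sketch below shows that normalizing the same content to `b"\n"` or to `b"\r\n"` (what `os.linesep` would give on Windows) yields different identifiers. `normalize_to` is a deliberately simplified, hypothetical helper, not the loader's function.

```python
# Sketch: the EOL chosen during normalization changes the content identifier,
# so it should not depend on the platform running the loader.
import hashlib
import os

SWH_EOL = b"\n"  # fixed target proposed in review


def git_blob_sha1(data: bytes) -> str:
    """Git blob identifier: sha1 over b'blob <len>\\x00' followed by the data."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def normalize_to(data: bytes, eol: bytes) -> bytes:
    # Simplified (hypothetical): fold CRLF to LF, then rewrite LF to eol.
    # Lone CRs are left untouched here.
    return data.replace(b"\r\n", b"\n").replace(b"\n", eol)


content = b"line 1\r\nline 2\r\n"
fixed_id = git_blob_sha1(normalize_to(content, SWH_EOL))
windows_id = git_blob_sha1(normalize_to(content, b"\r\n"))
platform_id = git_blob_sha1(normalize_to(content, os.linesep.encode()))
# fixed_id never changes; platform_id equals it only where os.linesep == "\n".
```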

Line 29

Does this function replicate what svn does when encountering mixed line endings in the file sent by the server (e.g. a file with CRLFs and a lone LF)? Does this ever happen?

swh/loader/svn/ra.py, line 22

Of course; this will be changed in the upcoming updated diff.

Line 29

Yes, the function guarantees that all line endings are the same after processing the byte buffer.
Subversion also repairs files with mixed line endings on export and checkout when the svn:eol-style property is set.
Such cases do exist: for instance, some svn repositories from Google Code store versioned files with mixed line endings
(a sample svn dump with a revision where the problem appears will be provided as test data).
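A minimal sketch of the guarantee described above: a single regex pass that rewrites every line ending in the buffer to one marker, so mixed input such as CRLFs with a lone LF comes out uniform. `normalize_eol` is a hypothetical name, not the function in ra.py, and treating a lone CR as a line ending is an assumption about how Subversion's repair behaves.

```python
# Hypothetical sketch of repairing mixed line endings in one pass.
import re

# CRLF must come first in the alternation so it is replaced as one unit.
_EOL_RE = re.compile(rb"\r\n|\r|\n")


def normalize_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every CRLF, lone CR and lone LF in data to the single marker eol."""
    return _EOL_RE.sub(eol, data)
```

For example, `normalize_eol(b"a\r\nb\nc\rd")` returns `b"a\nb\nc\nd"`, and applying it twice gives the same result as applying it once.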

Awesome!
I'll take a look after I'm done with the PyPI loader ;)

Cheers,

This revision is now accepted and ready to land. (Sep 19 2018, 3:38 PM)
This revision was automatically updated to reflect the committed changes.