Software Heritage

ra: Rework export of non link file with svn:special property set
Closed, Public

Authored by anlambert on Dec 1 2021, 7:58 PM.

Details

Summary

The fix introduced in 58a07c67d590 ensured that reconstructing a non-link
binary file with the svn:special property set gives the same result as an
export operation. However, it was not generic enough to handle all the cases
that can be found in the wild.

So rework it by explicitly exporting that specific file with a Subversion
client when needed.

Related to T3695

Depends on D6721
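A minimal sketch of the "when needed" check follows. It assumes the loader has the file's properties as a dict and its raw content as bytes. Subversion stores a symbolic link as a file whose content starts with b"link " and whose svn:special property is set. A special file without that prefix is therefore the non-link case this revision handles. The helper name needs_explicit_export is hypothetical and is not the loader's actual API.

```python
from typing import Dict

SVN_SPECIAL = "svn:special"
# Subversion represents a symlink as a file containing b"link <target>".
SVN_LINK_PREFIX = b"link "


def needs_explicit_export(props: Dict[str, str], content: bytes) -> bool:
    """Hypothetical helper: True when a file flagged svn:special is not a symlink.

    Such files cannot be reliably reconstructed from their raw content, so
    in this sketch they are handed to a Subversion client for export instead.
    """
    is_special = SVN_SPECIAL in props
    is_link = content.startswith(SVN_LINK_PREFIX)
    return is_special and not is_link


# A real symlink: reconstructed as a link, no export required.
print(needs_explicit_export({"svn:special": "*"}, b"link target.txt"))  # False
# svn:special set on binary content: export the file explicitly.
print(needs_explicit_export({"svn:special": "*"}, b"\x00\x01binary"))  # True
```

Checking only for the presence of the property, not its value, keeps the sketch simple. Subversion normally sets svn:special to "*".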

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6722 (id=24413)

Could not rebase; Attempt merge onto d0b14d9d08...

Updating d0b14d9..50e87d1
Fast-forward
 swh/loader/svn/loader.py            |  5 +++
 swh/loader/svn/ra.py                | 59 ++++++++++++++++++++---------
 swh/loader/svn/svn.py               |  4 +-
 swh/loader/svn/tests/test_loader.py | 74 +++++++++++++++++++++++++++++++++----
 4 files changed, 116 insertions(+), 26 deletions(-)
Changes applied before test
commit 50e87d1d028da7ba447c277e09abf31042862914
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 1 19:52:36 2021 +0100

    ra: Rework export of non link file with svn:special property set
    
    The fix introduced in 58a07c67d590 to ensure reconstruction of non link
    binary file with svn:special property set will be the same as an export
    operation was not generic enough to handle all possible cases that can
    be found in the wild.
    
    So rework it by explicitely exporting that specific file with a subversion
    client when it is needed.
    
    Related to T3695

commit 5cbfe6d03fe0c11c483072fd5836048eaa57f114
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 1 18:50:33 2021 +0100

    loader: Clean replay directory before post_load operation
    
    The post_load operation will export the last loaded revision to a
    new temporary directory to check possible revision divergence.
    
    However the reconstructed filesystem for that revision still exists
    in another temporary directory after all revisions have been replayed.
    
    So ensure to clean that latter before post_load to gain some disk space
    and avoid possible "No space left on device" errors.

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/208/ for more details.

swh/loader/svn/ra.py, line 260

That assumption was wrong: I stumbled across cases where the file is not truncated at the first null byte encountered.
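The earlier assumption amounts to the heuristic below. It predicts the exported content by cutting the raw content at its first NUL byte. This is a hypothetical reconstruction based only on the comment above, not the code removed by this diff. It shows why the approach is fragile: any exported file that keeps bytes after the first NUL no longer matches the prediction.

```python
def truncate_at_first_null(content: bytes) -> bytes:
    """Hypothetical version of the old assumption about exported content."""
    # Keep everything before the first NUL byte; content without any NUL
    # is returned unchanged.
    return content.split(b"\x00", 1)[0]


print(truncate_at_first_null(b"abc\x00def"))  # b'abc'
print(truncate_at_first_null(b"no null"))  # b'no null'
```

When a real export keeps the trailing bytes, this prediction diverges from the export. That is why the reworked code asks a Subversion client for the exported file instead of guessing.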

ardumont added a subscriber: ardumont.

Nice catch.

Also, the joy of coding the SVN loader... also known as head... desk... or foot... gun... and all that...

Fortunately, we now have some data to load and Sentry to report issues like this, great!

swh/loader/svn/ra.py, line 26

I'm surprised there are no circular dependencies here ;)
That if block is probably what prevents it.

Line 272

Carnage, the lengths we have to go to sometimes...
And I guess we can't just export the file; we must do a full export just to retrieve it, right?

This revision is now accepted and ready to land. (Dec 2 2021, 10:04 AM)
swh/loader/svn/ra.py, line 26

Yes, that's why TYPE_CHECKING is used: the import is only performed during static type checking (e.g. when running mypy), never at runtime.
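A minimal sketch of the pattern follows. It assumes ra.py needs SvnRepo from swh/loader/svn/svn.py only for type annotations. That module path and the FileEditor class are illustrative assumptions. typing.TYPE_CHECKING is False at runtime, so the guarded import never executes and cannot form an import cycle, while mypy still resolves the name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers such as mypy. At runtime
    # TYPE_CHECKING is False, so this (assumed) import never runs and
    # cannot create a circular import between modules.
    from swh.loader.svn.svn import SvnRepo


class FileEditor:
    """Hypothetical class that only needs SvnRepo for its annotations."""

    def __init__(self, svnrepo: "SvnRepo") -> None:
        # The quoted annotation stays a plain string at runtime, so the name
        # SvnRepo does not have to be importable when this code executes.
        self.svnrepo = svnrepo
```

The quotes around "SvnRepo" matter. Without them, Python would evaluate the annotation when defining the method and raise NameError, because the guarded import never ran.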

Line 272

Nope, only the file is exported, not the whole filesystem associated with the revision.

That's one feature of Subversion that Git does not have: exporting a subpath.
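A minimal sketch of exporting a single path at a given revision with the svn command-line client follows. It assumes the svn CLI is installed. The loader itself may use Python bindings rather than the CLI, and both helper names are hypothetical. The URL@REV peg-revision syntax selects the file as it existed in that revision, and --force lets svn overwrite an existing destination.

```python
import subprocess
from typing import List


def svn_export_file_cmd(repo_url: str, path: str, revision: int, dest: str) -> List[str]:
    """Build an svn export command for one file at one revision (hypothetical helper)."""
    # Join the repository root and the file path, then pin the revision
    # with a peg revision (URL@REV).
    url = f"{repo_url.rstrip('/')}/{path.lstrip('/')}@{revision}"
    return ["svn", "export", "--force", url, dest]


def svn_export_file(repo_url: str, path: str, revision: int, dest: str) -> None:
    """Run the export; raises subprocess.CalledProcessError if svn fails."""
    subprocess.run(svn_export_file_cmd(repo_url, path, revision, dest), check=True)


print(svn_export_file_cmd("file:///tmp/repo", "trunk/data.bin", 3, "/tmp/data.bin"))
# ['svn', 'export', '--force', 'file:///tmp/repo/trunk/data.bin@3', '/tmp/data.bin']
```

Only the requested file is written to dest. The rest of the revision's tree is not fetched, which is the point made in the reply above.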