Page MenuHomeSoftware Heritage

svn: Use urllib.parse.quote to percent encode svn URLs
ClosedPublic

Authored by anlambert on Dec 7 2022, 5:39 PM.

Details

Summary

In order to detect all ascii characters that must be percent encoded
in svn URLs, add a brute force test and use urllib.parse.quote in
quote_svn_url function.

Depends on D8942

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
better-svn-url-quoting
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 33175
Build 52008: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 52007: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8943 (id=32220)

Could not rebase; Attempt merge onto 2b80484b8b...

Updating 2b80484..fc78f57
Fast-forward
 swh/loader/svn/loader.py               |   2 +-
 swh/loader/svn/replay.py               | 284 +++------------------------------
 swh/loader/svn/svn.py                  |  22 +--
 swh/loader/svn/tests/test_externals.py | 158 ++++++++++++++++++
 swh/loader/svn/tests/test_loader.py    |  87 +++++++++-
 swh/loader/svn/tests/test_utils.py     |  13 ++
 swh/loader/svn/utils.py                |   5 +
 7 files changed, 290 insertions(+), 281 deletions(-)
Changes applied before test
commit fc78f57464df7fbcc7020115fd928712c413a156
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Dec 7 11:28:46 2022 +0100

    svn: Use urllib.parse.quote to percent encode svn URLs
    
    In order to detect all ascii characters that must be percent encoded
    in svn URLs, add a brute force test and use urllib.parse.quote in
    quote_svn_url function.

commit f377e9f7cef4e7cc672a5d8de4ec1da99de9080f
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Dec 6 16:37:12 2022 +0100

    utils: Raise ValueError when external definition could not be parsed
    
    Such case can happen when an external definition is malformed.
    
    Previously, the parsed malformed external was added to the directory state
    with an empty external URL which could lead to unexpected side effects like
    removing all previously exported valid externals.

commit 301b31e950998df20ce522782e31b649ba6a652f
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Nov 23 17:28:16 2022 +0100

    replay: Simplify FileEditor implementation
    
    Instead of maintaining file state based on svn properties across revisions
    replay and trying to reconstruct the same file as with a svn export operation
    after applying text deltas, prefer to simply export the file from the currently
    processed revision when closing the associated file editor.
    
    This greatly simplify the replay module implementation while approximatively
    keeping the same performance as before.
    
    Also add a test that would fail without these changes.
    
    Related to T4673

commit 9f3a5800d208f060949b8fb0f3e9788ab275a384
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Nov 24 11:54:03 2022 +0100

    replay: Do not ignore externals in copyfrom operations
    
    When copying a directory from an ancestor revision, do not ignore
    externals as properties are also copied by subversion so external
    paths must also be exported.

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/364/ for more details.

vlorentz requested changes to this revision.Dec 8 2022, 2:27 AM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/loader/svn/tests/test_loader.py
2182–2186

that gives you a superset, which also contains whitespace:

>>> import string
>>> {chr(i) for i in range(32, 127)} - set(string.printable)
set()
>>> set(string.printable) - {chr(i) for i in range(32, 127)}
{'\t', '\x0b', '\x0c', '\r', '\n'}
This revision now requires changes to proceed.Dec 8 2022, 2:27 AM
swh/loader/svn/tests/test_loader.py
2182–2186

Those characters cannot be in a subversion filename (I got this type of error message Invalid control character '0x0b' in path) so that's why I used a subset here.

This revision was not accepted when it landed; it landed in state Needs Revision.Dec 8 2022, 5:20 PM
This revision was automatically updated to reflect the committed changes.