Page MenuHomeSoftware Heritage

Remove parse_url helper that adds no real value
ClosedPublic

Authored by tenma on Oct 2 2020, 3:09 PM.

Diff Detail

Event Timeline

Build has FAILED

Patch application report for D4131 (id=14559)

Could not rebase; Attempt merge onto ad23ee03c0...

Updating ad23ee0..8d96ac3
Fast-forward
 swh/scanner/cli.py | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)
Changes applied before test
commit 8d96ac3bed31eb88bf575ca7b8ca8fdb262d0d90
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Oct 1 21:35:27 2020 +0200

    Remove parse_url helper that adds no real value

commit 7d7846954e35a969bba2d85f234d272e1fdabfaf
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Sep 29 19:47:54 2020 +0200

    Improve cli documentation
    
    The docstring was moved out of scanner docstring spot to work around a
    bug in our current Python version. See https://bugs.python.org/issue28739

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/56/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/56/console

this is debatable, but it does "normalize" the given url, so it does something. I agress the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying

this is debatable, but it does "normalize" the given url, so it does something. I agress the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying

So the question is whether the code of the scanner properly handle an URL without a trailing /, if not then this is still valuable.

OTOH it is CLI so interactive use so the error is instant.
The function also had a bad name.
I can just keep adding the trailing slash if you prefer, inline (no specific function).

I concur with @tenma here, that function made me raise eyebrows in the past.
I agree that making sure we are not / sensitive is a good usability feature, but that could be a oneliner somewhere that doesn't need a function.

I would add that I have even been bitten by it, providing a "http://" url that became "https://"http://""... :)

I will just inline the trailing slash fix then.

Inlined the intersting fix

Build is green

Patch application report for D4131 (id=14574)

Rebasing onto ad23ee03c0...

Current branch diff-target is up to date.
Changes applied before test
commit 15875836f0961753d1647d42d0658ce7d465a48e
Author: tenma <tenma+swh@mailbox.org>
Date:   Thu Oct 1 21:35:27 2020 +0200

    Remove parse_url helper and inline trailing slash fix

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/58/ for more details.

This revision is now accepted and ready to land.Oct 2 2020, 10:36 PM