Page MenuHomeSoftware Heritage

dumb: Fix regression when joining URLs
ClosedPublic

Authored by anlambert on Jan 11 2022, 3:53 PM.

Details

Summary

urljoin does not produce the same output if the base URL does not
have a trailing slash.

>>> from urllib.parse import urljoin
>>> urljoin("https://git.example.org/repo", "info/refs")
'https://git.example.org/info/refs'
>>> urljoin("https://git.example.org/repo/", "info/refs")
'https://git.example.org/repo/info/refs'

So ensure the base URL ends with a slash to avoid generating invalid
URLs and make loading failed.

Test Plan

Tests have been updated to cover that case.

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6917 (id=25062)

Rebasing onto db5b80b47c...

Current branch diff-target is up to date.
Changes applied before test
commit 64050aa08b531b29db429e26c468cd3504e7a032
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 15:48:34 2022 +0100

    dumb: Fix regression when joining URLs
    
    urljoin does not produce the same output if the base URL does not
    have a trailing slash.
    
    >>> from urllib.parse import urljoin
    >>> urljoin("https://git.example.org/repo", "info/refs")
    'https://git.example.org/info/refs'
    >>> urljoin("https://git.example.org/repo/", "info/refs")
    'https://git.example.org/repo/info/refs'
    
    So ensure the base URL ends with a slash to avoid generating invalid
    URLs and make loading failed.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/172/ for more details.

vlorentz added a subscriber: vlorentz.

oops, I misunderstood the original code

This revision is now accepted and ready to land.Jan 11 2022, 4:27 PM

oops, I misunderstood the original code

Yeah, urljoin is so misleading as it does not work the same way as os.path.join.

My bad too, I forgot about that when I did the review yesterday and only recalled afterwards that I encountered that type of issue
when working on the dumb loader.

This revision was automatically updated to reflect the committed changes.