Page MenuHomeSoftware Heritage

dumb: Fix regression when joining URLs
ClosedPublic

Authored by anlambert on Jan 11 2022, 3:53 PM.

Details

Summary

urljoin does not produce the same output if the base URL does not
have a trailing slash.

>>> from urllib.parse import urljoin
>>> urljoin("https://git.example.org/repo", "info/refs")
'https://git.example.org/info/refs'
>>> urljoin("https://git.example.org/repo/", "info/refs")
'https://git.example.org/repo/info/refs'

So ensure the base URL ends with a slash to avoid generating invalid
URLs and make loading failed.

Test Plan

Tests have been updated to cover that case.

Diff Detail

Repository
rDLDG Git loader
Branch
dumb-fix-urljoin-regression
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25937
Build 40535: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 40534: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6917 (id=25062)

Rebasing onto db5b80b47c...

Current branch diff-target is up to date.
Changes applied before test
commit 64050aa08b531b29db429e26c468cd3504e7a032
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 15:48:34 2022 +0100

    dumb: Fix regression when joining URLs
    
    urljoin does not produce the same output if the base URL does not
    have a trailing slash.
    
    >>> from urllib.parse import urljoin
    >>> urljoin("https://git.example.org/repo", "info/refs")
    'https://git.example.org/info/refs'
    >>> urljoin("https://git.example.org/repo/", "info/refs")
    'https://git.example.org/repo/info/refs'
    
    So ensure the base URL ends with a slash to avoid generating invalid
    URLs and make loading failed.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/172/ for more details.

vlorentz added a subscriber: vlorentz.

oops, I misunderstood the original code

This revision is now accepted and ready to land.Jan 11 2022, 4:27 PM

oops, I misunderstood the original code

Yeah, urljoin is so misleading as it does not work the same way as os.path.join.

My bad too, I forgot about that when I did the review yesterday and only recalled afterwards that I encountered that type of issue
when working on the dumb loader.

This revision was automatically updated to reflect the committed changes.