Page MenuHomeSoftware Heritage

identifiers: Fix parsing of SWHID qualifier value containing '='
ClosedPublic

Authored by anlambert on Apr 12 2021, 5:48 PM.

Details

Summary

According to the SWHID specification, it is not forbidden for a qualifier
value to contain a '=' character (for instance in origin URL).

So update parsing code to handle that special case.

Diff Detail

Repository
rDMOD Data model
Branch
fix-swhid-qualifier-value-parsing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20652
Build 32048: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 32047: arc lint + arc unit

Event Timeline

swh/model/tests/test_identifiers.py
1572–1577

I have a doubt whether the path percent encoding is expected or not here.

Build is green

Patch application report for D5487 (id=19621)

Rebasing onto f2dba177ad...

First, rewinding head to replay your work on top of it...
Applying: identifiers: Fix parsing of SWHID qualifier value containing '='
Changes applied before test
commit 7ec3890a8466d1f25d89a8f575a9f76e19e2eb3c
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Mon Apr 12 17:44:20 2021 +0200

    identifiers: Fix parsing of SWHID qualifier value containing '='
    
    According to the SWHID specification, it is not forbidden for a qualifier
    value to contain a '=' character (for instance in origin URL).
    
    So update parsing code to handle that special case.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/311/ for more details.

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/model/identifiers.py
1233–1234
swh/model/tests/test_identifiers.py
1572–1577

not expected, but accepted

This revision now requires changes to proceed.Apr 12 2021, 6:33 PM
swh/model/identifiers.py
1233–1234

TIL the maxsplit option for split, thanks !

swh/model/tests/test_identifiers.py
1572–1577

Paths containing = character are indeed percent encoded in the SWHID copiable from the webapp: https://archive.softwareheritage.org/browse/directory/3ded8cb4d4770f9451b2f27e1576a8508737d8a7/?path=r%3D6

I guess we should leave it as it is.

Build is green

Patch application report for D5487 (id=19622)

Rebasing onto f2dba177ad...

First, rewinding head to replay your work on top of it...
Applying: identifiers: Fix parsing of SWHID qualifier value containing '='
Changes applied before test
commit cc5e73ab507bc7d287304787dcf3eef8c9317fad
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Mon Apr 12 17:44:20 2021 +0200

    identifiers: Fix parsing of SWHID qualifier value containing '='
    
    According to the SWHID specification, it is not forbidden for a qualifier
    value to contain a '=' character (for instance in origin URL).
    
    So update parsing code to handle that special case.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/312/ for more details.

Build is green

Patch application report for D5487 (id=19623)

Rebasing onto f2dba177ad...

First, rewinding head to replay your work on top of it...
Applying: identifiers: Fix parsing of SWHID qualifier value containing '='
Changes applied before test
commit 2b824e68b3605df89d5e6fff63470c813c1b9ba7
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Mon Apr 12 17:44:20 2021 +0200

    identifiers: Fix parsing of SWHID qualifier value containing '='
    
    According to the SWHID specification, it is not forbidden for a qualifier
    value to contain a '=' character (for instance in origin URL).
    
    So update parsing code to handle that special case.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/313/ for more details.

In the webapp, origin URL with a = character in it also got percent encoded to avoid generating invalid browse URL: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://cran.r-project.org/package%3Dtelegram.bot.

I am wondering if we should not also force to percent encode = in qualifier value as we do for ;;

Too late for that; we can't know if there are SWHIDs in the wild with unescaped equal signs in them.

Too late for that; we can't know if there are SWHIDs in the wild with unescaped equal signs in them.

Ack, good news is that diff will fix SWHID resolution in the webapp when the = character in an origin URL is not percent encoded (currently it fails as expected).

Build is green

Patch application report for D5487 (id=19638)

Rebasing onto 15d5bab512...

Current branch diff-target is up to date.
Changes applied before test
commit 74b024f5e0b862e8cff12c62406d06b22d258c84
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Mon Apr 12 17:44:20 2021 +0200

    identifiers: Fix parsing of SWHID qualifier value containing '='
    
    According to the SWHID specification, it is not forbidden for a qualifier
    value to contain a '=' character (for instance in origin URL).
    
    So update parsing code to handle that special case.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/315/ for more details.

This revision is now accepted and ready to land.Apr 13 2021, 3:12 PM