
SWHID: deal with escaping in origin qualifiers
Closed, Public

Authored by zack on Apr 24 2020, 4:59 PM.

Details

Summary

The current spec is incorrect on this point: (at least) ";" can appear in
origin URLs, and hence in SWHID origin qualifiers, which breaks SWHID parsing.
This is a minimal proposal that only requires URL-escaping the "%" and ";"
characters, as we do for paths.
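
A minimal sketch of what this two-character escaping could look like, and of
why an unescaped ";" breaks qualifier parsing. The helper names
(escape_origin, unescape_origin), the all-zero object hash, and the
example.org URL are hypothetical and used for illustration only; this is not
the code from the diff.

```python
def escape_origin(url: str) -> str:
    """Escape only "%" and ";" in an origin URL (illustrative helper)."""
    # "%" must be escaped first; otherwise the "%" of the "%3B" produced
    # for ";" would itself be escaped a second time.
    return url.replace("%", "%25").replace(";", "%3B")


def unescape_origin(value: str) -> str:
    """Inverse of escape_origin (illustrative helper)."""
    # Undo the two replacements in reverse order.
    return value.replace("%3B", ";").replace("%25", "%")


# Hypothetical core identifier and origin URL, for illustration only.
core = "swh:1:cnt:" + "0" * 40
url = "https://example.org/repo;v=1"

# Without escaping, splitting on ";" cuts the origin URL in two.
naive_parts = f"{core};origin={url}".split(";")

# With escaping, the qualifier survives the split and can be restored.
safe_parts = f"{core};origin={escape_origin(url)}".split(";")
restored = unescape_origin(safe_parts[1][len("origin="):])
```

Here naive_parts has three elements (the URL is split at its own ";"), while
safe_parts has two and restored equals the original URL.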

... but to be honest, the more I look into this the more it worries me. It
would be much better to fully percent-escape origin URLs, but that has the
drawback of making SWHIDs much less pleasant to read.
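
To illustrate the readability trade-off, here is a small sketch comparing
full percent-encoding (via the standard library's urllib.parse.quote with no
safe characters) against escaping only "%" and ";". The function names and
the example.org URL are assumptions made for illustration.

```python
from urllib.parse import quote


def full_escape(url: str) -> str:
    # Percent-encode every reserved character, including ":" and "/".
    return quote(url, safe="")


def minimal_escape(url: str) -> str:
    # Escape only the two characters named in this proposal.
    return url.replace("%", "%25").replace(";", "%3B")


url = "https://example.org/user/repo.git"  # hypothetical origin URL

print(full_escape(url))     # https%3A%2F%2Fexample.org%2Fuser%2Frepo.git
print(minimal_escape(url))  # https://example.org/user/repo.git
```

With the minimal scheme, a typical origin URL stays unchanged inside the
SWHID, whereas full encoding rewrites every ":" and "/".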

I hope only escaping these two characters is enough, but I'm not entirely sure
it is...

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3065 (id=10899)

Rebasing onto 3f388086cb...

Current branch diff-target is up to date.
Changes applied before test
commit 56cf99aeaf385a1928f7e16bf9fd85e7c6bfcf9e
Author: Stefano Zacchiroli <zack@upsilon.cc>
Date:   Fri Apr 24 16:56:47 2020 +0200

    SWHID: deal with escaping in origin qualifiers

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/47/ for more details.

For the record, there is currently no origin with a ';' character in its URL, but it is indeed better to anticipate that case.

softwareheritage=> select * from origin where url like '%;%';
 id | url 
----+-----
(0 rows)

Looks good to me, and I also think percent-encoding only ';' and '%' is better for SWHID readability.

This revision is now accepted and ready to land.
Apr 24 2020, 5:13 PM