Consider making SWHID handling case insensitive
Open, Normal, Public

Description

Some of our (growing number of) users have raised an interesting issue: we should consider making SWHID handling case insensitive.

Here is an excerpt of a message from Mohammad Akhlaghi that explains the use case.

it would be good if the resolvers also interpret all-caps SWHIDs like:

SWH:1:CNT:66C1D53B2860A40AA9D350048F6B02C73C3B46C8

[...] the issue is that some LaTeX packages or web services automatically set everything to all caps, for example, the header on the top of the PDF pages of https://gitlab.com/makhlaghi/maneage-paper-pdf/-/raw/master/paper.pdf (that contain the DOI, arXiv or Zenodo links and uses '\markboth'). If you click on the arXiv link at the header of the PDF above, it won't work because while the 'arxiv.org' part is not case-sensitive, the 'abs/' part is. I have sent an email to the arXiv maintainers about this. But it works for the DOI links (I guess the 'doi.org' server is not-case-sensitive).

Should we change the current SWHID specification, which mandates lowercase letters everywhere in a core SWHID, or simply lowercase core SWHIDs during resolution?

Event Timeline

rdicosmo triaged this task as Normal priority. Apr 29 2021, 12:02 PM
rdicosmo created this task.
rdicosmo updated the task description.

Ah, this is an interesting practical problem.
I'm not a fan of changing the spec of SWHID version 1 to make them case insensitive, as it seems to be a significant change (in particular for the code that checks for the syntactic correctness of IDs).
But we can totally add a "SHOULD" section to the resolvers part of the spec recommending (but not mandating) that resolvers treat core SWHIDs as case insensitive. (Of course all the contextual parts cannot be considered case insensitive.)

This is going to be an interesting challenge/trade-off for SWHIDv2, because I was considering using encodings more compact than hex there, such as base58, in order to shorten SWHIDs. Those encodings are case-sensitive, which is how they achieve their higher density.

So, as a counter-argument to the "SHOULD" idea above, we need to be careful about promoting a practice now that might change when switching from SWHIDv1 to SWHIDv2.
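
To make the trade-off concrete, here is a small sketch of why case folding is harmless for hex but not for base58. The thread does not say which base58 variant SWHIDv2 would use; the Bitcoin-style alphabet below, and the helper names `hex_value` and `base58_value`, are assumptions for illustration only.

```python
# Assumption: Bitcoin-style base58 alphabet (no 0, O, I or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def hex_value(digits: str) -> int:
    """Decode a hex string; upper- and lowercase digits mean the same thing."""
    return int(digits, 16)


def base58_value(digits: str) -> int:
    """Decode a base58 string; 'A' and 'a' are two distinct digits."""
    value = 0
    for ch in digits:
        value = value * 58 + BASE58_ALPHABET.index(ch)
    return value


hex_id = "66c1d53b2860a40aa9d350048f6b02c73c3b46c8"

# Hex: changing the case does not change the identifier.
same_in_hex = hex_value(hex_id) == hex_value(hex_id.upper())

# Base58: changing the case yields a different value, so an all-caps
# rendering could not be mapped back to the original identifier.
same_in_base58 = base58_value("a") == base58_value("A")
```

Here `same_in_hex` is true and `same_in_base58` is false, which is why a "lowercase before resolving" rule would not carry over to a base58-encoded identifier.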

In T3298#64426, @zack wrote:

This is going to be an interesting challenge/trade-off for SWHIDv2, because I was considering using encodings more compact than hex there, such as base58, in order to shorten SWHIDs. Those encodings are case-sensitive, which is how they achieve their higher density.

So, as a counter-argument to the "SHOULD" idea above, we need to be careful about promoting a practice now that might change when switching from SWHIDv1 to SWHIDv2.

Agreed, and nice to see this coming in just in time for the SWHIDv2 discussion :-)

I'm not a fan of changing the spec of SWHID version 1 to make them case insensitive, as it seems to be a significant change (in particular for the code that checks for the syntactic correctness of IDs).

So for SWHIDv1, the resolver should turn the core part into lowercase, am I right?

So for SWHIDv1, the resolver should turn the core part into lowercase, am I right?

For SWHIDv1, this seems the consensus indeed.
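
A minimal sketch of that resolver-side behaviour, assuming the SWHIDv1 layout of a core part (`swh:1:<type>:<40 hex digits>`) optionally followed by `;`-separated qualifiers. The function name `normalize_swhid` is hypothetical and is not taken from any Software Heritage code base.

```python
import re

# Core SWHIDv1 syntax as currently specified: lowercase only.
_CORE_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def normalize_swhid(swhid: str) -> str:
    """Lowercase the core part of a SWHID before resolution.

    Qualifiers (everything after the first ';') are contextual and are
    returned unchanged, since they cannot be treated as case insensitive.
    """
    core, sep, qualifiers = swhid.partition(";")
    core = core.lower()
    # Validate against the unchanged, lowercase-only v1 syntax.
    if not _CORE_RE.fullmatch(core):
        raise ValueError(f"not a valid core SWHIDv1: {core!r}")
    return core + sep + qualifiers
```

For example, `normalize_swhid("SWH:1:CNT:66C1D53B2860A40AA9D350048F6B02C73C3B46C8")` gives back the lowercase identifier, while a qualifier such as `;path=/README.md` keeps its original case. The syntax check itself still only accepts lowercase, so the v1 specification is left as it is.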