https://www.github.com/kalpitk/test and https://www.github.com/kalpitk/test/ are archived separately, despite being the same repository.
-> https://archive.softwareheritage.org/browse/origin/https://github.com/kalpitk/test/directory/
-> https://archive.softwareheritage.org/browse/origin/https://github.com/kalpitk/test//directory/
Description
Related Objects
- Mentioned Here
- T1110: document GitHub caseness caveats
Event Timeline
This is intended, because there is no guarantee that the Git repository accessible via a URL with a trailing slash will be the same as the one accessible at the same URL without the trailing slash. The same argument goes for all the other examples you mention.
It is indeed the case that GitHub treats all those URLs as equivalent, but we are not a GitHub archival project; our scope is much more generic than GitHub.
What should be done is hence to document this GitHub-specific behavior, which is already tracked in T1110.
Another thing we could do is add GitHub-specific normalization, but that would be a very slippery slope: provider-specific normalization heuristics will be hard to maintain.
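
To make the maintenance concern concrete, here is a minimal sketch of what a GitHub-specific normalization heuristic might look like. It is not code from the archive: the function name `normalize_github_origin` and the exact set of rules (dropping `www.`, the trailing slash and a `.git` suffix, lowercasing the path, forcing `https`) are assumptions chosen for illustration. Every other hosting provider would need its own set of rules like these.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_github_origin(url: str) -> str:
    """Hypothetical helper: collapse GitHub URL variants into one form.

    The rules below are assumptions about GitHub's behavior, not a
    specification; URLs from any other host are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "github.com":
        # Not GitHub: no equivalence between variants is guaranteed.
        return url

    path = parts.path.rstrip("/")        # ignore trailing slashes
    if path.endswith(".git"):
        path = path[: -len(".git")]      # ignore the optional .git suffix
    path = path.lower()                  # assumed case-insensitive on GitHub

    # Assumption: always use https and drop any query string or fragment.
    return urlunsplit(("https", host, path, "", ""))


if __name__ == "__main__":
    for variant in (
        "https://www.github.com/kalpitk/test",
        "https://www.github.com/kalpitk/test/",
        "https://github.com/KalpitK/Test.git",
    ):
        print(variant, "->", normalize_github_origin(variant))
```

Even this small sketch hard-codes four provider-specific assumptions, each of which could silently become wrong if the provider changes its behavior.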