The (in)famous SHAttered pair, two different files with the same length and the same SHA1, is being included as test data in cryptography-related projects. We came across an example when loading the https://gitlab.com/sequoia-pgp/sequoia repository, which contains such files, failed.
    $ git clone https://gitlab.com/sequoia-pgp/sequoia
    [...]
    $ cd sequoia/openpgp/tests/data/messages
    $ sha1sum shattered-*.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
It turns out that this poses no problem for git, nor for our SWHIDv1: git hashes a blob's content prefixed with a "blob <length>\0" header, so files that collide on plain SHA1 do not collide on SHA1-git. Indeed, both files are properly stored in the sequoia project.
    $ git hash-object shattered-*.pdf
    ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0
    b621eeccd5c7edac9b7dcba35a8d5afd075e24f2
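To make the difference between the two identifiers concrete, here is a minimal Python sketch that computes both for a byte string. The function names `sha1_hex` and `sha1_git_hex` are illustrative only and do not come from any existing codebase; the blob header format is the one git itself uses for `git hash-object`.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Plain SHA1 of the raw content, as printed by `sha1sum`."""
    return hashlib.sha1(data).hexdigest()


def sha1_git_hex(data: bytes) -> str:
    """SHA1 as git computes it for a blob, as printed by `git hash-object`.

    Git prepends the header b"blob <length>\\0" to the content before
    hashing, so the hashed byte stream differs from the raw file.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

Feeding the bytes of shattered-1.pdf and shattered-2.pdf to these two functions should reproduce the outputs of the shell sessions above: identical values from `sha1_hex`, distinct values from `sha1_git_hex`.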
Our current pipeline, however, detects the plain SHA1 conflict and refuses to ingest these files.
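For readers unfamiliar with what "detecting a SHA1 conflict" involves, the following is a purely hypothetical sketch, not the actual pipeline code. It assumes an in-memory index keyed by plain SHA1 and uses SHA256 as a second, independent hash to tell a re-submitted file apart from a genuine collision. The names `ContentIndex` and `Sha1CollisionError` are invented for illustration.

```python
import hashlib


class Sha1CollisionError(Exception):
    """Raised when two different contents share the same plain SHA1."""


class ContentIndex:
    """Hypothetical content index keyed by plain SHA1."""

    def __init__(self):
        # Maps sha1 hex digest -> sha256 hex digest of the stored content.
        self._by_sha1 = {}

    def add(self, data: bytes) -> bool:
        """Register content; return True if new, False if already known.

        Raises Sha1CollisionError when the SHA1 is already present but
        the SHA256 differs, i.e. the content is different.
        """
        sha1 = hashlib.sha1(data).hexdigest()
        sha256 = hashlib.sha256(data).hexdigest()
        known = self._by_sha1.get(sha1)
        if known is None:
            self._by_sha1[sha1] = sha256
            return True
        if known != sha256:
            raise Sha1CollisionError(f"SHA1 collision on {sha1}")
        return False
```

Under this assumed design, adding shattered-1.pdf and then shattered-2.pdf would raise on the second call, which is the kind of rejection described above.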
We need to design a way to archive such repositories, instead of skipping them as we do today.