stuff related to content (of all kinds, not only "blobs") that is already stored in the Software Heritage archive
Fri, Nov 26
Copy of an email I sent on 2021-11-17:
Mon, Nov 22
Wed, Nov 10
At least the deposit and npm loaders are fine.
Tue, Nov 9
Mon, Nov 8
Here is an overview of the fields (+ internal version name + branch name) used by each package loader:
Oct 22 2021
Great news: of the 469k corrupt SVN revisions, all but 14 (yes, 14) can be fixed simply by adding 1 microsecond to their timestamp.
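A minimal sketch of how such a fix could be checked, assuming the recorded identifier can be recomputed from the revision data. The helper `compute_id` is hypothetical and stands in for whatever hashing routine is actually used; the source does not say why the one-microsecond shift works.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def fix_by_one_microsecond(
    date: datetime,
    expected_id: str,
    compute_id: Callable[[datetime], str],  # hypothetical: recomputes the id for a given date
) -> Optional[datetime]:
    """Return the shifted timestamp if adding 1 microsecond reproduces expected_id."""
    candidate = date + timedelta(microseconds=1)
    if compute_id(candidate) == expected_id:
        return candidate
    # The shift does not help: this revision needs another kind of fix.
    return None
```

A revision for which this returns `None` would be one of the few that cannot be repaired this way.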
Oct 20 2021
After further investigation, I can't find any directory that is in a completely bad order; they are either ordered like git does (by adding a / at the end of dir entries) or by assuming a null byte at the end of dir entries.
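The two orderings can be sketched as sort keys. This is an illustration only, assuming entries are compared by raw name bytes; `Entry` and `classify_order` are hypothetical names, not part of any Software Heritage module.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: bytes
    is_dir: bool


def git_key(entry: Entry) -> bytes:
    # git compares directory names as if they ended with "/"
    return entry.name + (b"/" if entry.is_dir else b"")


def nul_key(entry: Entry) -> bytes:
    # alternative order: directory names compared as if they ended with a null byte
    return entry.name + (b"\x00" if entry.is_dir else b"")


def classify_order(entries: List[Entry]) -> List[str]:
    """Return the names of the orderings that the given entry list follows."""
    matches = []
    if list(entries) == sorted(entries, key=git_key):
        matches.append("git")
    if list(entries) == sorted(entries, key=nul_key):
        matches.append("nul")
    return matches
```

The two orders only differ when a directory name is a prefix of another entry's name, so many directories will match both.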
Oct 15 2021
Analysis of directories (some are also part of the fixable_trivial above, but I lost the exact number during my analysis):
According to the list of nodes provided by seirl, there were ~21,000,000 revisions without ancestors in the swh-graph snapshot (2020-12-15).
Checking against the current live swh DAG 2 days ago, 98% of them have one in release or snapshot_branch.
Indeed, I was surprised because I didn't have to loop over the revision history.
21:57 guest@softwareheritage => select count(distinct id) from revision_history where not exists (select 1 from revision where id=parent_id);
 count
───────
  2218
(1 row)
Oct 14 2021
Oct 13 2021
My script finished running on releases. Result: all 644k releases are recoverable (mostly just missing gpg signatures), except 75k whose origin does not exist anymore.