Now that we have defined an intrinsic PID schema for origins and support for it in both swh identify and swh-graph (as graph roots), we need a way to look up origin URLs from origin PIDs (a reverse lookup).
As I understand it, that means:
- add a column to the origin table for the origin checksum (either as a PID or, more consistently with the rest of the SQL schema, as a SHA1 checksum)
- patch the storage functions that create new origins so that they also fill in the SHA1 column
- add a storage function to perform the SHA1→URL lookup
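The three changes above can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for the real PostgreSQL storage. This is a minimal illustration under stated assumptions, not the swh-storage implementation: it assumes an origin PID has the form `swh:1:ori:<hex SHA1>` with the SHA1 computed over the UTF-8 encoded URL, and the table layout plus the helpers `origin_add`, `origin_get_by_sha1` and `pid_to_sha1` are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional

# Assumption: origin PIDs look like "swh:1:ori:<hex SHA1 of the URL>".
ORI_PREFIX = "swh:1:ori:"


def origin_sha1(url: str) -> bytes:
    """Return the raw 20-byte SHA1 of an origin URL (assumed UTF-8 encoded)."""
    return hashlib.sha1(url.encode("utf-8")).digest()


def pid_to_sha1(pid: str) -> bytes:
    """Extract the raw SHA1 from an origin PID, rejecting other object types."""
    if not pid.startswith(ORI_PREFIX):
        raise ValueError(f"not an origin PID: {pid!r}")
    return bytes.fromhex(pid[len(ORI_PREFIX):])


def create_schema(db: sqlite3.Connection) -> None:
    # Hypothetical, simplified origin table: sha1 stores the raw digest,
    # like the other SHA1 columns, rather than the PID string.
    db.execute(
        "CREATE TABLE origin ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL UNIQUE,"
        " sha1 BLOB)"
    )


def origin_add(db: sqlite3.Connection, url: str) -> None:
    """Hypothetical origin-creation function that also fills in sha1."""
    db.execute(
        "INSERT OR IGNORE INTO origin (url, sha1) VALUES (?, ?)",
        (url, origin_sha1(url)),
    )


def origin_get_by_sha1(db: sqlite3.Connection, sha1: bytes) -> Optional[str]:
    """Hypothetical SHA1 -> URL reverse lookup; None if the origin is unknown."""
    row = db.execute("SELECT url FROM origin WHERE sha1 = ?", (sha1,)).fetchone()
    return row[0] if row else None
```

Resolving a PID is then `origin_get_by_sha1(db, pid_to_sha1(pid))`: the `swh:1:ori:` prefix is stripped at lookup time, so the column itself stays a plain SHA1.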
For the transition, we will need to:
- initially make the SHA1 column NULL-able
- deploy to production a storage version that fills in the SHA1 for new origins
- perform a one-off conversion of all existing origins whose SHA1 is still NULL
- mark the SHA1 column as NOT NULL (and add a B-tree index on it)
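The migration steps above could look roughly like this, again with sqlite3 standing in for PostgreSQL. It is a sketch only: the batch size, the simplified `origin` table and the function names are hypothetical, and step 2 (the deployment) is not a query, so it does not appear. SQLite cannot add a NOT NULL constraint to an existing column, so `finalize` only checks the precondition that PostgreSQL's `ALTER TABLE origin ALTER COLUMN sha1 SET NOT NULL` would enforce before creating the index.

```python
import hashlib
import sqlite3

BATCH_SIZE = 1000  # hypothetical batch size for the one-off backfill


def add_nullable_column(db: sqlite3.Connection) -> None:
    # Step 1: the new column starts out NULL-able, so existing rows stay valid.
    db.execute("ALTER TABLE origin ADD COLUMN sha1 BLOB")


def backfill(db: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> int:
    """Step 3: compute sha1 for origins that still lack it, batch by batch.

    Returns the number of rows updated. Committing per batch keeps each
    transaction short on a large table.
    """
    total = 0
    while True:
        rows = db.execute(
            "SELECT id, url FROM origin WHERE sha1 IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        db.executemany(
            "UPDATE origin SET sha1 = ? WHERE id = ?",
            [(hashlib.sha1(url.encode("utf-8")).digest(), oid) for oid, url in rows],
        )
        db.commit()
        total += len(rows)


def finalize(db: sqlite3.Connection) -> None:
    """Step 4: refuse to proceed while NULLs remain, then index the column."""
    (missing,) = db.execute(
        "SELECT COUNT(*) FROM origin WHERE sha1 IS NULL"
    ).fetchone()
    if missing:
        raise RuntimeError(f"{missing} origins still have a NULL sha1")
    # In PostgreSQL, ALTER TABLE origin ALTER COLUMN sha1 SET NOT NULL would
    # go here; CREATE INDEX builds a B-tree index by default in both databases.
    db.execute("CREATE INDEX origin_sha1_idx ON origin (sha1)")
    db.commit()
```

Running the backfill only after the new storage version is deployed means every origin created from then on already has its SHA1, so a single pass over the NULL rows suffices.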