drop blake2 hashes
Open, Normal, Public


Is there a point in keeping blake2 hashes in the content table?

With Git moving to SHA-256, I don't see it moving to blake2 anytime soon, and in any case blake2 has only ever been stored in the content table.
Dropping blake2 would free up some DB space (once Postgres is able to reclaim it...) and reduce both CPU and I/O time during ingestion.
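
As a rough illustration of the CPU argument, the sketch below hashes the same bytes with and without blake2. It assumes the content table stores sha1, sha1_git, sha256 and blake2s256 digests; the names compute_hashes, WITH_BLAKE2 and WITHOUT_BLAKE2 are hypothetical, and this is not the actual ingestion code.

```python
import hashlib

# Assumed hash sets: the current one, and the one left after dropping blake2.
WITH_BLAKE2 = ("sha1", "sha1_git", "sha256", "blake2s256")
WITHOUT_BLAKE2 = ("sha1", "sha1_git", "sha256")


def compute_hashes(data: bytes, algorithms=WITH_BLAKE2) -> dict:
    """Return a dict mapping each algorithm name to the hex digest of data."""
    hashers = {}
    for name in algorithms:
        if name == "sha1_git":
            # Git blob hash: SHA-1 over a "blob <length>\0" header plus the data.
            hasher = hashlib.sha1()
            hasher.update(b"blob %d\0" % len(data))
        elif name == "blake2s256":
            hasher = hashlib.blake2s(digest_size=32)
        else:
            hasher = hashlib.new(name)
        hashers[name] = hasher

    # Every hasher consumes the full content, so each extra algorithm
    # adds one more pass of hashing work per ingested object.
    for hasher in hashers.values():
        hasher.update(data)

    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Timing compute_hashes over a representative sample with each tuple (for example with timeit) would give a first estimate of the CPU saved per content.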

Aside from the work needed to drop them, do we have any reason to keep blake2 hashes around?

Event Timeline

zack triaged this task as Normal priority. Jul 3 2020, 4:15 PM
zack created this task.

Not sure about the DB space as an argument, but the CPU savings alone are worth the move IMHO.

Nothing against it either.
If that can make us ingest faster, it'd be neat.

The technical impact is a tad bigger than we might think, though. So far:

  • Content model update
  • storages* update to drop the column
  • (I think we'll need to stop workers so the column drop can happen fast)
  • replayer fixer (most probably)
  • and then content topics to backfill
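
To make the replayer fixer item concrete, here is a minimal sketch of what such a fixer could look like, assuming content messages are plain dicts keyed by field name and that the obsolete field is called blake2s256. The function name fix_content and the constant OBSOLETE_HASH are hypothetical; the real fixer may look quite different.

```python
from typing import Any, Dict

# Assumed name of the hash field being dropped from the content model.
OBSOLETE_HASH = "blake2s256"


def fix_content(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a content message without the obsolete hash field.

    Messages that never carried the field come back with the same keys and
    values, so the function can be applied to old and new messages alike.
    """
    return {key: value for key, value in message.items() if key != OBSOLETE_HASH}
```

Returning a copy rather than mutating the input keeps the original message available for logging or comparison while topics are being backfilled.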