Well now:
- The types are here.
- All *_add endpoints in the storages now take the new model objects as input.
- All storage tests use the data model objects as input.
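To make the shape of the change concrete, here is a minimal sketch of what "*_add endpoints take model objects instead of dicts" looks like. The class and method names are illustrative stand-ins, not the actual swh.model/swh.storage API:

```python
from dataclasses import dataclass


# Hypothetical stand-in for the new immutable model objects
# (the real classes live in swh.model.model; fields here are illustrative).
@dataclass(frozen=True)
class Content:
    sha1: bytes
    length: int


class InMemoryStorage:
    """Toy storage whose content_add takes model objects, not dicts."""

    def __init__(self):
        self._contents = {}

    def content_add(self, contents):
        # Each item is a Content instance; attribute access replaces
        # the old dict-key access, and validation happened at construction.
        for c in contents:
            self._contents[c.sha1] = c
        return {"content:add": len(contents)}


storage = InMemoryStorage()
summary = storage.content_add([Content(sha1=b"\x00" * 20, length=42)])
```

The point is that validation moves to object construction time, so the storage backends no longer each re-check loosely structured dicts.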
Well now:
Reading the code that deals with snapshot branches in several storage implementations, it really seems to me that storing them as a dict-like structure has no advantage.
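For illustration, a sorted tuple of (name, target) pairs carries exactly the same information as the dict-like structure while being immutable and hashable; this is just a sketch of the idea, not the actual swh representation:

```python
# Dict-like branch mapping as currently stored (sketch).
branches_dict = {
    b"refs/heads/main": b"\x01" * 20,
    b"refs/tags/v1.0": b"\x02" * 20,
}

# Equivalent immutable representation: sorted (name, target) pairs.
branches_tuple = tuple(sorted(branches_dict.items()))

# The conversion is lossless: we can round-trip back to the dict.
assert dict(branches_tuple) == branches_dict
```

An immutable representation also gives a canonical ordering for free, which matters when hashing or comparing snapshots.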
But I'd like to use this cleanup as an opportunity to go a bit further than "the minimal amount of work for pedantic correctness", and actually make changes that have a conceptual meaning.
Nothing against it either.
If that can make us ingest faster, it'd be neat.
The main part is done: actually making the origin-visit immutable.
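"Immutable origin-visit" means deriving a new object instead of mutating one in place. A minimal sketch of the pattern, assuming illustrative field names (the real class lives in swh.model):

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


# Illustrative immutable origin-visit; field names are assumptions.
@dataclass(frozen=True)
class OriginVisit:
    origin: str
    date: datetime
    visit: Optional[int] = None


visit = OriginVisit(
    origin="https://example.org/repo.git",
    date=datetime(2020, 6, 1, tzinfo=timezone.utc),
)

# Instead of assigning visit.visit = 1 (which a frozen dataclass forbids),
# derive a new object carrying the id allocated by the storage:
registered = replace(visit, visit=1)
```

With this pattern, any origin-visit object in flight is a consistent snapshot that no other code path can silently modify.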
It's now fully deployed.
Not sure about the DB space argument, but the CPU savings alone are worth the move IMHO.
In T2430#46040, @zack wrote:
@civodul I wanted to raise the topic of storing container metadata (in the style of what tools like pristine-tar do) here too, so thanks for giving me the chance :-)
I agree it might be a technical solution, *but* I'm not sure I see the point.
Didn't you agree that having a "lookup service" from tarball/container checksums to SWHIDs (the Software Heritage identifiers, that can then be used to lookup stuff in the archive) would be enough to satisfy distro needs?
If yes, then "archiving container metadata" could be replaced by simply having a way to add entries to the lookup table. And allowing distros to do so is an option that we can explore. (Once the service exists, of course.)
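To sketch what such a lookup table amounts to: a mapping from a tarball's SHA256 to a SWHID. The example below is a toy, but the SWHID computation for content objects is real (it reuses git's blob hashing); the mapping itself is just a dict standing in for the hypothetical service:

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the swh:1:cnt SWHID of raw content.

    Content SWHIDs reuse git's blob hashing scheme:
    sha1 over b"blob <length>\\0" followed by the data.
    """
    git_header = b"blob %d\x00" % len(data)
    sha1_git = hashlib.sha1(git_header + data).hexdigest()
    return "swh:1:cnt:" + sha1_git


# Toy stand-in for the lookup service: tarball SHA256 -> SWHID.
# (A real entry would likely map to a directory or release SWHID.)
lookup = {}

tarball = b"pretend this is a tarball"
lookup[hashlib.sha256(tarball).hexdigest()] = swhid_for_content(b"unpacked file")
```

A distro that only knows a tarball's SHA256 could then resolve it to a SWHID and fetch the corresponding objects from the archive.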
Do I get it right that the primary reason why tarballs aren't systematically archived is that doing so would be too expensive storage-wise (no deduplication)?
closed by D3152
Thanks for your feedback, @rdicosmo!
The previously proposed "short-term" solution does not work. So the only "short-term" solution is to make DiskBackedContent inherit from BaseModel (or BaseContent).
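A minimal sketch of that hierarchy, with names mirroring the ones discussed (the real classes live in swh.model; the field set and the lazy-read method are assumptions for illustration):

```python
from dataclasses import dataclass


# Hypothetical shared base: the metadata every content variant carries.
@dataclass(frozen=True)
class BaseContent:
    sha1: bytes
    length: int


# DiskBackedContent inherits the model interface from the base,
# but keeps its payload on disk instead of in memory.
@dataclass(frozen=True)
class DiskBackedContent(BaseContent):
    path: str = ""

    def data(self) -> bytes:
        # Read the bytes lazily, only when actually needed.
        with open(self.path, "rb") as f:
            return f.read()


c = DiskBackedContent(sha1=b"\x00" * 20, length=3, path="/tmp/example")
```

With the shared base, code that only needs hashes and lengths can treat both variants uniformly, which is what unblocks the endpoints above.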
This task is currently blocked by an implementation "detail":
In T2430#45767, @zimoun wrote:
- if you still have that tarball at hand, then it can be ingested in SWH, and we keep the correspondence between SWHID and SHA256; in principle, you need to trust us, but one can foresee having external parties checking that the correspondence is real while the tarball is still there, and adding their observation to the chain of trust means you need to trust us less and less
By "we keep the correspondence between SWHID and SHA256", do you mean "you" on the SWH side?
In T2430#45764, @civodul wrote:
@rdicosmo The discussion of the "source of trust" is an important one, and it's interesting to see how we can address it going forward.
The proposal of a correspondence table, as I wrote on swh-devel, leaves open the question of today's and yesterday's software, assuming SWHIDs become the de facto standard tomorrow. How can I check the integrity of code fetched from SWH if all I have is its tarball's SHA256 from its release announcement? How can I check its authenticity if all I have is an OpenPGP signature computed over a tarball?