Paste P716

IRC excerpt from the discussion on the policy wrt. tarballs

Authored by ardumont on Jul 6 2020, 6:39 PM.
2020-07-03 17:23:20 civodul zack: hey! could you clarify the policy wrt. tarballs?
2020-07-03 17:23:43 civodul i understand it's undesirable, but still it happens, so i'm trying to see if we can have our cake and eat it too :-)
2020-07-03 17:27:30 +zack civodul: I understand it's annoying for your needs, but for now I confirm what both I and rdicosmo have said in the ticket(s), i.e., no tarball archival for now
2020-07-03 17:28:01 +zack the lookup service is more likely to happen, but I (now) understand your point about verifiability of them
2020-07-03 17:28:49 +zack I confess I don't entirely buy it, especially because in the medium term the hashes you have today are going to be broken, and with them your verifiability will also become moot
2020-07-03 17:29:07 +zack but I can see how in the short term it would have been desirable
2020-07-03 17:29:15 +ardumont olasd: yeah, i hear you, i got too eager to be done with that thing, i think
2020-07-03 17:29:38 +ardumont trying to fix now (i have some error about cursor already closed right now ¯\_(ツ)_/¯)
2020-07-03 17:29:57 +olasd ardumont: to be fair, most of my remarks are about code that's around the area you touched, not code you touched specifically
2020-07-03 17:30:07 +zack civodul: I also still think that the lookup service will be better than nothing, but I'd totally understand if you instead consider it's not enough for your needs
2020-07-03 17:30:25 rdicosmo civodul: we see your point, but it's a matter of resources and priority... right now our hands are quite full
2020-07-03 17:30:52 +olasd moranegg[m]: swh:1:rel:ba01e42e250d30c80f3588bdb10fd25bb7769ca8;origin=https://forge.softwareheritage.org/source/swh-model.git;visit=swh:1:snp:d28fb8b2315a2582b7e96efefb4a6e5af381008f is the latest swh.model release
2020-07-03 17:31:01 +olasd (and it's indeed a release)
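The identifier olasd quotes is a core SWHID followed by `;`-separated qualifiers (`origin`, `visit`). As a rough illustration only, the structure can be split apart as below. This is a hand-rolled sketch, not the swh.model API; the names `parse_swhid`, `ParsedSWHID`, and `EXAMPLE` are made up for this example, and it assumes qualifier values contain no unescaped `;`.

```python
import re
from typing import Dict, NamedTuple

# Core identifier: scheme, schema version, object type, 40 hex digits.
CORE_RE = re.compile(r"^swh:(\d+):(snp|rel|rev|dir|cnt):([0-9a-f]{40})$")

# The identifier quoted in the log above.
EXAMPLE = (
    "swh:1:rel:ba01e42e250d30c80f3588bdb10fd25bb7769ca8"
    ";origin=https://forge.softwareheritage.org/source/swh-model.git"
    ";visit=swh:1:snp:d28fb8b2315a2582b7e96efefb4a6e5af381008f"
)


class ParsedSWHID(NamedTuple):
    """Hypothetical container for the pieces of a SWHID."""
    version: int
    object_type: str
    object_id: str
    qualifiers: Dict[str, str]


def parse_swhid(text: str) -> ParsedSWHID:
    """Split a SWHID into its core part and its qualifiers."""
    core, *raw_qualifiers = text.split(";")
    match = CORE_RE.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {item!r}")
        qualifiers[key] = value
    return ParsedSWHID(int(match.group(1)), match.group(2), match.group(3), qualifiers)


if __name__ == "__main__":
    print(parse_swhid(EXAMPLE))
```

Note that the `origin` qualifier carries the full URL in clear text, which is what makes the qualified identifier long.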
2020-07-03 17:34:29 +douardda seeing this, one advantage I see of having an swh:1:ori SWHID would be to control the length of other SWHID's qualifiers
2020-07-03 17:35:28 +olasd but that makes us an oracle, which is... not great
2020-07-03 17:36:06 +douardda why? it's just a matter of computing a sha(1|256) of the origin's url
2020-07-03 17:36:33 +douardda unless I missed something?
2020-07-03 17:37:57 rdicosmo douardda: the origin carries semantics that we want to make independent of a resolver ...
2020-07-03 17:38:43 rdicosmo ^^^ if we use a hash to encode an origin, then nobody can find out what the origin was without us
2020-07-03 17:39:16 +douardda oh yes I get it
2020-07-03 17:39:39 civodul rdicosmo, zack: alright, thanks for the clear reply
2020-07-03 17:39:49 civodul and yes, i perfectly understand that you have enough on your plate
2020-07-03 17:39:51 rdicosmo ^^^ if one just wants to have a shorter encoding, then it must be a bijection, not a one way function (e.g. base64 etc.) but we discussed this some time ago and it seems not worth our while
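The distinction rdicosmo draws between a one-way function and a bijection can be sketched as follows. The function names and the `ORIGIN` constant are hypothetical, chosen for this illustration; nothing here is an actual SWHID encoding. A SHA-1 digest of the origin URL has a fixed length but cannot be turned back into the URL without a lookup table (the "oracle"), whereas a reversible encoding such as base64 lets anyone recover the URL locally.

```python
import base64
import hashlib

# Origin URL taken from the log above.
ORIGIN = "https://forge.softwareheritage.org/source/swh-model.git"


def origin_digest(url: str) -> str:
    """One-way: fixed-length digest, the URL cannot be recovered from it."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def encode_origin(url: str) -> str:
    """Reversible: anyone can decode this without asking a resolver."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def decode_origin(token: str) -> str:
    """Inverse of encode_origin."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    print(origin_digest(ORIGIN))
    print(decode_origin(encode_origin(ORIGIN)) == ORIGIN)
```

Base64 is used here only to show reversibility: its output is about a third longer than its input, so it does not shorten anything by itself.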
2020-07-03 17:40:27 civodul in the short term the outcome may be that we'll have Tarball Heritage in parallel ;-)
2020-07-03 17:41:02 civodul NixOS, Guix, etc. will have to maintain their caches and be stricter about preservation
2020-07-03 17:41:55 rdicosmo civodul: keep that up in the short term, and we'll tackle it when we have more slack :-)
2020-07-03 17:42:28 civodul heh :-)
2020-07-03 17:42:59 civodul i understood it as "no" (regarding storing tarballs) rather than "yes, but later"
2020-07-03 17:43:01 civodul correct?
2020-07-03 17:43:08 civodul (just to make sure there's no misunderstanding)
2020-07-03 17:44:08 rdicosmo civodul: as zack said, it's a "no for now", and the correct negation is "yes, but later" :-)
2020-07-03 17:45:11 civodul ok :-)
2020-07-03 17:46:44 rdicosmo civodul: if some wealthy benefactor pops up unexpectedly with significant resources to donate, later could be sooner, but this kind of event usually only happens in the movies
2020-07-03 17:47:03 civodul sure
2020-07-03 17:47:11 civodul TBH i'm also a bit worried about perception
2020-07-03 17:47:19 civodul dunno
2020-07-03 17:48:29 rdicosmo civodul: Software Heritage is a long term undertaking, it will take time to roll out everything we need, but we'll get there
2020-07-03 17:49:47 * civodul nods
2020-07-03 17:51:17 civodul it's an issue we discussed 4 years ago though, which is why i bother you more today than back then
2020-07-03 17:51:35 civodul but anyway, thanks for taking the time again
2020-07-03 17:51:45 civodul we'll do our best on our side with the resources that we have
2020-07-03 17:59:23 +ardumont olasd: fixed the shortcomings in D3420 ;)
2020-07-03 17:59:23 -- Notice(swhbot): D3420 (author: ardumont, Needs Review) on swh-storage: pg-storage: Add missing cur parameter passing <https://forge.softwareheritage.org/D3420>
2020-07-03 18:30:44 +zack civodul: fwiw the overhead is not only about storage, but also (mainly? not sure yet) about having another ingestion process to engineer and maintain, as storing tarballs doesn't fit our current ingestion mechanism well
2020-07-03 18:31:39 civodul zack: right, i see
2020-07-03 18:31:54 civodul though an option might be to not do anything special about it
2020-07-03 18:32:22 +zack re: perception, not sure. I think your POV is very specific. As *we will* archive any source code bundle that people want us to archive. But that is not the same as archiving its *container* as is
2020-07-03 18:33:18 +zack civodul: if you just throw the container as is, say, in our blob storage, you lose all the advantages of our data model (fine-grained deduplication, etc.) and you will make the resource problem worse
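zack's point about losing fine-grained deduplication can be illustrated with a toy experiment: two release tarballs that share most of their files have unrelated container hashes, while hashing each file separately exposes the shared content. This is only a sketch under simplifying assumptions. It uses plain SHA-256 over file contents as a stand-in, which is not how Software Heritage computes its identifiers, and all function names are invented for the example.

```python
import hashlib
import io
import tarfile
from typing import Dict, Set


def make_tarball(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed in-memory tarball from a name -> content map."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def container_hash(tarball: bytes) -> str:
    """Hash of the container as is: changes if any byte of the archive changes."""
    return hashlib.sha256(tarball).hexdigest()


def content_hashes(tarball: bytes) -> Set[str]:
    """Hashes of the individual files stored inside the tarball."""
    hashes = set()
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r") as tar:
        for member in tar:
            if member.isfile():
                extracted = tar.extractfile(member)
                hashes.add(hashlib.sha256(extracted.read()).hexdigest())
    return hashes


if __name__ == "__main__":
    v1 = make_tarball({"hello.c": b"int main(void) { return 0; }\n", "README": b"v1\n"})
    v2 = make_tarball({"hello.c": b"int main(void) { return 0; }\n", "README": b"v2\n"})
    print(container_hash(v1) == container_hash(v2))
    print(len(content_hashes(v1) & content_hashes(v2)))
```

Stored as opaque blobs, the two archives share nothing; stored file by file, the unchanged `hello.c` only needs to be kept once.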
2020-07-03 18:33:41 civodul i'm pretty sure that concern is not limited to Guix + NixOS
2020-07-03 18:33:58 civodul it's just more acute there because these are the only distros that offer long-term reproducibility
2020-07-03 18:34:26 civodul but release announcements, papers, CMakeLists.txt, READMEs, etc. include hashes of tarballs
2020-07-03 18:35:00 +zack they also include version numbers
2020-07-03 18:35:09 civodul that's not enough
2020-07-03 18:35:13 +zack of course
2020-07-03 18:35:50 +zack but we need to worry first about saving the source code itself that's behind those identifiers, and we're doing it :)
2020-07-03 18:36:07 +zack it's not as if lacking the tarball hashes means we're not doing the most important part of the job
2020-07-03 18:36:19 civodul sure, and you're doing great :-)
2020-07-03 18:36:22 civodul well
2020-07-03 18:36:50 civodul code that cannot be authenticated just cannot be used in many cases
2020-07-03 18:37:37 +zack i'm still curious about your feedback on my comment on weak hashes
2020-07-03 18:37:51 +zack the more time passes, the more hashes in old announcements will become useless
2020-07-03 18:38:20 civodul yeah, that's true to some extent, though "useless" is a strong word
2020-07-03 18:38:30 civodul it cannot be used as an argument for not having any integrity check at all
2020-07-03 18:39:02 civodul and again, it's not just announcements + those crazy Guix folks ;-)
2020-07-03 18:39:09 +zack we have integrity checks :-)
2020-07-03 18:39:30 +zack we just don't have (yet) the integrity checks that you happen to rely on
2020-07-03 18:39:43 civodul i mean integrity checks for users: my script downloads hello-1.0 and it wants to make sure it really got what it asked for
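The user-side check civodul describes amounts to comparing the hash of the downloaded bytes with a hash recorded earlier, for instance in a package definition or a release announcement. A minimal sketch, assuming a SHA-256 hex digest was recorded; the function names are hypothetical and no download or Guix/Nix specifics are modelled:

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream incrementally, without loading it whole in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_stream(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True only if the stream's SHA-256 matches the recorded digest."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.lower())


def verify_bytes(data: bytes, expected_hex: str) -> bool:
    """Convenience wrapper for data that is already in memory."""
    return verify_stream(io.BytesIO(data), expected_hex)
```

The check only works if the exact bytes of the tarball can be obtained again, which is why archiving the extracted source tree alone does not satisfy it.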
2020-07-03 18:40:38 +zack anyway, your use case is clear, and we'll get to it --- it's just not available right now, sorry about that
2020-07-03 18:40:59 civodul yup, got it

Event Timeline

ardumont created this paste.Jul 6 2020, 6:39 PM

@civodul @zimoun heads up ^

(you were no longer on irc to hint you guys)