It can be used to prevent buggy or malicious clients of the storage
from adding invalid objects.
On the other hand, it will be computationally expensive, especially for contents.
What do you think?
Differential D3844
Add a validating storage proxy, to check ids before insertion. vlorentz on Aug 25 2020, 6:08 PM. Authored by
Details
It can be used to prevent buggy or malicious clients of the storage On the other hand, it will be computationally expensive, especially for contents.
Diff Detail
Event TimelineComment Actions Build has FAILED Patch application report for D3844 (id=13539)Rebasing onto 4532a4dc64... First, rewinding head to replay your work on top of it... Applying: Add a validating storage proxy, to check ids before insertion. Changes applied before testcommit ffb5a75b456d8ec23df1ba9f519406a133eaca7a Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Aug 25 18:08:27 2020 +0200 Add a validating storage proxy, to check ids before insertion. It can be used to prevent buggy or malicious clients of the storage from adding invalid objects. Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/875/ Comment Actions Build is green Patch application report for D3844 (id=13540)Rebasing onto 4532a4dc64... First, rewinding head to replay your work on top of it... Applying: Add a validating storage proxy, to check ids before insertion. Changes applied before testcommit a9ac2164bfba38b198b3bbcbe3426d8a9a0f04b8 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Aug 25 18:08:27 2020 +0200 Add a validating storage proxy, to check ids before insertion. It can be used to prevent buggy or malicious clients of the storage from adding invalid objects. See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/876/ for more details. Comment Actions
It's computationally expensive but not IO expensive (no queries to the backend), so this cost is rather trivial to assume: buy more hardware and add more swh.storage instances. It would be interesting to give it a try on a test session (docker or staging) and run every loader on a bunch of origins to see if ti catches some bugs, at least. Comment Actions @douardda but even if it's not a bottleneck, we'll have to check it doesn't slow down loaders too much because of the extra latency. Comment Actions I did not look into details but I think that was initially what we wanted to do In that regard, if it can prevent issues, i think it's fine. Maybe that could catch potential serialization issues or something though? [1] recall the other "validate" one (removed now) we transformed into a "convert me those dicts to data model objects" layer ¯\_(ツ)_/¯) Comment Actions looks good to me if we want to test it on staging or docker first, then needs to be in master in the first place heh ;)
Comment Actions Build is green Patch application report for D3844 (id=13606)Rebasing onto 5afd985ebd... First, rewinding head to replay your work on top of it... Fast-forwarded diff-target to base-revision-882-D3844. Changes applied before testSee https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/882/ for more details. |