Page MenuHomeSoftware Heritage

Fix FilteringProxy to not drop skipped-contents with a missing sha1_git.
ClosedPublic

Authored by vlorentz on Feb 12 2020, 4:18 PM.

Details

Summary

Passes them all to the backend instead of silently dropping them all
if any of them is not missing.

Depends on D2662.

Diff Detail

Repository
rDSTO Storage manager
Branch
FilteringProxy-none-sha1_git
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10577
Build 15784: tox-on-jenkinsJenkins
Build 15783: arc lint + arc unit

Event Timeline

olasd added a subscriber: olasd.

I guess this is the minimal step to unbreak the logic, but I'm not fond of what this is doing at all: the "key" for a skipped content really is the quadruple of hashes, some potentially null; not any single one of them.

At some point this logic should be adapted to use a proper key tuple instead of a single hash.

This revision is now accepted and ready to land.Feb 14 2020, 4:44 PM
In D2663#63709, @olasd wrote:

I guess this is the minimal step to unbreak the logic

Yes indeed. The point of this proxy is not to be optimal in its detection, but to save some bandwidth in practice, so I think it's good enough this way

In D2663#63709, @olasd wrote:

I guess this is the minimal step to unbreak the logic

Yes indeed. The point of this proxy is not to be optimal in its detection, but to save some bandwidth in practice, so I think it's good enough this way

Well. If we consider the case of two different contents with an identical sha1_git, different other_hashes, and a size larger than the limit, in practice the current implementation of the filter would make one mask the other.

This is the edge of an edge case that shouldn't show up in practice, of course. But we should note it down and fix that eventually.