Software Heritage

Storage.content_find should return all matches, not just one.
Closed, Migrated (Edits Locked)

Description

(from https://forge.softwareheritage.org/D645?id=1994#inline-3203)

The content_find method of the storage API takes as its argument a dictionary whose keys are names of hash algorithms and whose values are the corresponding digests.
It then searches all content blobs in the archive for those whose hashes match *all* the provided digests.
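The matching rule above can be pictured with a minimal linear-scan sketch. It assumes each blob is represented only by a dict of its digests. `hashes_of`, `matches`, and `content_find_all` are hypothetical helpers, not the actual swh.storage code, and the algorithm names used as keys are illustrative.

```python
import hashlib
from typing import Dict, List

# Illustrative model (assumption): a content blob is represented only by its
# hashes, as a dict mapping an algorithm name to the raw digest bytes.
Content = Dict[str, bytes]


def hashes_of(data: bytes) -> Content:
    """Compute the digests of a blob for two example algorithms."""
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha256": hashlib.sha256(data).digest(),
    }


def matches(content: Content, query: Dict[str, bytes]) -> bool:
    """True if the blob agrees with *every* digest given in the query."""
    return all(content.get(algo) == digest for algo, digest in query.items())


def content_find_all(archive: List[Content], query: Dict[str, bytes]) -> List[Content]:
    """Linear-scan sketch: return every blob matching all provided hashes."""
    if not query:
        raise ValueError("query must contain at least one hash")
    return [c for c in archive if matches(c, query)]


archive = [hashes_of(b"hello\n"), hashes_of(b"world\n")]
query = {"sha1": hashlib.sha1(b"hello\n").digest()}
print(len(content_find_all(archive, query)))  # 1
```

A query that names more algorithms can only narrow the result, because every provided digest must agree.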

Most of the time there are zero or one matching blobs, and it returns that blob (if any).
But there may be more than one in case of a hash collision, and we need to handle that (this is partly why we support more than one hash algorithm).
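The collision case can be shown with a small in-memory store that keeps one index per hash algorithm and intersects the candidate sets. This is a hypothetical sketch: `InMemoryContents` is not the real swh/storage/in_memory.py class. The digests are synthetic byte strings, not real hash outputs, chosen to simulate two distinct blobs that share a "sha1".

```python
from collections import defaultdict
from typing import Dict, List, Set

Content = Dict[str, bytes]


class InMemoryContents:
    """Hypothetical in-memory store with one index per hash algorithm."""

    def __init__(self) -> None:
        self._contents: List[Content] = []
        # algo -> digest -> set of positions in self._contents
        self._indexes: Dict[str, Dict[bytes, Set[int]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, content: Content) -> None:
        pos = len(self._contents)
        self._contents.append(content)
        for algo, digest in content.items():
            self._indexes[algo][digest].add(pos)

    def content_find(self, query: Dict[str, bytes]) -> List[Content]:
        """Return *all* blobs matching every digest in the query."""
        if not query:
            raise ValueError("query must contain at least one hash")
        # .get() does not trigger the defaultdict factory, so lookups of
        # unknown digests do not grow the index.
        candidate_sets = [
            self._indexes[algo].get(digest, set()) for algo, digest in query.items()
        ]
        positions = set.intersection(*candidate_sets)
        return [self._contents[p] for p in sorted(positions)]


store = InMemoryContents()
# Synthetic digests: same "sha1", different "sha256" (a simulated collision).
store.add({"sha1": b"\x11" * 20, "sha256": b"\xaa" * 32})
store.add({"sha1": b"\x11" * 20, "sha256": b"\xbb" * 32})

print(len(store.content_find({"sha1": b"\x11" * 20})))  # 2
print(len(store.content_find({"sha1": b"\x11" * 20, "sha256": b"\xbb" * 32})))  # 1
```

Returning only one of the two blobs for the sha1-only query would silently hide the collision. Supplying a second algorithm's digest disambiguates them.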

Therefore, this function should be modified to return a *list* of contents instead of a single one. It is implemented in both swh/storage/storage.py (the PostgreSQL backend of the storage) and swh/storage/in_memory.py (the in-memory backend, used for tests).
Its tests and other functions using this one should be updated as well.
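Callers that previously treated the result as "a content or None" need small adjustments once the function returns a list. This is a hedged sketch of what that might look like. `content_exists`, `content_find_unique`, and `fake_find` are hypothetical names used only for illustration, not functions from swh.storage.

```python
from typing import Callable, Dict, List, Optional

Content = Dict[str, bytes]
FindFn = Callable[[Dict[str, bytes]], List[Content]]


def content_exists(find: FindFn, query: Dict[str, bytes]) -> bool:
    # Before the change this might have been `find(query) is not None`;
    # with a list-returning API, an empty list means "not found".
    return bool(find(query))


def content_find_unique(find: FindFn, query: Dict[str, bytes]) -> Optional[Content]:
    """For callers that genuinely expect at most one match: make ambiguity
    explicit instead of silently picking an arbitrary blob."""
    results = find(query)
    if len(results) > 1:
        raise LookupError(f"{len(results)} contents match {sorted(query)}; hash collision?")
    return results[0] if results else None


def fake_find(query: Dict[str, bytes]) -> List[Content]:
    """Stub standing in for a storage backend, for demonstration only."""
    blob = {"sha1": b"\x01" * 20}
    return [blob] if query.get("sha1") == blob["sha1"] else []


print(content_exists(fake_find, {"sha1": b"\x01" * 20}))  # True
```

In the tests, an assertion such as `assert actual == expected` would correspondingly become something like `assert actual == [expected]`.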

Event Timeline

vlorentz triaged this task as Normal priority (Nov 15 2018, 11:39 AM).
vlorentz created this task.

Can I get this task assigned by the administrator?

In T1349#29267, @Sowmya wrote:

Can I get this task assigned by the administrator?

There is no need. First, any registered user can modify any task, including this one. Second, the task does not have to be assigned to you: just submit a patch and, when it's ready, we'll review it and close this task once it's done :-)

OK sir, thank you so much. Also, could you please tell me how to contact the mentors for the listed GSoC projects?

Contact information is available on our GSoC wiki page (which is in turn linked from the GSoC portal).

faux claimed this task.