Page MenuHomeSoftware Heritage

Storage.content_find returns list instead of single value
ClosedPublic

Authored by faux on Mar 24 2019, 6:10 PM.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vlorentz requested changes to this revision.Mar 25 2019, 10:32 AM
vlorentz added a subscriber: vlorentz.

In test_content_find_with_present_content, you should not iterate over results, because it would make the test pass if there are no result. Instead, you must check that there is exactly one result, and then check the content of that result.

You must also add another test that checks what happens when a content is duplicated.

swh/storage/in_memory.py
294–295

You can remove that FIXME, because you fixed it :)

321–322

This comment is no longer relevant

swh/storage/storage.py
513–523

Could you rename the variables here? c was a bad choice (before your diff, it's not your fault)

This revision now requires changes to proceed.Mar 25 2019, 10:32 AM
faux added a comment.Mar 27 2019, 4:53 PM

Thanks, will get back to you with required changes.

fwiw, from irc discussion:

20:07 <faux__> pinkieval: I have made all the required changes but I am still confused about the test as in what should we do when the content is duplicated? Sorry to be a bit late as I was travelling so rarely had internet connectivity
20:15 <+pinkieval> faux__: The content is not duplicated in the existing tests. You must add a new test where the content is duplicated, to see how content_find behaves
20:17 <faux__> By using content_add right? I did that but apparently content_find only finds one result and not two of the same result in the list
20:31 <+pinkieval> the goal of your change is to make content_find find more than one
10:17 <+ardumont> because content_add filters on existing contents so if you inject the same content twice, you will have only 1 content in the db
faux added a comment.Apr 2 2019, 5:30 AM

fwiw, from irc discussion:

20:07 <faux__> pinkieval: I have made all the required changes but I am still confused about the test as in what should we do when the content is duplicated? Sorry to be a bit late as I was travelling so rarely had internet connectivity
20:15 <+pinkieval> faux__: The content is not duplicated in the existing tests. You must add a new test where the content is duplicated, to see how content_find behaves
20:17 <faux__> By using content_add right? I did that but apparently content_find only finds one result and not two of the same result in the list
20:31 <+pinkieval> the goal of your change is to make content_find find more than one
10:17 <+ardumont> because content_add filters on existing contents so if you inject the same content twice, you will have only 1 content in the db

I also found out the same thing when I was adding duplicate content to database using content_add........ so if it will automatically filter duplicate data then content_find should return only one data as a list, right?.....

ardumont added inline comments.Apr 10 2019, 11:01 PM
swh/storage/db.py
231–232

If there is no longer a need for limit, then remove it.

232

Please, remove the print statement ;)

faux updated this revision to Diff 4513.Apr 10 2019, 11:35 PM
faux edited the summary of this revision. (Show Details)

Have made the requested changes
In db.py : I have changed the content_find method to make the sql query on python side
In test_storage.py : Have added test for duplicate content

ardumont added inline comments.Apr 11 2019, 9:47 AM
swh/storage/db.py
245

Remove the limit if there is no need for it.

(LIMIT ALL seems like no limit to me ;)

ardumont added inline comments.Apr 11 2019, 10:01 AM
swh/storage/db.py
241

Your query build seems a bit complicated.
Can you please try and adapt more along the lines D1345#inline-8002?

It's simpler to read.
Thanks.

ardumont requested changes to this revision.Apr 11 2019, 10:03 AM

As per previous comment.

This revision now requires changes to proceed.Apr 11 2019, 10:03 AM
vlorentz requested changes to this revision.Apr 11 2019, 10:08 AM
vlorentz added inline comments.
swh/storage/tests/test_storage.py
2425–2426

You shouldn't need an if in a test.

If you expect content_find to return an object, then assume it returns an object.

(same comment on all the conditionals below)

2526–2531

You know the length of the expected return value of self.storage.content_find (2), no need for a for loop that runs only twice.

(Because, if there is a bug in content_find that makes it return only a single element, the for loop will not run, and the test won't catch the bug)

2526–2531

only once *

faux added a comment.EditedApr 11 2019, 3:38 PM

Almost there. Just stuck on the query part. I do think it is similar to https://forge.softwareheritage.org/D1345#inline-8002.

faux updated this revision to Diff 4539.Apr 12 2019, 5:47 AM

Removed the if(s), for loop, LIMIT ALL. Made the query a bit more readable.

vlorentz requested changes to this revision.Apr 12 2019, 12:02 PM
vlorentz added inline comments.
swh/storage/tests/test_storage.py
2425–2426

You should also test the length of the list returned by content_find.

(same comment on the calls below)

2526–2531

You should fully test the return value of content_find:

result = list(self.storage.content_find(finder))


expected_result = [
    {
        ...
    },
    {
        ...
    },
]
self.assertEqual(expected_result, actual_result)
This revision now requires changes to proceed.Apr 12 2019, 12:02 PM
faux updated this revision to Diff 4556.Apr 12 2019, 4:37 PM
faux marked 9 inline comments as done.

Added more tests for content_find.

Looking good!

Could you just add one more test, this time with only sha256 (or blake2s256) that collides, and tests:

  1. content_find with only sha256 in the finder
  2. content_find with both sha256 and blake2s256 in the finder
faux updated this revision to Diff 4559.Apr 13 2019, 9:25 AM

Added the tests for colliding sha256 and blake2s256 hashes

vlorentz requested changes to this revision.Apr 13 2019, 10:50 AM
vlorentz added inline comments.
swh/storage/db.py
240

You can merge this for loop with the other one.

249–253

There is nothing wrong with returning an empty list. So always return content even if it's empty.
It only made sense to return None when we returned a single item.

swh/storage/tests/test_storage.py
2566

You don't need to compare the length, assertCountEqual does it already.

2624

same

This revision now requires changes to proceed.Apr 13 2019, 10:50 AM
ardumont resigned from this revision.Apr 13 2019, 11:06 AM

@vlorentz is doing a great job already ;)

faux updated this revision to Diff 4560.EditedApr 13 2019, 11:14 AM
faux marked 2 inline comments as done.

Merged the for loop removed assertEqual for length and removed if from content find

faux added a comment.Apr 13 2019, 11:18 AM

Sorry pushed the wrong thing...

faux updated this revision to Diff 4564.Apr 13 2019, 11:22 AM

Made the required changes

faux updated this revision to Diff 4594.Apr 16 2019, 2:19 PM

Missed a function in in_memory

faux updated this revision to Diff 4843.May 15 2019, 10:38 PM

First build failed uploading again

faux marked 4 inline comments as done.May 15 2019, 10:41 PM
faux updated this revision to Diff 4844.May 15 2019, 11:04 PM

Used the for loop once removed new_checksum_dict as it was not needed.

faux marked an inline comment as done.May 15 2019, 11:04 PM
vlorentz added inline comments.May 16 2019, 10:45 AM
swh/storage/db.py
223–231

nitpick: you can rewrite it like this to be more readable:

where_parts = []
args = []
for algorithm in checksum_dict:
    if checksum_dict[algorithm] is not None:
        parts.append(checksum_dict[algorithm])
        where_parts.append(algorithm + ' = %s')

then use ' AND '.join(where_parts) below.

faux updated this revision to Diff 4845.May 16 2019, 11:10 AM

Hopefully that does it.

faux marked an inline comment as done.May 16 2019, 11:11 AM
vlorentz accepted this revision.May 16 2019, 1:43 PM
This revision is now accepted and ready to land.May 16 2019, 1:43 PM
This revision was automatically updated to reflect the committed changes.
faux added a comment.May 16 2019, 1:54 PM

Dayummmm! I did it. Thanks @vlorentz