This changes the output of content_find method to a list in case of hash collisions
Details
- Reviewers
vlorentz ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T1349: Storage.content_find should return all matches, not just one.
- Commits
- rDSTOC02134a705a12: Changes the output of content_find method to a list in case of hash collisions…
rDSTO02134a705a12: Changes the output of content_find method to a list in case of hash collisions… - Required Signatures
L3 Software Heritage Contributor License Agreement, version 1.0
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- t1349
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 5320 Build 7214: tox-on-jenkins Jenkins Build 7213: arc lint + arc unit
Event Timeline
In test_content_find_with_present_content, you should not iterate over results, because it would make the test pass if there are no result. Instead, you must check that there is exactly one result, and then check the content of that result.
You must also add another test that checks what happens when a content is duplicated.
swh/storage/in_memory.py | ||
---|---|---|
216–217 | You can remove that FIXME, because you fixed it :) | |
243–244 | This comment is no longer relevant | |
swh/storage/storage.py | ||
406–416 | Could you rename the variables here? c was a bad choice (before your diff, it's not your fault) |
fwiw, from irc discussion:
20:07 <faux__> pinkieval: I have made all the required changes but I am still confused about the test as in what should we do when the content is duplicated? Sorry to be a bit late as I was travelling so rarely had internet connectivity 20:15 <+pinkieval> faux__: The content is not duplicated in the existing tests. You must add a new test where the content is duplicated, to see how content_find behaves 20:17 <faux__> By using content_add right? I did that but apparently content_find only finds one result and not two of the same result in the list 20:31 <+pinkieval> the goal of your change is to make content_find find more than one 10:17 <+ardumont> because content_add filters on existing contents so if you inject the same content twice, you will have only 1 content in the db
I also found out the same thing when I was adding duplicate content to database using content_add........ so if it will automatically filter duplicate data then content_find should return only one data as a list, right?.....
Have made the requested changes
In db.py : I have changed the content_find method to make the sql query on python side
In test_storage.py : Have added test for duplicate content
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/357/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/357/console
swh/storage/db.py | ||
---|---|---|
226 | Remove the limit if there is no need for it. (LIMIT ALL seems like no limit to me ;) |
swh/storage/db.py | ||
---|---|---|
222 | Your query build seems a bit complicated. It's simpler to read. |
swh/storage/tests/test_storage.py | ||
---|---|---|
1611–1615 | You shouldn't need an if in a test. If you expect content_find to return an object, then assume it returns an object. (same comment on all the conditionals below) | |
1716–1721 | You know the length of the expected return value of self.storage.content_find (2), no need for a for loop that runs only twice. (Because, if there is a bug in content_find that makes it return only a single element, the for loop will not run, and the test won't catch the bug) | |
1716–1721 | only once * |
Almost there. Just stuck on the query part. I do think it is similar to https://forge.softwareheritage.org/D1345#inline-8002.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/361/ for more details.
swh/storage/tests/test_storage.py | ||
---|---|---|
1611–1617 | You should also test the length of the list returned by content_find. (same comment on the calls below) | |
1716–1721 | You should fully test the return value of content_find: result = list(self.storage.content_find(finder)) expected_result = [ { ... }, { ... }, ] self.assertEqual(expected_result, actual_result) |
Looking good!
Could you just add one more test, this time with only sha256 (or blake2s256) that collides, and tests:
- content_find with only sha256 in the finder
- content_find with both sha256 and blake2s256 in the finder
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/364/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/366/ for more details.
swh/storage/db.py | ||
---|---|---|
220 | You can merge this for loop with the other one. | |
229–233 | There is nothing wrong with returning an empty list. So always return content even if it's empty. | |
swh/storage/tests/test_storage.py | ||
1755 | You don't need to compare the length, assertCountEqual does it already. | |
1813 | same |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/367/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/367/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/368/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/368/console
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/369/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/374/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/374/console
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/384/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/385/ for more details.
swh/storage/db.py | ||
---|---|---|
209–217 | nitpick: you can rewrite it like this to be more readable: where_parts = [] args = [] for algorithm in checksum_dict: if checksum_dict[algorithm] is not None: parts.append(checksum_dict[algorithm]) where_parts.append(algorithm + ' = %s') then use ' AND '.join(where_parts) below. |
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/386/ for more details.