Details

This lacks a unit test: just add a test for the new method. It's nice to also have the test for the revision_metadata_add() call, but there should be a test that ensure the _check_duplicates() behaves ok (once again, with valid and invalid inputs), and ensure the newly added validator works in every modified method. I count 7 of them patched (plus the hunk in in_memory.py). I would expect (functionally) 8 tests.

This revision now requires changes to proceed.Feb 7 2019, 9:44 AM

vlorentz added a child revision: D1095: Factorize tests of the indexer storages..Feb 7 2019, 2:06 PM

Move _check_duplicates outside a class and add doctests.

Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/357/ for more details.

Harbormaster completed remote builds in B4060: Diff 3433.Feb 7 2019, 2:20 PM

This lacks a unit test: just add a test for the new method.

Done.

and ensure the newly added validator works in every modified method. I count 7 of them patched

Done in D1095 for 5 of them. The 2 other ones will come in a later Diff (because these endpoints behave differently)

(plus the hunk in in_memory.py).

It's already tested by the tests in test_storage.py (which are run three times each: pg storage, memory storage, and client api).

Done in D1095 for 5 of them. The 2 other ones will come in a later Diff (because these endpoints behave differently)

I'm not a fan of adding missing pieces of a diff in another one.
Those tests are missing here, not in 1095.

I understand the need for refactoring but we should strive to keep the perimeter of a diff sandboxed.
Let 1095 be a tests refactoring and not 'a tests refactoring + a bug fix + adding missing tests'.

In D1079#23168, @ardumont wrote:

I understand the need for refactoring but we should strive to keep the perimeter of a diff sandboxed.
Let 1095 be a tests refactoring and not 'a tests refactoring + a bug fix + adding missing tests'.

I don't want to write non-trivial code that will get deleted in the next diff anyway.

I understand the need for refactoring but we should strive to keep the perimeter of a diff sandboxed.
Let 1095 be a tests refactoring and not 'a tests refactoring + a bug fix + adding missing tests'.

I don't want to write non-trivial code that will get deleted in the next diff anyway.

There could be a middle ground somewhere though.

You could have started by the refactoring first.
Then a diff to add this diff (and the 5 missing tests easier to add a priori with 1095)
Then the bugfix 1095 includes as another diff.

vlorentz mentioned this in D1095: Factorize tests of the indexer storages..Feb 7 2019, 2:59 PM

vlorentz removed a child revision: D1095: Factorize tests of the indexer storages..Feb 7 2019, 3:00 PM

vlorentz added a parent revision: D1095: Factorize tests of the indexer storages..

Rewrite as a child of D1095

Harbormaster failed remote builds in B4062: Diff 3435!Feb 7 2019, 3:02 PM

Build has FAILED

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/359/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/359/console

You could have started by the refactoring first.
Then a diff to add this diff (and the 5 missing tests easier to add a priori with 1095)

Done; I swapped the order of D1079 and D1095

Then the bugfix 1095 includes as another diff.

Done as D1096.

douardda requested changes to this revision.Feb 7 2019, 5:59 PM

douardda added inline comments.

swh/indexer/storage/__init__.py
51–79	I'm still quite not happy with this for several reasons: the arg name 'column' make no sense in the context of checking stuff on dictionnaries, the psycopg stuff in the docstring is very specific to the intended (primary) usage of this function, but there is no reason to make it specific to that usage, more importantly, I still do not see a unit test for this function. For example it's very easy to make it fail in an unexpected way using a dict as value in a dict (since these are not hashable, the function will generates a TypeError). If we do not bother to be bullet proof, make it explicit in the doctring.

This revision now requires changes to proceed.Feb 7 2019, 5:59 PM

Remove doc about psycopg error.
s/column/key/g
Document that values must be hashable.

vlorentz marked an inline comment as done.Feb 8 2019, 1:38 PM

vlorentz added inline comments.

swh/indexer/storage/__init__.py
51–79	the arg name 'column' make no sense in the context of checking stuff on dictionnaries, Indeed, fixed. the psycopg stuff in the docstring is very specific to the intended (primary) usage of this function, but there is no reason to make it specific to that usage, It's in the module that implements the pg backend and its name is prefixed with `_`, so it's unlikely to be used anywhere else. Fixed anyway. I still do not see a unit test for this function There's the doctest If we do not bother to be bullet proof, make it explicit in the doctring. Indeed, fixed.