Page MenuHomeSoftware Heritage

utils.grouper: Improve implementation
ClosedPublic

Authored by ardumont on Dec 7 2018, 7:11 PM.

Details

Summary

Loader-tar defines a slightly different utils.grouper to deal with iterable of tuples.
This improves the generic definition to deal with that case.
That allows to remove some code in the loader-tar.

Related T1411

Test Plan

tox

Diff Detail

Repository
rDCORE Foundations and core functionalities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Dec 7 2018, 7:11 PM
vlorentz requested changes to this revision.Dec 8 2018, 9:30 AM
vlorentz added inline comments.
swh/core/tests/test_utils.py
46

That line should be [(8, 9), (None, None)]

56

That line should be:

self.assertEqual(out, [[9, 8, 7, 6], [5, 4, 3, 2], [1, 'a', 'a', 'a']])
swh/core/utils.py
44–46

Replace the for loop with what the grouper from the tarloader does:

return itertools.zip_longest(*args, fillvalue=fillvalue)

You don't want to drop the fill values (particularly if one of the items in the iterator happens to be equal to the fill value).

This revision now requires changes to proceed.Dec 8 2018, 9:30 AM
ardumont marked 3 inline comments as done.Dec 8 2018, 10:54 AM
ardumont added inline comments.
swh/core/tests/test_utils.py
46

According to the wrong docstring (because incomplete), you are correct!

56

According to the wrong docstring (because incomplete) you are correct!

swh/core/utils.py
44–46

I hear you but yes we do.
I realize the docstring is wrong again and the prototype must be changed as well.
The actual callers (quite some with the None as fillvalue) expects that.

Also, fillvalue is both a fillvalue and a stop value that we don't want involved in the last block.
Well, that's how i expected the function to be used initially (and is used as far as i can tell).

What i need to do:

  • the first line of the docstring should be improved to mention that (Collect data into fixed-length blocks except for the last block.That last block will contain only the remaining number of elements.).
  • fillvalue should be renamed to stopvalue in the grouper prototype (even though it will be used as fillvalue in the implementation).
vlorentz added inline comments.Dec 8 2018, 11:00 AM
swh/core/utils.py
44–46

Then you can drop the fillvalue/stopvalue argument and always set it with stopvalue=object(). This is guaranteed to be unique.

ardumont marked an inline comment as done.Dec 8 2018, 11:10 AM
ardumont added inline comments.
swh/core/utils.py
44–46

Then you can drop the fillvalue/stopvalue argument and always set it with stopvalue=object(). This is guaranteed to be unique.

Nice, that would address (particularly if one of the items in the iterator happens to be equal to the fill value). better.
Better than assuming it won't happen as nothing really enforces this).

Thanks!

ardumont updated this revision to Diff 2519.Dec 8 2018, 11:12 AM
  • utils.grouper: Rename fillvalue to stop_value and fix docstring
  • utils.grouper: Avoid potential bug of input data matching stop_value
ardumont retitled this revision from utils.grouper: Open fillvalue argument to function to utils.grouper: Improve implementation.Dec 8 2018, 11:13 AM
ardumont edited the summary of this revision. (Show Details)
vlorentz accepted this revision.Dec 8 2018, 11:22 AM
This revision is now accepted and ready to land.Dec 8 2018, 11:22 AM
This revision was automatically updated to reflect the committed changes.