Details

@anlambert recently added a list origins method to the Web API. I'm pinging him here to make sure there is no overlap and/or that there is code to be reused/refactored related to this proposed change.

@zack There's indeed an overlap. I'll rewrite the webapp to use this new function.

Add origin_from and origin_to arguments to iter_origins.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/208/ for more details.

Harbormaster completed remote builds in B4450: Diff 3758.Feb 26 2019, 11:43 AM

The logic looks fine, I'm just having a hard time with some sylistic issues that give me a hard time reading the code.

As a point of reference, black on the test file gives:

# Copyright (C) 2019  The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

from unittest.mock import patch

from swh.storage.in_memory import Storage
from swh.storage.algos.origin import iter_origins


def test_iter_origins():
    storage = Storage()
    origins = storage.origin_add(
        [
            {"type": "foo", "url": "bar"},
            {"type": "baz", "url": "qux"},
            {"type": "quux", "url": "quuz"},
        ]
    )
    assert list(iter_origins(storage)) == origins
    assert list(iter_origins(storage, batch_size=1)) == origins
    assert list(iter_origins(storage, batch_size=2)) == origins

    for i in range(1, 5):
        assert list(iter_origins(storage, origin_from=i + 1)) == origins[i:], i
        assert (
            list(iter_origins(storage, origin_from=i + 1, batch_size=1)) == origins[i:]
        ), i
        assert (
            list(iter_origins(storage, origin_from=i + 1, batch_size=2)) == origins[i:]
        ), i

        for j in range(i, 5):
            assert (
                list(iter_origins(storage, origin_from=i + 1, origin_to=j + 1))
                == origins[i:j]
            ), (i, j)
            assert (
                list(
                    iter_origins(
                        storage, origin_from=i + 1, origin_to=j + 1, batch_size=1
                    )
                )
                == origins[i:j]
            ), (i, j)
            assert (
                list(
                    iter_origins(
                        storage, origin_from=i + 1, origin_to=j + 1, batch_size=2
                    )
                )
                == origins[i:j]
            ), (i, j)


@patch("swh.storage.in_memory.Storage.origin_get_range")
def test_iter_origins_batch_size(mock_origin_get_range):
    storage = Storage()
    mock_origin_get_range.return_value = []

    list(iter_origins(storage))
    mock_origin_get_range.assert_called_with(origin_from=1, origin_count=10000)

    list(iter_origins(storage, batch_size=42))
    mock_origin_get_range.assert_called_with(origin_from=1, origin_count=42)

Which feels slightly more verbose but is easier to read for me.

swh/storage/algos/origin.py
23	Please add spaces around the `-` operator (I'm surprised flake8 didn't complain)
30	Same here.
swh/storage/tests/algos/test_origin.py
19–24	do we really need these line breaks?
26–44	Phew, the pytest style of `assert foo == bar, "message"` really, really messes with my brain. The line wrapping is not helping at all. There's two things here: We should probably just relax the line length warning for tests. The second for loop is slightly more readable than the first, because the two sides of the equality are easier to find. Do we really need such non-explicit comments in the assert statements?

Improve readability.

vlorentz marked 3 inline comments as done.Feb 27 2019, 11:56 AM

vlorentz added inline comments.

swh/storage/tests/algos/test_origin.py
26–44	Phew, the pytest style of assert foo == bar, "message" really, really messes with my brain. I don't like them either. Do we really need such non-explicit comments in the assert statements? If the test fail, one needs to print that at some point to understand the failure.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/210/ for more details.

Harbormaster completed remote builds in B4466: Diff 3772.Feb 27 2019, 12:00 PM

Apart the tests that should be rewritten using unittest.TestCase, looks good to me.

swh/storage/tests/algos/test_origin.py
26–44	I am not a big fan either of the pytest assert style. Why not using `unittest.TestCase` as in every other storage tests ?

This revision is now accepted and ready to land.Feb 28 2019, 1:50 PM

vlorentz marked an inline comment as done.Feb 28 2019, 2:40 PM

vlorentz added inline comments.

swh/storage/tests/algos/test_origin.py
26–44	pytest fixtures are not usable for tests based on `unittest.TestCase`, so we have to use pytest-style functions for these. For consistency, we're writing all new tests with this style.

anlambert added inline comments.Feb 28 2019, 2:50 PM

swh/storage/tests/algos/test_origin.py
26–44	Not really convinced here as no unusable pytest fixtures are used in swh-storage tests. Anyway, that's just a code style issue so that's not really important.