Page MenuHomeSoftware Heritage

Add an helper function to list all origins in the storage.
ClosedPublic

Authored by vlorentz on Feb 26 2019, 10:46 AM.

Diff Detail

Repository
rDSTO Storage manager
Branch
iter_origins
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4466
Build 5914: tox-on-jenkinsJenkins
Build 5913: arc lint + arc unit

Event Timeline

Add test checking that batch_size is not ignored.

@anlambert recently added a list origins method to the Web API. I'm pinging him here to make sure there is no overlap and/or that there is code to be reused/refactored related to this proposed change.

@zack There's indeed an overlap. I'll rewrite the webapp to use this new function.

  • Add origin_from and origin_to arguments to iter_origins.

The logic looks fine, I'm just having a hard time with some sylistic issues that give me a hard time reading the code.

As a point of reference, black on the test file gives:

# Copyright (C) 2019  The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

from unittest.mock import patch

from swh.storage.in_memory import Storage
from swh.storage.algos.origin import iter_origins


def test_iter_origins():
    storage = Storage()
    origins = storage.origin_add(
        [
            {"type": "foo", "url": "bar"},
            {"type": "baz", "url": "qux"},
            {"type": "quux", "url": "quuz"},
        ]
    )
    assert list(iter_origins(storage)) == origins
    assert list(iter_origins(storage, batch_size=1)) == origins
    assert list(iter_origins(storage, batch_size=2)) == origins

    for i in range(1, 5):
        assert list(iter_origins(storage, origin_from=i + 1)) == origins[i:], i
        assert (
            list(iter_origins(storage, origin_from=i + 1, batch_size=1)) == origins[i:]
        ), i
        assert (
            list(iter_origins(storage, origin_from=i + 1, batch_size=2)) == origins[i:]
        ), i

        for j in range(i, 5):
            assert (
                list(iter_origins(storage, origin_from=i + 1, origin_to=j + 1))
                == origins[i:j]
            ), (i, j)
            assert (
                list(
                    iter_origins(
                        storage, origin_from=i + 1, origin_to=j + 1, batch_size=1
                    )
                )
                == origins[i:j]
            ), (i, j)
            assert (
                list(
                    iter_origins(
                        storage, origin_from=i + 1, origin_to=j + 1, batch_size=2
                    )
                )
                == origins[i:j]
            ), (i, j)


@patch("swh.storage.in_memory.Storage.origin_get_range")
def test_iter_origins_batch_size(mock_origin_get_range):
    storage = Storage()
    mock_origin_get_range.return_value = []

    list(iter_origins(storage))
    mock_origin_get_range.assert_called_with(origin_from=1, origin_count=10000)

    list(iter_origins(storage, batch_size=42))
    mock_origin_get_range.assert_called_with(origin_from=1, origin_count=42)

Which feels slightly more verbose but is easier to read for me.

swh/storage/algos/origin.py
23

Please add spaces around the - operator (I'm surprised flake8 didn't complain)

30

Same here.

swh/storage/tests/algos/test_origin.py
19–24

do we really need these line breaks?

26–44

Phew, the pytest style of assert foo == bar, "message" really, really messes with my brain. The line wrapping is not helping at all.

There's two things here:

  • We should probably just relax the line length warning for tests.
  • The second for loop is slightly more readable than the first, because the two sides of the equality are easier to find.

Do we really need such non-explicit comments in the assert statements?

vlorentz marked 2 inline comments as done.
  • Improve readability.
vlorentz added inline comments.
swh/storage/tests/algos/test_origin.py
26–44

Phew, the pytest style of assert foo == bar, "message" really, really messes with my brain.

I don't like them either.

Do we really need such non-explicit comments in the assert statements?

If the test fail, one needs to print that at some point to understand the failure.

Apart the tests that should be rewritten using unittest.TestCase, looks good to me.

swh/storage/tests/algos/test_origin.py
26–44

I am not a big fan either of the pytest assert style.
Why not using unittest.TestCase as in every other storage tests ?

This revision is now accepted and ready to land.Feb 28 2019, 1:50 PM
vlorentz added inline comments.
swh/storage/tests/algos/test_origin.py
26–44

pytest fixtures are not usable for tests based on unittest.TestCase, so we have to use pytest-style functions for these. For consistency, we're writing all new tests with this style.

swh/storage/tests/algos/test_origin.py
26–44

Not really convinced here as no unusable pytest fixtures are used in swh-storage tests. Anyway, that's just a code style issue so that's not really important.

This revision was automatically updated to reflect the committed changes.