
storage.in_memory: Fix origin_list implementation
Abandoned · Public

Authored by ardumont on Aug 1 2020, 11:16 AM.

Details

Reviewers
None
Group Reviewers
Reviewers
Maniphest Tasks
T645: Type swh-storage endpoints with swh.model objects
Summary

This kept causing unrelated errors in other modules (see D3682 for the
most recent example, though not the only one).

Related to T645

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
drop-origin-get-range
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 14208
Build 21842: Phabricator diff pipeline on jenkins · Jenkins console · Jenkins
Build 21841: arc lint + arc unit

Unit Tests: Failed

Time      Test
1,664 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData::test_origin_list[1000]
self = <swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData object at 0x7f931f6f9fd0> swh_storage = <swh.storage.cassandra.storage.CassandraStorage object at 0x7f93cc21c668> swh_origins = [Origin(url='http://some.long.host.name/path/with/anchor#id=42'), Origin(url='deb://xn--n28h.tld/stuff'), Origin(url='...l='https://xn--n28h.tld/path/with/anchor#id=42'), Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'), ...]
2,104 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData::test_origin_list[100]
self = <swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData object at 0x7f931ee3e630> swh_storage = <swh.storage.cassandra.storage.CassandraStorage object at 0x7f9320771ef0> swh_origins = [Origin(url='https://example.com/path/with/anchor#id=42'), Origin(url='svn://example.com'), Origin(url='http://example...tuff/'), Origin(url='git://some.long.host.name'), Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'), ...]
1,718 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData::test_origin_list[10]
self = <swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData object at 0x7f931ee3e0f0> swh_storage = <swh.storage.cassandra.storage.CassandraStorage object at 0x7f931ee3e7f0> swh_origins = [Origin(url='svn://xn--n28h.tld/stuff'), Origin(url='git://example.com/path/to/resource'), Origin(url='mock://xn--n28h...), Origin(url='http://some.long.host.name/path/with/qargs?q=1&b'), Origin(url='http://some.long.host.name/stuff'), ...]
2,121 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData::test_origin_list[1]
self = <swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData object at 0x7f9321084b38> swh_storage = <swh.storage.cassandra.storage.CassandraStorage object at 0x7f9321085b70> swh_origins = [Origin(url='https://example.com/stuff'), Origin(url='http://xn--n28h.tld/path/to/resource'), Origin(url='deb://xn--n2...'), Origin(url='deb://xn--n28h.tld'), Origin(url='http://some.long.host.name'), Origin(url='http://xn--n28h.tld'), ...]
1,993 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData::test_origin_list[7]
self = <swh.storage.tests.test_cassandra.TestCassandraStorageGeneratedData object at 0x7f9321085390> swh_storage = <swh.storage.cassandra.storage.CassandraStorage object at 0x7f931eea91d0> swh_origins = [Origin(url='http://xn--n28h.tld/path/with/qargs?q=1&b'), Origin(url='svn://example.com/path/with/anchor#id=42'), Orig... Origin(url='https://example.com/path/with/anchor#id=42'), Origin(url='git://example.com/path/with/anchor#id=42'), ...]
View Full Test Results (5 Failed · 727 Passed · 15 Skipped)

Event Timeline

Build is green

Patch application report for D3683 (id=12966)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..e2e6a1b8
Fast-forward
 swh/storage/algos/origin.py            | 32 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 44 ++++++++++--------
 swh/storage/interface.py               | 30 +++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 84 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  8 ----
 swh/storage/tests/test_storage.py      | 62 +++++++++----------------
 8 files changed, 102 insertions(+), 184 deletions(-)
Changes applied before test
commit e2e6a1b825b25f58de35ca5c118801f4a87c8e2a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 11:15:56 2020 +0200

    storage.in_memory: Fix origin_list implementation
    
    This kept on creating unrelated errors on other modules.
    
    Related to T645

commit 77757a0dccb429bd84154df11ad58c8442d75b79
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/648/ for more details.

Add scenario with page-token

tests need storage > v0.11.8 to pass

Build has FAILED

Patch application report for D3683 (id=12967)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit 95882a4212324f828a13c19ddc0ed8284e93362b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/52/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/52/console

Build is green

Patch application report for D3683 (id=12969)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..6524b2ab
Fast-forward
 swh/storage/algos/origin.py            | 32 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 45 ++++++++++--------
 swh/storage/interface.py               | 30 +++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 84 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  8 ----
 swh/storage/tests/test_storage.py      | 62 +++++++++----------------
 8 files changed, 102 insertions(+), 185 deletions(-)
Changes applied before test
commit 6524b2ab578733d49601b80bc7d9ce30cef27b39
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 11:15:56 2020 +0200

    storage.in_memory: Fix origin_list implementation
    
    This kept on creating unrelated errors on other modules.
    
    Related to T645

commit 77757a0dccb429bd84154df11ad58c8442d75b79
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/649/ for more details.

I don't understand what the issue was. Could you add a test that triggers it?


Ack, it's an ordering issue. The current test hides it because it sorts both the input and the output before comparing them.

Replace the assertion at [1] with a bare comparison:

assert returned_origins == swh_origins

Without this diff, the implementation then fails [2].
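To illustrate why sorting both sides hides the problem, here is a minimal sketch. It is not code from swh-storage: the URLs and the `list_sorted_by_url` helper are hypothetical stand-ins for a listing that returns origins in an order other than insertion order.

```python
# Toy data standing in for the generated origins (hypothetical URLs).
inserted = ["https://example.com/b", "deb://example.com/a", "git://example.com/c"]


def list_sorted_by_url(urls):
    """Hypothetical listing that returns URLs sorted, not in insertion order."""
    return sorted(urls)


returned = list_sorted_by_url(inserted)

# Order-insensitive check: passes even though the order differs.
assert sorted(returned) == sorted(inserted)

# Order-sensitive check: this is what exposes the difference.
order_matches = returned == inserted
print(order_matches)  # False
```

The first assertion only checks that the same elements are present; only the bare comparison is sensitive to the order in which they come back.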

[1] https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/tests/test_storage.py$4012

[2]

$ pytest -vv -x -s swh/storage/tests/test_in_memory.py -k test_origin_list
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.7.3, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /home/tony/.virtualenvs/swh/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/tony/work/inria/repo/swh/swh-environment/swh-storage/.hypothesis/examples')
rootdir: /home/tony/work/inria/repo/swh/swh-environment/swh-storage, configfile: pytest.ini
plugins: xdist-1.32.0, aiohttp-0.3.0, celery-4.4.6, testinfra-5.2.1, asyncio-0.14.0, postgresql-2.4.0, forked-1.2.0, requests-mock-1.8.0, mock-3.1.1, dash-1.13.1, django-3.9.0, flask-1.0.0, hypothesis-5.19.0, kafka-0.4.0, swh.journal-0.4.0, swh.core-0.2.2
collected 164 items / 159 deselected / 5 selected

swh/storage/tests/test_in_memory.py::TestStorageGeneratedData::test_origin_list[1] <- swh/storage/tests/test_storage.py FAILED

================================================================================================================== FAILURES ===================================================================================================================
________________________________________________________________________________________________ TestStorageGeneratedData.test_origin_list[1] _________________________________________________________________________________________________

self = <swh.storage.tests.test_storage.TestStorageGeneratedData object at 0x7f60bb876c88>, swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f60bb876eb8>
swh_origins = [Origin(url='http://xn--n28h.tld/path/with/anchor#id=42'), Origin(url='http://example.com/path/with/qargs?q=1&b'), Ori...tld/path/to/resource'), Origin(url='deb://example.com/'), Origin(url='http://example.com/path/with/anchor#id=42'), ...]
limit = 1

    @pytest.mark.parametrize("limit", [1, 7, 10, 100, 1000])
    def test_origin_list(self, swh_storage, swh_origins, limit):
        returned_origins = []

        page_token = None
        i = 0
        while True:
            actual_page = swh_storage.origin_list(page_token=page_token, limit=limit)
            assert len(actual_page.results) <= limit

            returned_origins.extend(actual_page.results)

            i += 1
            page_token = actual_page.next_page_token

            if page_token is None:
                assert i * limit >= len(swh_origins)
                break
            else:
                assert len(actual_page.results) == limit

>       assert returned_origins == swh_origins
E       AssertionError: assert [Origin(url='deb://example.com'),\n Origin(url='deb://example.com/'),\n Origin(url='deb://example.com/path/to/resource'),\n Origin(url='deb://example.com/path/with/anchor#id=42'),\n Origin(url='deb://example.com/path/with/qargs?q=1&b'),\n Origin(url='deb://example.com/stuff'),\n Origin(url='deb://some.long.host.name'),\n Origin(url='deb://some.long.host.name/'),\n Origin(url='deb://some.long.host.name/path/to/resource'),\n Origin(url='deb://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='deb://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='deb://some.long.host.name/stuff/'),\n Origin(url='deb://xn--n28h.tld/'),\n Origin(url='deb://xn--n28h.tld/path/to/resource'),\n Origin(url='deb://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='deb://xn--n28h.tld/stuff'),\n Origin(url='git://example.com'),\n Origin(url='git://example.com/path/to/resource'),\n Origin(url='git://example.com/path/with/anchor#id=42'),\n Origin(url='git://example.com/stuff'),\n Origin(url='git://example.com/stuff/'),\n Origin(url='git://some.long.host.name/'),\n Origin(url='git://some.long.host.name/path/to/resource'),\n Origin(url='git://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='git://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='git://some.long.host.name/stuff'),\n Origin(url='git://some.long.host.name/stuff/'),\n Origin(url='git://xn--n28h.tld'),\n Origin(url='git://xn--n28h.tld/'),\n Origin(url='git://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='git://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='git://xn--n28h.tld/stuff'),\n Origin(url='git://xn--n28h.tld/stuff/'),\n Origin(url='http://example.com'),\n Origin(url='http://example.com/path/to/resource'),\n Origin(url='http://example.com/path/with/anchor#id=42'),\n Origin(url='http://example.com/path/with/qargs?q=1&b'),\n Origin(url='http://example.com/stuff'),\n Origin(url='http://example.com/stuff/'),\n 
Origin(url='http://some.long.host.name/'),\n Origin(url='http://some.long.host.name/path/to/resource'),\n Origin(url='http://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='http://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='http://some.long.host.name/stuff'),\n Origin(url='http://some.long.host.name/stuff/'),\n Origin(url='http://xn--n28h.tld'),\n Origin(url='http://xn--n28h.tld/'),\n Origin(url='http://xn--n28h.tld/path/to/resource'),\n Origin(url='http://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='http://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='http://xn--n28h.tld/stuff'),\n Origin(url='http://xn--n28h.tld/stuff/'),\n Origin(url='https://example.com'),\n Origin(url='https://example.com/path/to/resource'),\n Origin(url='https://example.com/path/with/anchor#id=42'),\n Origin(url='https://example.com/path/with/qargs?q=1&b'),\n Origin(url='https://example.com/stuff'),\n Origin(url='https://example.com/stuff/'),\n Origin(url='https://some.long.host.name/path/to/resource'),\n Origin(url='https://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='https://some.long.host.name/stuff'),\n Origin(url='https://some.long.host.name/stuff/'),\n Origin(url='https://xn--n28h.tld'),\n Origin(url='https://xn--n28h.tld/'),\n Origin(url='https://xn--n28h.tld/path/to/resource'),\n Origin(url='https://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='https://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='https://xn--n28h.tld/stuff'),\n Origin(url='mock://example.com'),\n Origin(url='mock://example.com/'),\n Origin(url='mock://example.com/path/to/resource'),\n Origin(url='mock://example.com/path/with/qargs?q=1&b'),\n Origin(url='mock://example.com/stuff/'),\n Origin(url='mock://some.long.host.name'),\n Origin(url='mock://some.long.host.name/'),\n Origin(url='mock://some.long.host.name/path/to/resource'),\n 
Origin(url='mock://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='mock://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='mock://some.long.host.name/stuff'),\n Origin(url='mock://xn--n28h.tld'),\n Origin(url='mock://xn--n28h.tld/path/to/resource'),\n Origin(url='mock://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='mock://xn--n28h.tld/stuff'),\n Origin(url='svn://example.com'),\n Origin(url='svn://example.com/'),\n Origin(url='svn://example.com/path/to/resource'),\n Origin(url='svn://example.com/path/with/anchor#id=42'),\n Origin(url='svn://example.com/path/with/qargs?q=1&b'),\n Origin(url='svn://example.com/stuff/'),\n Origin(url='svn://some.long.host.name/'),\n Origin(url='svn://some.long.host.name/path/to/resource'),\n Origin(url='svn://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='svn://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='svn://some.long.host.name/stuff'),\n Origin(url='svn://xn--n28h.tld/'),\n Origin(url='svn://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='svn://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='svn://xn--n28h.tld/stuff'),\n Origin(url='svn://xn--n28h.tld/stuff/')] == [Origin(url='http://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='http://example.com/path/with/qargs?q=1&b'),\n Origin(url='https://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='deb://xn--n28h.tld/path/to/resource'),\n Origin(url='deb://example.com/'),\n Origin(url='http://example.com/path/with/anchor#id=42'),\n Origin(url='https://some.long.host.name/stuff/'),\n Origin(url='deb://example.com'),\n Origin(url='http://some.long.host.name/stuff'),\n Origin(url='deb://some.long.host.name/path/to/resource'),\n Origin(url='svn://example.com/path/to/resource'),\n Origin(url='mock://example.com/'),\n Origin(url='git://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='deb://some.long.host.name'),\n Origin(url='https://xn--n28h.tld/path/with/qargs?q=1&b'),\n 
Origin(url='svn://example.com/path/with/qargs?q=1&b'),\n Origin(url='git://example.com/stuff/'),\n Origin(url='deb://xn--n28h.tld/'),\n Origin(url='http://xn--n28h.tld/stuff/'),\n Origin(url='http://some.long.host.name/path/to/resource'),\n Origin(url='deb://example.com/path/to/resource'),\n Origin(url='mock://example.com/stuff/'),\n Origin(url='svn://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='deb://example.com/path/with/anchor#id=42'),\n Origin(url='svn://xn--n28h.tld/'),\n Origin(url='svn://example.com'),\n Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='mock://xn--n28h.tld/path/to/resource'),\n Origin(url='git://some.long.host.name/'),\n Origin(url='mock://some.long.host.name/'),\n Origin(url='http://xn--n28h.tld'),\n Origin(url='svn://some.long.host.name/'),\n Origin(url='svn://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='git://some.long.host.name/path/to/resource'),\n Origin(url='https://example.com/path/with/anchor#id=42'),\n Origin(url='git://example.com/stuff'),\n Origin(url='https://xn--n28h.tld/'),\n Origin(url='svn://example.com/path/with/anchor#id=42'),\n Origin(url='mock://some.long.host.name/stuff'),\n Origin(url='http://example.com'),\n Origin(url='deb://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='deb://some.long.host.name/'),\n Origin(url='http://example.com/stuff'),\n Origin(url='svn://some.long.host.name/stuff'),\n Origin(url='svn://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='http://xn--n28h.tld/stuff'),\n Origin(url='git://xn--n28h.tld/'),\n Origin(url='git://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='http://some.long.host.name/stuff/'),\n Origin(url='https://xn--n28h.tld'),\n Origin(url='git://some.long.host.name/stuff/'),\n Origin(url='http://xn--n28h.tld/'),\n Origin(url='svn://xn--n28h.tld/stuff'),\n Origin(url='http://example.com/path/to/resource'),\n Origin(url='deb://xn--n28h.tld/path/with/qargs?q=1&b'),\n 
Origin(url='svn://some.long.host.name/path/to/resource'),\n Origin(url='deb://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='svn://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='svn://xn--n28h.tld/stuff/'),\n Origin(url='mock://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='mock://some.long.host.name/path/to/resource'),\n Origin(url='deb://xn--n28h.tld/stuff'),\n Origin(url='https://some.long.host.name/path/to/resource'),\n Origin(url='mock://example.com'),\n Origin(url='mock://example.com/path/to/resource'),\n Origin(url='git://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='https://example.com/stuff/'),\n Origin(url='https://example.com/path/to/resource'),\n Origin(url='mock://xn--n28h.tld/stuff'),\n Origin(url='deb://example.com/stuff'),\n Origin(url='https://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='mock://xn--n28h.tld/path/with/anchor#id=42'),\n Origin(url='http://example.com/stuff/'),\n Origin(url='git://some.long.host.name/stuff'),\n Origin(url='http://xn--n28h.tld/path/with/qargs?q=1&b'),\n Origin(url='svn://example.com/stuff/'),\n Origin(url='git://example.com'),\n Origin(url='https://xn--n28h.tld/path/to/resource'),\n Origin(url='mock://xn--n28h.tld'),\n Origin(url='deb://some.long.host.name/stuff/'),\n Origin(url='mock://some.long.host.name'),\n Origin(url='https://example.com'),\n Origin(url='svn://example.com/'),\n Origin(url='deb://example.com/path/with/qargs?q=1&b'),\n Origin(url='https://example.com/stuff'),\n Origin(url='git://some.long.host.name/path/with/qargs?q=1&b'),\n Origin(url='git://example.com/path/with/anchor#id=42'),\n Origin(url='git://xn--n28h.tld/stuff/'),\n Origin(url='mock://example.com/path/with/qargs?q=1&b'),\n Origin(url='https://some.long.host.name/stuff'),\n Origin(url='https://example.com/path/with/qargs?q=1&b'),\n Origin(url='git://xn--n28h.tld'),\n Origin(url='git://xn--n28h.tld/stuff'),\n Origin(url='http://some.long.host.name/path/with/qargs?q=1&b'),\n 
Origin(url='http://xn--n28h.tld/path/to/resource'),\n Origin(url='http://some.long.host.name/'),\n Origin(url='mock://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='http://some.long.host.name/path/with/anchor#id=42'),\n Origin(url='git://example.com/path/to/resource'),\n Origin(url='https://xn--n28h.tld/stuff')]
E         At index 0 diff: Origin(url='deb://example.com') != Origin(url='http://xn--n28h.tld/path/with/anchor#id=42')
E         Full diff:
E           [
E         +  Origin(url='deb://example.com'),
E         +  Origin(url='deb://example.com/'),
E         +  Origin(url='deb://example.com/path/to/resource'),
E         +  Origin(url='deb://example.com/path/with/anchor#id=42'),
E         +  Origin(url='deb://example.com/path/with/qargs?q=1&b'),
E         +  Origin(url='deb://example.com/stuff'),
E         +  Origin(url='deb://some.long.host.name'),
E         +  Origin(url='deb://some.long.host.name/'),
E         +  Origin(url='deb://some.long.host.name/path/to/resource'),
E         +  Origin(url='deb://some.long.host.name/path/with/anchor#id=42'),
E         +  Origin(url='deb://some.long.host.name/path/with/qargs?q=1&b'),
E         +  Origin(url='deb://some.long.host.name/stuff/'),
E         +  Origin(url='deb://xn--n28h.tld/'),
E         +  Origin(url='deb://xn--n28h.tld/path/to/resource'),
E         +  Origin(url='deb://xn--n28h.tld/path/with/qargs?q=1&b'),
E         +  Origin(url='deb://xn--n28h.tld/stuff'),
E         +  Origin(url='git://example.com'),
E         +  Origin(url='git://example.com/path/to/resource'),
E         +  Origin(url='git://example.com/path/with/anchor#id=42'),
E         +  Origin(url='git://example.com/stuff'),
E         +  Origin(url='git://example.com/stuff/'),
E         +  Origin(url='git://some.long.host.name/'),
E         +  Origin(url='git://some.long.host.name/path/to/resource'),
E         +  Origin(url='git://some.long.host.name/path/with/anchor#id=42'),
E         +  Origin(url='git://some.long.host.name/path/with/qargs?q=1&b'),
E         +  Origin(url='git://some.long.host.name/stuff'),
E         +  Origin(url='git://some.long.host.name/stuff/'),
E         +  Origin(url='git://xn--n28h.tld'),
E         +  Origin(url='git://xn--n28h.tld/'),
E         +  Origin(url='git://xn--n28h.tld/path/with/anchor#id=42'),
E         +  Origin(url='git://xn--n28h.tld/path/with/qargs?q=1&b'),
E         +  Origin(url='git://xn--n28h.tld/stuff'),
E         +  Origin(url='git://xn--n28h.tld/stuff/'),
E         +  Origin(url='http://example.com'),
E         +  Origin(url='http://example.com/path/to/resource'),
E         +  Origin(url='http://example.com/path/with/anchor#id=42'),
E         +  Origin(url='http://example.com/path/with/qargs?q=1&b'),
E         +  Origin(url='http://example.com/stuff'),
E         +  Origin(url='http://example.com/stuff/'),
E         +  Origin(url='http://some.long.host.name/'),
E         +  Origin(url='http://some.long.host.name/path/to/resource'),
E         +  Origin(url='http://some.long.host.name/path/with/anchor#id=42'),
E         +  Origin(url='http://some.long.host.name/path/with/qargs?q=1&b'),
E         +  Origin(url='http://some.long.host.name/stuff'),
E         +  Origin(url='http://some.long.host.name/stuff/'),
E         +  Origin(url='http://xn--n28h.tld'),
E         +  Origin(url='http://xn--n28h.tld/'),
E         +  Origin(url='http://xn--n28h.tld/path/to/resource'),
E            Origin(url='http://xn--n28h.tld/path/with/anchor#id=42'),
E         +  Origin(url='http://xn--n28h.tld/path/with/qargs?q=1&b'),
E         +  Origin(url='http://xn--n28h.tld/stuff'),
E         +  Origin(url='http://xn--n28h.tld/stuff/'),
E         +  Origin(url='https://example.com'),
E         +  Origin(url='https://example.com/path/to/resource'),
E         +  Origin(url='https://example.com/path/with/anchor#id=42'),
E         -  Origin(url='http://example.com/path/with/qargs?q=1&b'),
E         +  Origin(url='https://example.com/path/with/qargs?q=1&b'),
E         ?                  +
E         +  Origin(url='https://example.com/stuff'),
E         +  Origin(url='https://example.com/stuff/'),
E         +  Origin(url='https://some.long.host.name/path/to/resource'),
E            Origin(url='https://some.long.host.name/path/with/anchor#id=42'),
E         +  Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'),
E         +  Origin(url='https://some.long.host.name/stuff'),
E         -  Origin(url='deb://xn--n28h.tld/path/to/resource'),
E         -  Origin(url='deb://example.com/'),
E         -  Origin(url='http://example.com/path/with/anchor#id=42'),
E            Origin(url='https://some.long.host.name/stuff/'),
E         +  Origin(url='https://xn--n28h.tld'),
E         +  Origin(url='https://xn--n28h.tld/'),
E         +  Origin(url='https://xn--n28h.tld/path/to/resource'),
E         +  Origin(url='https://xn--n28h.tld/path/with/anchor#id=42'),
E         +  Origin(url='https://xn--n28h.tld/path/with/qargs?q=1&b'),
E         +  Origin(url='https://xn--n28h.tld/stuff'),
E         -  Origin(url='deb://example.com'),
E         ?              ^^^
E         +  Origin(url='mock://example.com'),
E         ?              ^^^^
E         +  Origin(url='mock://example.com/'),
E         +  Origin(url='mock://example.com/path/to/resource'),
E         +  Origin(url='mock://example.com/path/with/qargs?q=1&b'),
E         +  Origin(url='mock://example.com/stuff/'),
E         +  Origin(url='mock://some.long.host.name'),
E         -  Origin(url='http://some.long.host.name/stuff'),
E         ?              ^^^^                       -----
E         +  Origin(url='mock://some.long.host.name/'),
E         ?              ^^^^
E         -  Origin(url='deb://some.long.host.name/path/to/resource'),
E         ?              ^^^
E         +  Origin(url='mock://some.long.host.name/path/to/resource'),
E         ?              ^^^^
E         +  Origin(url='mock://some.long.host.name/path/with/anchor#id=42'),
E         +  Origin(url='mock://some.long.host.name/path/with/qargs?q=1&b'),
E         +  Origin(url='mock://some.long.host.name/stuff'),
E         +  Origin(url='mock://xn--n28h.tld'),
E         +  Origin(url='mock://xn--n28h.tld/path/to/resource'),
E         +  Origin(url='mock://xn--n28h.tld/path/with/anchor#id=42'),
E         +  Origin(url='mock://xn--n28h.tld/stuff'),
E         +  Origin(url='svn://example.com'),
E         +  Origin(url='svn://example.com/'),
E            Origin(url='svn://example.com/path/to/resource'),
E         -  Origin(url='mock://example.com/'),
E         -  Origin(url='git://xn--n28h.tld/path/with/anchor#id=42'),
E         ?              ^^^    ^^^^^^^ ^^^
E         +  Origin(url='svn://example.com/path/with/anchor#id=42'),
E         ?              ^^^   + ^^^^^ ^^^
E         -  Origin(url='deb://some.long.host.name'),
E         -  Origin(url='https://xn--n28h.tld/path/with/qargs?q=1&b'),
E            Origin(url='svn://example.com/path/with/qargs?q=1&b'),
E         -  Origin(url='git://example.com/stuff/'),
E         ?              ^^^
E         +  Origin(url='svn://example.com/stuff/'),
E         ?              ^^^
E         +  Origin(url='svn://some.long.host.name/'),
E         -  Origin(url='deb://xn--n28h.tld/'),
E         -  Origin(url='http://xn--n28h.tld/stuff/'),
E         -  Origin(url='http://some.long.host.name/path/to/resource'),
E         ?              ^^^^
E         +  Origin(url='svn://some.long.host.name/path/to/resource'),
E         ?              ^^^
E         +  Origin(url='svn://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='deb://example.com/path/to/resource'),
E         -  Origin(url='mock://example.com/stuff/'),
E            Origin(url='svn://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='deb://example.com/path/with/anchor#id=42'),
E         +  Origin(url='svn://some.long.host.name/stuff'),
E            Origin(url='svn://xn--n28h.tld/'),
E         -  Origin(url='svn://example.com'),
E         -  Origin(url='https://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='mock://xn--n28h.tld/path/to/resource'),
E         -  Origin(url='git://some.long.host.name/'),
E         -  Origin(url='mock://some.long.host.name/'),
E         -  Origin(url='http://xn--n28h.tld'),
E         -  Origin(url='svn://some.long.host.name/'),
E         -  Origin(url='svn://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='git://some.long.host.name/path/to/resource'),
E         -  Origin(url='https://example.com/path/with/anchor#id=42'),
E         -  Origin(url='git://example.com/stuff'),
E         -  Origin(url='https://xn--n28h.tld/'),
E         -  Origin(url='svn://example.com/path/with/anchor#id=42'),
E         -  Origin(url='mock://some.long.host.name/stuff'),
E         -  Origin(url='http://example.com'),
E         -  Origin(url='deb://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='deb://some.long.host.name/'),
E         -  Origin(url='http://example.com/stuff'),
E         -  Origin(url='svn://some.long.host.name/stuff'),
E            Origin(url='svn://xn--n28h.tld/path/with/anchor#id=42'),
E         -  Origin(url='http://xn--n28h.tld/stuff'),
E         -  Origin(url='git://xn--n28h.tld/'),
E         -  Origin(url='git://xn--n28h.tld/path/with/qargs?q=1&b'),
E         ?              ^^^
E         +  Origin(url='svn://xn--n28h.tld/path/with/qargs?q=1&b'),
E         ?              ^^^
E         -  Origin(url='http://some.long.host.name/stuff/'),
E         -  Origin(url='https://xn--n28h.tld'),
E         -  Origin(url='git://some.long.host.name/stuff/'),
E         -  Origin(url='http://xn--n28h.tld/'),
E            Origin(url='svn://xn--n28h.tld/stuff'),
E         -  Origin(url='http://example.com/path/to/resource'),
E         -  Origin(url='deb://xn--n28h.tld/path/with/qargs?q=1&b'),
E         -  Origin(url='svn://some.long.host.name/path/to/resource'),
E         -  Origin(url='deb://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='svn://xn--n28h.tld/path/with/qargs?q=1&b'),
E            Origin(url='svn://xn--n28h.tld/stuff/'),
E         -  Origin(url='mock://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='mock://some.long.host.name/path/to/resource'),
E         -  Origin(url='deb://xn--n28h.tld/stuff'),
E         -  Origin(url='https://some.long.host.name/path/to/resource'),
E         -  Origin(url='mock://example.com'),
E         -  Origin(url='mock://example.com/path/to/resource'),
E         -  Origin(url='git://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='https://example.com/stuff/'),
E         -  Origin(url='https://example.com/path/to/resource'),
E         -  Origin(url='mock://xn--n28h.tld/stuff'),
E         -  Origin(url='deb://example.com/stuff'),
E         -  Origin(url='https://xn--n28h.tld/path/with/anchor#id=42'),
E         -  Origin(url='mock://xn--n28h.tld/path/with/anchor#id=42'),
E         -  Origin(url='http://example.com/stuff/'),
E         -  Origin(url='git://some.long.host.name/stuff'),
E         -  Origin(url='http://xn--n28h.tld/path/with/qargs?q=1&b'),
E         -  Origin(url='svn://example.com/stuff/'),
E         -  Origin(url='git://example.com'),
E         -  Origin(url='https://xn--n28h.tld/path/to/resource'),
E         -  Origin(url='mock://xn--n28h.tld'),
E         -  Origin(url='deb://some.long.host.name/stuff/'),
E         -  Origin(url='mock://some.long.host.name'),
E         -  Origin(url='https://example.com'),
E         -  Origin(url='svn://example.com/'),
E         -  Origin(url='deb://example.com/path/with/qargs?q=1&b'),
E         -  Origin(url='https://example.com/stuff'),
E         -  Origin(url='git://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='git://example.com/path/with/anchor#id=42'),
E         -  Origin(url='git://xn--n28h.tld/stuff/'),
E         -  Origin(url='mock://example.com/path/with/qargs?q=1&b'),
E         -  Origin(url='https://some.long.host.name/stuff'),
E         -  Origin(url='https://example.com/path/with/qargs?q=1&b'),
E         -  Origin(url='git://xn--n28h.tld'),
E         -  Origin(url='git://xn--n28h.tld/stuff'),
E         -  Origin(url='http://some.long.host.name/path/with/qargs?q=1&b'),
E         -  Origin(url='http://xn--n28h.tld/path/to/resource'),
E         -  Origin(url='http://some.long.host.name/'),
E         -  Origin(url='mock://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='http://some.long.host.name/path/with/anchor#id=42'),
E         -  Origin(url='git://example.com/path/to/resource'),
E         -  Origin(url='https://xn--n28h.tld/stuff'),
E           ]

swh/storage/tests/test_storage.py:3992: AssertionError
=========================================================================================================== short test summary info ===========================================================================================================
FAILED swh/storage/tests/test_in_memory.py::TestStorageGeneratedData::test_origin_list[1] - AssertionError: assert [Origin(url='deb://example.com'),\n Origin(url='deb://example.com/'),\n Origin(url='deb://example.com/path/to/resource'),...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
====================================================================================================== 1 failed, 159 deselected in 0.79s ======================================================================================================

Adapt test_origin_list so the issue is more apparent

Build has FAILED

Patch application report for D3683 (id=12970)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..20e3e15d
Fast-forward
 swh/storage/algos/origin.py            | 32 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 45 ++++++++++--------
 swh/storage/interface.py               | 30 +++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 84 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  8 ----
 swh/storage/tests/test_storage.py      | 65 ++++++++++----------------
 8 files changed, 104 insertions(+), 186 deletions(-)
Changes applied before test
commit 20e3e15d72a2508c3535dc21b2c94470e49a8ab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 11:15:56 2020 +0200

    storage.in_memory: Fix origin_list implementation
    
    This kept on creating unrelated errors on other modules.
    
    Related to T645

commit 77757a0dccb429bd84154df11ad58c8442d75b79
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/650/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/650/console

then that's not a bug. The order of results of origin_list is not guaranteed to be the order of insertion. Providing this property is equivalent to having numeric origin ids.

> then that's not a bug. The order of results of origin_list is not guaranteed to be the order of insertion.

Yeah, you are right. I'm slowly realizing this.
I gather the Cassandra storage agrees with you, and that is why its test fails
(I did not check but believe that's the source of the current failure... yeah, it is).

Now, though, I don't really know what I can check in the scheduler CLI (D3682).
Only that the number of origins is correct, I guess.
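
If the result order is unspecified, one option is to compare the listing as a multiset rather than as a list, which still checks the count. A minimal sketch, assuming the items are hashable (URL strings here); `assert_same_origins` is a hypothetical helper, not something from swh-storage or the scheduler:

```python
from collections import Counter
from typing import Hashable, Iterable


def assert_same_origins(
    expected: Iterable[Hashable], actual: Iterable[Hashable]
) -> None:
    """Check that both listings hold the same items, ignoring order."""
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    # Comparing the totals first gives a clearer message when only the
    # number of origins is off.
    assert sum(actual_counts.values()) == sum(expected_counts.values()), (
        "unexpected number of origins"
    )
    # Counter equality ignores order but still catches missing, extra,
    # or duplicated items.
    assert actual_counts == expected_counts, "origins differ"


# The same origins in a different order are accepted.
assert_same_origins(
    ["svn://example.com", "git://example.com"],
    ["git://example.com", "svn://example.com"],
)
```

This is stricter than checking only the count, yet it does not depend on insertion order.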

> Providing this property is equivalent to having numeric origin ids.

Yeah, I probably took a shortcut there ;)
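
To illustrate why a listing can be complete and paginated without following insertion order, here is a toy model that orders origins by a hash of their URL and resumes from an opaque page token. It is only a sketch under stated assumptions: `list_origins`, `list_all`, and the SHA-1 ordering are made up for the example and are not the swh-storage implementation.

```python
import hashlib
from typing import List, Optional, Sequence, Tuple


def _key(url: str) -> str:
    # Hypothetical ordering key: stable, but unrelated to insertion order.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def list_origins(
    urls: Sequence[str], page_token: Optional[str] = None, limit: int = 2
) -> Tuple[List[str], Optional[str]]:
    """Return one page of URLs plus the token for the next page, if any."""
    ordered = sorted(urls, key=_key)
    if page_token is not None:
        # Resume strictly after the last key of the previous page.
        ordered = [url for url in ordered if _key(url) > page_token]
    page = ordered[:limit]
    next_token = _key(page[-1]) if len(ordered) > limit else None
    return page, next_token


def list_all(urls: Sequence[str], limit: int = 2) -> List[str]:
    """Walk every page until no next-page token is returned."""
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = list_origins(urls, token, limit)
        results.extend(page)
        if token is None:
            return results


inserted = ["svn://example.com", "git://example.com", "deb://example.com/stuff"]
listed = list_all(inserted)
# Every origin is listed exactly once; the order is whatever the hash gives.
assert sorted(listed) == sorted(inserted)
```

A caller of such a listing can rely on the set of results and on the page tokens, but not on the position of any given origin.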