
storage*: Drop origin-get-range in favor of origin-list
Closed · Public

Authored by ardumont on Aug 1 2020, 9:30 AM.

Details

Summary

The only remaining caller outside swh-storage is swh-scheduler [1]:

$ grep -r 'iter_origins' $SWH_ENVIRONMENT_HOME/*/swh/* | grep -v "swh-storage"
Binary file /home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/cli/__pycache__/task.cpython-37.pyc matches
Binary file /home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/cli/__pycache__/task.cpython-37-pytest-6.0.1.pyc matches
/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/cli/task.py:    from swh.storage.algos.origin import iter_origins
/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/cli/task.py:    origins = iter_origins(storage, origin_from=min_id, origin_to=max_id)

[1] swh-web was already dealt with in D3675

Related to T645

Depends on D3680
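The change hinges on a different pagination model. The grep output shows the scheduler calling iter_origins with an id range (origin_from/origin_to), whereas origin_list is assumed here to return one page of origins at a time together with an opaque next_page_token. The following is a minimal sketch of that token-based walk, not the swh-storage implementation: PagedResult, FakeStorage, iter_all_origins and the offset-based token format are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class PagedResult:
    # Hypothetical stand-in: one page of results plus an opaque continuation token
    results: List[Dict[str, str]] = field(default_factory=list)
    next_page_token: Optional[str] = None


class FakeStorage:
    """Hypothetical in-memory backend exposing a token-paginated origin_list."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = sorted(urls)

    def origin_list(self, page_token: Optional[str] = None, limit: int = 100) -> PagedResult:
        # Assumption: the token encodes the offset of the next origin; callers treat it as opaque
        start = int(page_token) if page_token is not None else 0
        end = start + limit
        next_token = str(end) if end < len(self._urls) else None
        return PagedResult([{"url": url} for url in self._urls[start:end]], next_token)


def iter_all_origins(storage: FakeStorage, limit: int = 100) -> Iterator[Dict[str, str]]:
    """Yield every origin by following next_page_token instead of an id range."""
    page_token: Optional[str] = None
    while True:
        page = storage.origin_list(page_token=page_token, limit=limit)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:  # no more pages
            break
```

For example, iterating over FakeStorage(["https://c", "https://a", "https://b"]) with limit=2 issues two origin_list calls and yields the three origins in URL order. The caller never needs to know origin ids, only the token handed back by the previous page.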

Diff Detail

Repository
rDSTO Storage manager
Branch
drop-origin-get-range
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 14209
Build 21844: Phabricator diff pipeline on Jenkins
Build 21843: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D3681 (id=12961)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..89e89239
Fast-forward
 swh/storage/algos/origin.py            | 30 ++++---------
 swh/storage/cassandra/storage.py       | 12 +++++-
 swh/storage/in_memory.py               | 24 ++++++-----
 swh/storage/interface.py               | 30 ++++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 79 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  4 --
 swh/storage/tests/test_storage.py      | 62 +++++++++-----------------
 8 files changed, 90 insertions(+), 165 deletions(-)
Changes applied before test
commit 89e892393ddbbf0fb34f82f93c30815f6c1a9fc5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/644/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/644/console

Build has FAILED

Patch application report for D3681 (id=12961)

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/645/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/645/console

Fix flake8 warning about an unused import (tests pass otherwise)

Build is green

Patch application report for D3681 (id=12962)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..ad8ab5bd
Fast-forward
 swh/storage/algos/origin.py            | 30 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 24 +++++-----
 swh/storage/interface.py               | 30 +++++--------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 80 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  4 --
 swh/storage/tests/test_storage.py      | 62 +++++++++-----------------
 8 files changed, 90 insertions(+), 166 deletions(-)
Changes applied before test
commit ad8ab5bd69ce85aa453cb920876f6e4f46fdd187
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/646/ for more details.

Simplify:

  • Drop the remaining skipped-test instructions for cassandra
  • Simplify test_origin assertions
  • Drop an unneeded statement from the iter_origins function definition

Build is green

Patch application report for D3681 (id=12963)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..77757a0d
Fast-forward
 swh/storage/algos/origin.py            | 32 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 24 +++++-----
 swh/storage/interface.py               | 30 +++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 84 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  8 ----
 swh/storage/tests/test_storage.py      | 62 +++++++++----------------
 8 files changed, 90 insertions(+), 176 deletions(-)
Changes applied before test
commit 77757a0dccb429bd84154df11ad58c8442d75b79
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/647/ for more details.

in-memory: Drop the no-longer-needed self._origins_by_id state variable

Build is green

Patch application report for D3681 (id=12971)

Could not rebase; Attempt merge onto 92f1183de0...

Updating 92f1183d..79b3f100
Fast-forward
 swh/storage/algos/origin.py            | 32 ++++---------
 swh/storage/cassandra/storage.py       | 12 ++++-
 swh/storage/in_memory.py               | 31 ++++++-------
 swh/storage/interface.py               | 30 +++++-------
 swh/storage/storage.py                 | 14 +++++-
 swh/storage/tests/algos/test_origin.py | 84 +++++-----------------------------
 swh/storage/tests/test_cassandra.py    |  8 ----
 swh/storage/tests/test_storage.py      | 62 +++++++++----------------
 8 files changed, 90 insertions(+), 183 deletions(-)
Changes applied before test
commit 79b3f1001186fbaa40ba3073e5e914be7e56eb9f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

commit 87c5ba223508b212ed6939496a6380a771b05e43
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 09:07:12 2020 +0200

    storage*: Do not allow unknown visit status in origin_visit*_get_latest
    
    That makes some storage (pg-storage) fail without that filtering.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/651/ for more details.

vlorentz added inline comments.
swh/storage/algos/origin.py
27–32

use stream_results

vlorentz requested changes to this revision. (Aug 3 2020, 9:11 AM)
This revision now requires changes to proceed. (Aug 3 2020, 9:11 AM)
swh/storage/algos/origin.py
27–32

I cannot, can I?

D3682 needs it. At least, that's why I wrote it this way.

swh/storage/algos/origin.py
27–32

Also, as mentioned in the scheduler diff, we could add the logger.info call about the page_token here (or directly in stream_results). I don't really see which is best.

Right now, it's only needed for origins, so I gather adding it here is enough.

swh/storage/algos/origin.py
27–32

I think we should implement iter_origins directly in the scheduler CLI, as it has a rather specific need.

swh/storage/algos/origin.py
27–32

Ok, so here, just a plain stream_results call, like the origin-visit and origin-visit-status functions at the end of the same module.
Then I'll inline this code in the scheduler diff instead.

Sounds like a plan to me.
If the need becomes shareable, we'll adapt then.

Thanks for your input ;)
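The thread settles on keeping the storage-side iter_origins a plain stream_results call and moving anything scheduler-specific, such as logging each page_token, out of the shared module. A minimal sketch of that split, under stated assumptions: stream_results, PagedResult and FakeStorage below are simplified hypothetical re-implementations written for illustration (not imports from swh.core or swh.storage), and log_page_tokens is an invented helper showing one way a caller could add the logging without touching the shared helper. The thread itself chose to inline the loop in the scheduler CLI; the wrapper is only an alternative.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class PagedResult:
    # Hypothetical stand-in: one page of results plus an opaque continuation token
    results: List[Any] = field(default_factory=list)
    next_page_token: Optional[str] = None


def stream_results(f: Callable[..., PagedResult], *args: Any, **kwargs: Any) -> Iterator[Any]:
    """Call the paginated function f page after page, yielding every result."""
    page_token: Optional[str] = None
    while True:
        page = f(*args, page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


class FakeStorage:
    """Hypothetical backend whose origin_list pages through sorted URLs."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = sorted(urls)

    def origin_list(self, page_token: Optional[str] = None, limit: int = 100) -> PagedResult:
        start = int(page_token) if page_token is not None else 0  # offset-based token (assumption)
        end = start + limit
        next_token = str(end) if end < len(self._urls) else None
        return PagedResult([{"url": url} for url in self._urls[start:end]], next_token)


# Storage side: iter_origins reduced to a plain stream_results call.
def iter_origins(storage: FakeStorage, limit: int = 100) -> Iterator[Dict[str, str]]:
    return stream_results(storage.origin_list, limit=limit)


# Caller side (hypothetical helper): log each page_token before delegating,
# so the shared stream_results stays free of origin-specific concerns.
def log_page_tokens(f: Callable[..., PagedResult]) -> Callable[..., PagedResult]:
    def wrapper(*args: Any, page_token: Optional[str] = None, **kwargs: Any) -> PagedResult:
        logger.info("Fetching origins with page_token=%s", page_token)
        return f(*args, page_token=page_token, **kwargs)

    return wrapper
```

For instance, stream_results(log_page_tokens(storage.origin_list), limit=2) yields the same origins as iter_origins(storage, limit=2) while logging one line per page fetched. Because stream_results only relies on the page_token/next_page_token contract, the same helper could also back the origin-visit and origin-visit-status iterators mentioned in the thread.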

Adapt according to the discussion

Build has FAILED

Patch application report for D3681 (id=12984)

Rebasing onto 87c5ba2235...

Current branch diff-target is up to date.
Changes applied before test
commit aa58e1092ffb46f807cd232be5c25cefe01b6967
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jul 31 17:56:13 2020 +0200

    storage*: Drop origin-get-range in favor of origin-list
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/652/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/652/console

swh/storage/algos/origin.py
27–32

👍

This revision is now accepted and ready to land. (Aug 3 2020, 11:47 AM)

Build is green

Patch application report for D3681 (id=12984)

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/653/ for more details.