Software Heritage

storage*: Add origin-visit-status-get-latest endpoint
Closed · Public

Authored by ardumont on Jun 16 2020, 8:15 PM.

Details

Summary

This makes it possible to read the latest origin-visit-status from the storage.

This is preparatory work for removing the now-unneeded fields of OriginVisit: storage API clients must first be migrated to use origin-visit-status-get-latest.
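A minimal sketch of what such a client migration could look like, assuming a client that used to read a visit's snapshot directly from the OriginVisit object. The `OriginVisitStatus` dataclass below is a simplified stand-in for the swh.model object, and the two-argument `origin_visit_status_get_latest(origin, visit)` signature is an assumption for illustration; `visit_snapshot` and `FakeStorage` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class OriginVisitStatus:
    """Hypothetical, simplified stand-in for the swh.model OriginVisitStatus."""

    origin: str
    visit: int
    date: datetime
    status: str
    snapshot: Optional[bytes]


class VisitStatusReader(Protocol):
    """The single storage method this client depends on (assumed signature)."""

    def origin_visit_status_get_latest(
        self, origin: str, visit: int
    ) -> Optional[OriginVisitStatus]: ...


def visit_snapshot(
    storage: VisitStatusReader, origin: str, visit: int
) -> Optional[bytes]:
    """Read a visit's snapshot from its latest status, not from OriginVisit."""
    status = storage.origin_visit_status_get_latest(origin, visit)
    return status.snapshot if status is not None else None


class FakeStorage:
    """Hypothetical in-memory backend, only for trying the client out."""

    def __init__(self, statuses: List[OriginVisitStatus]) -> None:
        self._statuses = statuses

    def origin_visit_status_get_latest(
        self, origin: str, visit: int
    ) -> Optional[OriginVisitStatus]:
        matching = [
            s for s in self._statuses if s.origin == origin and s.visit == visit
        ]
        # "Latest" is taken here to mean the status with the most recent date.
        return max(matching, key=lambda s: s.date, default=None)
```

Since the client only depends on one storage method, the same function would work against any backend that implements it (the diff touches the PostgreSQL, Cassandra and in-memory implementations).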

Related to T2310

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 12897
Build 19636: Phabricator diff pipeline on Jenkins
Build 19635: arc lint + arc unit

Unit Tests: Failed

Time      Test
850 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorage::test_origin_visit_add
          self = <swh.storage.tests.test_cassandra.TestCassandraStorage object at 0x7f36bc3a7a90> swh_storage = <swh.storage.validate.ValidatingProxyStorage object at 0x7f369c22e320>
725 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorage::test_origin_visit_find_by_date
          self = <swh.storage.tests.test_cassandra.TestCassandraStorage object at 0x7f367c57a6d8> swh_storage = <swh.storage.validate.ValidatingProxyStorage object at 0x7f367c79d908>
846 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorage::test_origin_visit_get_by
          self = <swh.storage.tests.test_cassandra.TestCassandraStorage object at 0x7f37813bf518> swh_storage = <swh.storage.validate.ValidatingProxyStorage object at 0x7f369c377780>
718 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorage::test_origin_visit_get_latest
          self = <swh.storage.tests.test_cassandra.TestCassandraStorage object at 0x7f367c1a7470> swh_storage = <swh.storage.validate.ValidatingProxyStorage object at 0x7f367c561438>
5,310 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_cassandra.TestCassandraStorage::test_origin_visit_get_random
          self = <swh.storage.tests.test_cassandra.TestCassandraStorage object at 0x7f36bc041da0> swh_storage = <swh.storage.validate.ValidatingProxyStorage object at 0x7f369c51f048>
Full test results: 14 failed · 726 passed · 17 skipped

Event Timeline

Build is green

Patch application report for D3296 (id=11678)

Rebasing onto d153a8096d...

Current branch diff-target is up to date.
Changes applied before test
commit a6c9583bbe18049741f158bb2471f1deecf7258a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 16 18:03:23 2020 +0200

    storage*: Add origin-visit-status-get-latest endpoint
    
    So we can read the latest origin-visit-status out of a storage
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/287/ for more details.

vlorentz added inline comments.
swh/storage/cassandra/cql.py
729–730

make this a prepared statement

736–739

move this to cassandra/storage.py; cassandra/cql.py should only be a thin abstraction over the queries.

swh/storage/interface.py
912–915

it should list all the possible statuses.

swh/storage/validate.py
109–120 ↗(On Diff #11678)

unrelated to this diff
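The two cql.py comments above (use a prepared statement; keep cql.py a thin abstraction over the queries) can be illustrated together with a minimal sketch. It assumes a cassandra-driver style session exposing `prepare(query)` and `execute(statement, parameters)`; `StubSession` is a hypothetical stand-in so the sketch runs without a cluster, and the table, column and method names (`origin_visit_status`, `origin_visit_status_get_all`) are assumptions, not the actual swh schema or API.

```python
from typing import Any, Dict, List, Tuple


class CqlRunner:
    """Thin layer: one method per CQL query, no filtering or conversion."""

    # Assumed table and column names, for illustration only.
    _SELECT_VISIT_STATUSES = (
        "SELECT * FROM origin_visit_status WHERE origin = ? AND visit = ?"
    )

    def __init__(self, session: Any) -> None:
        self._session = session
        self._prepared: Dict[str, Any] = {}  # query text -> prepared statement

    def _prepare(self, query: str) -> Any:
        # Prepare each query once; later calls reuse the cached statement.
        if query not in self._prepared:
            self._prepared[query] = self._session.prepare(query)
        return self._prepared[query]

    def origin_visit_status_get_all(self, origin: str, visit: int) -> List[Any]:
        """Return every status row of one visit, unfiltered and unconverted."""
        statement = self._prepare(self._SELECT_VISIT_STATUSES)
        return list(self._session.execute(statement, (origin, visit)))


class StubSession:
    """Hypothetical stand-in mimicking Session.prepare and Session.execute."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.prepare_calls = 0

    def prepare(self, query: str) -> Tuple[str, str]:
        self.prepare_calls += 1
        return ("prepared", query)

    def execute(
        self, statement: Any, params: Tuple[str, int]
    ) -> List[Dict[str, Any]]:
        origin, visit = params
        return [r for r in self.rows if r["origin"] == origin and r["visit"] == visit]
```

Any "pick the latest" or "filter by status" logic is deliberately absent here; under this layering it belongs in cassandra/storage.py.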

This revision now requires changes to proceed. Jun 16 2020, 8:37 PM
swh/storage/cassandra/cql.py
729–730

Ah yes, indeed, we can now. I iterated a bit beforehand to find something that works.

736–739

I kept it here so that the implementation matches the method name.
If I do what you ask, I will need to rename the method.

swh/storage/validate.py
109–120 ↗(On Diff #11678)

Yes, it was to avoid opening another diff with no apparent test.
That part has the same motivation as D3277 (I'll open another diff then).

swh/storage/validate.py
109–120 ↗(On Diff #11678)

open another diff then

D3298

Adapt according to the review:

  • Update docstring to list all possible status values
  • Move the filtering logic to the Python side, in the cassandra/storage module (and not in cql)
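Moving the filtering to the Python side means the storage module receives all status rows of a visit and selects the latest matching one itself. A minimal sketch of that selection, assuming "latest" means the most recent date; the `StatusRow` shape and the `allowed_statuses` / `require_snapshot` parameter names are assumptions for illustration, and the status strings in the usage are only examples.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class StatusRow:
    """Hypothetical shape of one origin-visit-status row."""

    date: datetime
    status: str
    snapshot: Optional[bytes]


def latest_visit_status(
    rows: Iterable[StatusRow],
    allowed_statuses: Optional[Sequence[str]] = None,
    require_snapshot: bool = False,
) -> Optional[StatusRow]:
    """Return the most recent row matching the filters, or None.

    The query layer returns every status of the visit; the filtering
    happens here, so that the CQL module stays a thin query wrapper.
    """
    candidates = (
        row
        for row in rows
        if (allowed_statuses is None or row.status in allowed_statuses)
        and (not require_snapshot or row.snapshot is not None)
    )
    return max(candidates, key=lambda row: row.date, default=None)
```

For a visit with an `ongoing` status, then a `partial` status carrying a snapshot, then another `ongoing` status, the unfiltered call returns the last row, while `require_snapshot=True` returns the `partial` one.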

Build has FAILED

Patch application report for D3296 (id=11688)

Could not rebase; Attempt merge onto d153a8096d...

Updating d153a80..a89c5f7
Fast-forward
 swh/storage/cassandra/converters.py |  29 ++++++--
 swh/storage/cassandra/cql.py        |  39 ++++-------
 swh/storage/cassandra/storage.py    |  20 ++++++
 swh/storage/db.py                   |  51 +++++++++++---
 swh/storage/in_memory.py            |  24 +++++++
 swh/storage/interface.py            |  30 +++++++-
 swh/storage/storage.py              |  18 +++++
 swh/storage/tests/test_storage.py   | 132 ++++++++++++++++++++++++++++++++++--
 swh/storage/validate.py             |  16 +++--
 9 files changed, 306 insertions(+), 53 deletions(-)
Changes applied before test
commit a89c5f7cfa3bcd055c16361e6ce8f5edf3c213f3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 16 18:03:23 2020 +0200

    storage*: Add origin-visit-status-get-latest endpoint
    
    So we can read the latest origin-visit-status out of a storage
    
    Related to T2310

commit 1c615f7d31ffeebb07a05ff78116950fb51ebaa5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 09:33:21 2020 +0200

    validate.snapshot_add: Make it able to deal directly with Snapshot
    
    Related to D3277

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/290/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/290/console

  • Re-add the origin-visit-status-get-latest endpoint, which was removed by mistake (this fixes the build failure)
  • Move the origin-visit-status conversion from the cql module to the converters module (to keep cql a thin wrapper around Cassandra queries, as initially intended)
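With the conversion in the converters module, cql returns raw rows and a separate function turns each row into a model object. A minimal sketch of such a converter, assuming rows behave like dicts, that timestamps come back as naive datetimes meant as UTC, and that metadata is stored as JSON text; these storage details, the `row_to_visit_status` name and the simplified `OriginVisitStatus` dataclass are all assumptions for illustration, not the actual swh code.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OriginVisitStatus:
    """Hypothetical, simplified stand-in for the swh.model object."""

    origin: str
    visit: int
    date: datetime
    status: str
    snapshot: Optional[bytes]
    metadata: Optional[Dict[str, Any]]


def row_to_visit_status(row: Dict[str, Any]) -> OriginVisitStatus:
    """Build a model object from one raw row (converter, not query code)."""
    date = row["date"]
    if date.tzinfo is None:
        # Assumption: naive timestamps coming back from the database are UTC.
        date = date.replace(tzinfo=timezone.utc)
    raw_metadata = row.get("metadata")
    return OriginVisitStatus(
        origin=row["origin"],
        visit=row["visit"],
        date=date,
        status=row["status"],
        snapshot=row.get("snapshot"),
        # Assumption: metadata is stored as a JSON-encoded string.
        metadata=json.loads(raw_metadata) if raw_metadata is not None else None,
    )
```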

Build is green

Patch application report for D3296 (id=11691)

Could not rebase; Attempt merge onto d153a8096d...

Updating d153a80..880cbfe
Fast-forward
 swh/storage/cassandra/converters.py |  25 ++++++-
 swh/storage/cassandra/cql.py        |  42 +++++-------
 swh/storage/cassandra/storage.py    |  30 ++++++--
 swh/storage/db.py                   |  51 +++++++++++---
 swh/storage/in_memory.py            |  24 +++++++
 swh/storage/interface.py            |  30 +++++++-
 swh/storage/storage.py              |  18 +++++
 swh/storage/tests/test_storage.py   | 132 ++++++++++++++++++++++++++++++++++--
 swh/storage/validate.py             |  16 +++--
 9 files changed, 313 insertions(+), 55 deletions(-)
Changes applied before test
commit 880cbfe6adb062a7347dc99d96651d3374093941
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 16 18:03:23 2020 +0200

    storage*: Add origin-visit-status-get-latest endpoint
    
    So we can read the latest origin-visit-status out of a storage
    
    Related to T2310

commit 1c615f7d31ffeebb07a05ff78116950fb51ebaa5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 09:33:21 2020 +0200

    validate.snapshot_add: Make it able to deal directly with Snapshot
    
    Related to D3277

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/292/ for more details.

Build is green

Patch application report for D3296 (id=11696)

Rebasing onto 057c6fd5df...

Current branch diff-target is up to date.
Changes applied before test
commit d9425a6ef33a99db06c0205dd83ad89374939e29
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 16 18:03:23 2020 +0200

    storage*: Add origin-visit-status-get-latest endpoint
    
    So we can read the latest origin-visit-status out of a storage
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/296/ for more details.

vlorentz added inline comments.
swh/storage/cassandra/cql.py
726

statuses

This revision is now accepted and ready to land. Jun 17 2020, 12:05 PM
  • Rebase on latest master
  • Fix a docstring typo

Build is green

Patch application report for D3296 (id=11700)

Rebasing onto 692bfa3944...

Current branch diff-target is up to date.
Changes applied before test
commit 731949586ca186b52d046c02ff4bb87b48b9adb3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 16 18:03:23 2020 +0200

    storage*: Add origin-visit-status-get-latest endpoint
    
    So we can read the latest origin-visit-status out of a storage
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/298/ for more details.