algos.snapshot: Open snapshot_get_from_revision algorithm
Closed, Public

Authored by ardumont on Thu, Jul 30, 10:02 AM.

Details

Summary

This leverages the latest change in origin_visit_get and
origin_visit_status_get (both are now paginated endpoints) to iterate over
visits and visit statuses.
Iteration goes from the most recent to the oldest, to resolve a snapshot
targeting a specific revision.
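
As a minimal sketch of what iterating over a paginated endpoint from most recent to oldest can look like (the `Page` class, `iter_paged` and `fake_visit_endpoint` are hypothetical stand-ins written for this illustration, not the swh.storage API):

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical paged result: one page of items plus a continuation token."""

    results: List[T]
    next_page_token: Optional[str] = None


def iter_paged(fetch: Callable[[Optional[str]], Page[T]]) -> Iterator[T]:
    """Yield every item of a paginated endpoint, requesting pages lazily."""
    token: Optional[str] = None
    while True:
        page = fetch(token)
        yield from page.results
        if page.next_page_token is None:
            return
        token = page.next_page_token


def fake_visit_endpoint(
    visits: List[int], limit: int
) -> Callable[[Optional[str]], Page[int]]:
    """Toy endpoint serving visit ids from most recent to oldest, `limit` per page."""
    ordered = sorted(visits, reverse=True)

    def fetch(token: Optional[str]) -> Page[int]:
        start = int(token) if token is not None else 0
        end = start + limit
        next_token = str(end) if end < len(ordered) else None
        return Page(results=ordered[start:end], next_page_token=next_token)

    return fetch


# Five visits served two per page, drained in a single loop.
newest_first = list(iter_paged(fake_visit_endpoint([1, 2, 3, 4, 5], limit=2)))
```

Because the generator only requests the next page when the caller asks for more items, a lookup that finds its snapshot early never fetches the older pages.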

Note:
This algorithm was used recently in the deposit. It may serve again.

Depends on D3641
Related to D3645

Related to T645

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision. Thu, Jul 30, 10:02 AM

Build has FAILED

Patch application report for D3648 (id=12839)

Could not rebase; Attempt merge onto 7667f7eef9...

Updating 7667f7ee..2b6635d6
Fast-forward
 swh/storage/algos/snapshot.py            |  55 ++++++++-
 swh/storage/cassandra/cql.py             |  74 +++++++++++
 swh/storage/cassandra/storage.py         |  34 +++++
 swh/storage/db.py                        |  53 ++++----
 swh/storage/in_memory.py                 |  40 ++++++
 swh/storage/interface.py                 |  28 +++++
 swh/storage/storage.py                   |  47 +++++++
 swh/storage/tests/algos/test_snapshot.py |  64 +++++++++-
 swh/storage/tests/test_storage.py        | 205 +++++++++++++++++++++++++++++++
 9 files changed, 569 insertions(+), 31 deletions(-)
Changes applied before test
commit 2b6635d6304c26bcb5f2a7f494caa443399d92f6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit de11ef60ea74474c16df45149a97e9cfd8244ff0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/613/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/613/console

vlorentz requested changes to this revision. Thu, Jul 30, 10:14 AM
vlorentz added a subscriber: vlorentz.

That's a big function that paginates through two types of objects and iterates through a third type. I think we should break it up; see below.

swh/storage/algos/snapshot.py
116–122

move this to a new iter_origin_visit generator

123–131

move this to a new iter_origin_visit_statuses generator

137

use snapshot_get_all_branches
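
A rough sketch of the split suggested here, with one generator per object type and a small function composing them. All names and the toy data below are hypothetical and only illustrate the shape of the decomposition, not the code under review:

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical toy data, not the swh.storage model.
# visit id -> snapshot id of each of its statuses, newest status first
# (None means the status has no snapshot yet).
STATUSES: Dict[int, List[Optional[str]]] = {
    1: ["snap-a"],
    2: [None, "snap-b"],
    3: [None],
}
# snapshot id -> revisions targeted by its branches
BRANCH_TARGETS: Dict[str, List[str]] = {
    "snap-a": ["rev-1"],
    "snap-b": ["rev-1", "rev-2"],
}


def iter_visits() -> Iterator[int]:
    """Yield visit ids from most recent to oldest."""
    yield from sorted(STATUSES, reverse=True)


def iter_visit_statuses(visit: int) -> Iterator[Optional[str]]:
    """Yield the snapshot id of each status of one visit, newest first."""
    yield from STATUSES[visit]


def snapshot_id_for_revision(revision: str) -> Optional[str]:
    """Return the most recent snapshot with a branch targeting `revision`, if any."""
    for visit in iter_visits():
        for snapshot_id in iter_visit_statuses(visit):
            if snapshot_id is None:
                continue  # status without a snapshot, nothing to inspect
            if revision in BRANCH_TARGETS.get(snapshot_id, []):
                return snapshot_id
    return None
```

With the iteration pulled out into generators, the main function reduces to two nested loops and an early return, and each generator can be tested on its own.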

This revision now requires changes to proceed. Thu, Jul 30, 10:14 AM
ardumont updated this revision to Diff 12840. Thu, Jul 30, 10:21 AM

Update missing branch None case check

and please mention in the docstring it can be a very costly operation

Build is green

Patch application report for D3648 (id=12840)

Could not rebase; Attempt merge onto 7667f7eef9...

Updating 7667f7ee..e026d33d
Fast-forward
 swh/storage/algos/snapshot.py            |  56 ++++++++-
 swh/storage/cassandra/cql.py             |  74 +++++++++++
 swh/storage/cassandra/storage.py         |  34 +++++
 swh/storage/db.py                        |  53 ++++----
 swh/storage/in_memory.py                 |  40 ++++++
 swh/storage/interface.py                 |  28 +++++
 swh/storage/storage.py                   |  47 +++++++
 swh/storage/tests/algos/test_snapshot.py |  64 +++++++++-
 swh/storage/tests/test_storage.py        | 205 +++++++++++++++++++++++++++++++
 9 files changed, 570 insertions(+), 31 deletions(-)
Changes applied before test
commit e026d33d08a488bc73f11f67ac1b8343cec5cc86
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit de11ef60ea74474c16df45149a97e9cfd8244ff0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/614/ for more details.

hmm, I think snapshot_id_get_from_revision would be a better name for this function, as it doesn't return the whole snapshot

ardumont updated this revision to Diff 12843. (Edited) Thu, Jul 30, 11:17 AM

Adapt according to review:

  • Rename to snapshot_get_id...
  • Add docstring warning
  • Split the main algo using 2 new functions algos.origin.iter_origin_visit(_status)
  • Add tests for those new functions
  • Use snapshot_get_all_branches

Build is green

Patch application report for D3648 (id=12843)

Could not rebase; Attempt merge onto 7667f7eef9...

Updating 7667f7ee..eb15e9a2
Fast-forward
 swh/storage/algos/origin.py              |  34 ++++-
 swh/storage/algos/snapshot.py            |  47 ++++++-
 swh/storage/cassandra/cql.py             |  74 +++++++++++
 swh/storage/cassandra/storage.py         |  34 +++++
 swh/storage/db.py                        |  53 ++++----
 swh/storage/in_memory.py                 |  40 ++++++
 swh/storage/interface.py                 |  28 +++++
 swh/storage/storage.py                   |  47 +++++++
 swh/storage/tests/algos/test_origin.py   |  88 ++++++++++++-
 swh/storage/tests/algos/test_snapshot.py |  64 +++++++++-
 swh/storage/tests/test_storage.py        | 205 +++++++++++++++++++++++++++++++
 11 files changed, 680 insertions(+), 34 deletions(-)
Changes applied before test
commit eb15e9a29eca90c393a8631562b82e34ba3fe49d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_id_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit de11ef60ea74474c16df45149a97e9cfd8244ff0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/616/ for more details.

ardumont edited the summary of this revision. Thu, Jul 30, 12:53 PM

test_origin.py is missing cases where the functions have to paginate, and also cases where there are no results at all.

correct... I got side-tracked and forgot about those.
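
For illustration only, the two missing kinds of cases could look roughly like this. The `iter_all` and `make_fetch` helpers are hypothetical stand-ins for a paginating generator and its backend, not the functions in test_origin.py:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# (items on this page, token of the next page or None when exhausted)
Page = Tuple[List[str], Optional[int]]


def iter_all(fetch: Callable[[Optional[int]], Page]) -> Iterator[str]:
    """Drain a paginated endpoint, one request per page."""
    token: Optional[int] = None
    while True:
        items, token = fetch(token)
        yield from items
        if token is None:
            return


def make_fetch(
    items: List[str], limit: int, calls: List[Optional[int]]
) -> Callable[[Optional[int]], Page]:
    """Toy backend that records every page token it is asked for."""

    def fetch(token: Optional[int]) -> Page:
        calls.append(token)
        start = token or 0
        end = start + limit
        return items[start:end], (end if end < len(items) else None)

    return fetch


def test_no_results() -> None:
    calls: List[Optional[int]] = []
    assert list(iter_all(make_fetch([], 2, calls))) == []
    assert calls == [None]  # a single request, then stop


def test_several_pages() -> None:
    calls: List[Optional[int]] = []
    items = ["v5", "v4", "v3", "v2", "v1"]
    assert list(iter_all(make_fetch(items, 2, calls))) == items
    assert calls == [None, 2, 4]  # three requests were needed
```

Recording the requested tokens makes the test fail if the generator stops after the first page or never follows the continuation token.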

Build is green

Patch application report for D3648 (id=12847)

Could not rebase; Attempt merge onto 7667f7eef9...

Updating 7667f7ee..1a4fa68c
Fast-forward
 swh/storage/algos/origin.py              |  34 +++++-
 swh/storage/algos/snapshot.py            |  47 +++++++-
 swh/storage/cassandra/cql.py             |  76 +++++++++++++
 swh/storage/cassandra/storage.py         |  26 +++++
 swh/storage/db.py                        |  53 +++++----
 swh/storage/in_memory.py                 |  36 +++++++
 swh/storage/interface.py                 |  24 +++++
 swh/storage/storage.py                   |  42 ++++++++
 swh/storage/tests/algos/test_origin.py   |  88 ++++++++++++++-
 swh/storage/tests/algos/test_snapshot.py |  64 ++++++++++-
 swh/storage/tests/test_storage.py        | 177 +++++++++++++++++++++++++++++++
 11 files changed, 633 insertions(+), 34 deletions(-)
Changes applied before test
commit 1a4fa68c77bcc49d189d7739a576ee8c2f9f8966
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_id_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit 8de3a714051e326a01aa122e6fb09bf081a316cf
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/619/ for more details.

ardumont updated this revision to Diff 12849. Thu, Jul 30, 1:36 PM
  • Rebase on latest master
  • Add corner cases around test_iter_origin_visit*

Build is green

Patch application report for D3648 (id=12849)

Could not rebase; Attempt merge onto e63b78c6f9...

Updating e63b78c6..9e48e563
Fast-forward
 swh/storage/algos/origin.py              |  34 +++++-
 swh/storage/algos/snapshot.py            |  47 +++++++-
 swh/storage/cassandra/cql.py             |  76 +++++++++++++
 swh/storage/cassandra/storage.py         |  26 +++++
 swh/storage/db.py                        |  53 +++++----
 swh/storage/in_memory.py                 |  36 +++++++
 swh/storage/interface.py                 |  24 +++++
 swh/storage/storage.py                   |  42 ++++++++
 swh/storage/tests/algos/test_origin.py   |  97 ++++++++++++++++-
 swh/storage/tests/algos/test_snapshot.py |  64 ++++++++++-
 swh/storage/tests/test_storage.py        | 177 +++++++++++++++++++++++++++++++
 11 files changed, 642 insertions(+), 34 deletions(-)
Changes applied before test
commit 9e48e5630a7f44061e5be2babd3e35d68fa611ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_id_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit a40a982cf61e2f430cd33456db0fccd522a5d14f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/620/ for more details.

vlorentz added inline comments. Thu, Jul 30, 1:59 PM
swh/storage/tests/algos/test_origin.py
339–348

The order is confusing, because "reverse_visits" is the list in chronological order. Do this:

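# Build the visits in chronological order (oldest first)
new_visits = []
for visit_id in range(20):
    new_visits.append(
        OriginVisit(
            ...
            date=date_past + datetime.timedelta(days=visit_id),
        )
    )
# Add them in one call, outside the loop, then derive the newest-first list
visits = swh_storage.origin_visit_add(new_visits)
reversed_visits = list(reversed(visits))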

And swap visits and reversed_visits in the following lines

ardumont updated this revision to Diff 12854. Thu, Jul 30, 2:10 PM

Rebase and adapt according to clearer suggestion
@vlorentz quite better thx ^

Build is green

Patch application report for D3648 (id=12854)

Could not rebase; Attempt merge onto 8cf6efa2e6...

Updating 8cf6efa2..7beba93a
Fast-forward
 swh/storage/algos/origin.py              |  34 +++++-
 swh/storage/algos/snapshot.py            |  47 +++++++-
 swh/storage/cassandra/cql.py             |  76 +++++++++++++
 swh/storage/cassandra/storage.py         |  26 +++++
 swh/storage/db.py                        |  53 +++++----
 swh/storage/in_memory.py                 |  36 +++++++
 swh/storage/interface.py                 |  24 +++++
 swh/storage/storage.py                   |  42 ++++++++
 swh/storage/tests/algos/test_origin.py   |  98 ++++++++++++++++-
 swh/storage/tests/algos/test_snapshot.py |  64 ++++++++++-
 swh/storage/tests/test_storage.py        | 177 +++++++++++++++++++++++++++++++
 11 files changed, 643 insertions(+), 34 deletions(-)
Changes applied before test
commit 7beba93a970231e044d48b21491d7c0dbbd43116
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 30 09:58:40 2020 +0200

    algos.snapshot: Open snapshot_id_get_from_revision
    
    This leverages the latest change in origin_visit_get and
    origin_visit_status_get to iterate over visit and visit status to detect a
    snapshot targetting a revision.
    
    This algo got used recently in the deposit. It may serve again.
    
    Related to T645

commit b81f928fa7b941d569bd5d2c9ae8ac7208dbb2c3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 16:33:31 2020 +0200

    storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/624/ for more details.

vlorentz accepted this revision. Thu, Jul 30, 3:09 PM
This revision is now accepted and ready to land. Thu, Jul 30, 3:09 PM
This revision was automatically updated to reflect the committed changes.