Page MenuHomeSoftware Heritage

Make snapshot_get_branches return a TypedDict containing SnapshotBranch objects.
ClosedPublic

Authored by vlorentz on Aug 7 2020, 2:04 PM.

Details

Summary

Instead of untyped dictionaries.

This makes snapshot_get longer and duplicates its code across backends;
but snapshot_get should be removed soon.

Related: T645

Diff Detail

Repository
rDSTO Storage manager
Branch
snapshot-get-branches
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 14445
Build 22212: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 22211: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D3740 (id=13167)

Rebasing onto d9ff3912d5...

Current branch diff-target is up to date.
Changes applied before test
commit 9235cb4e7ba2ffe6d63c0d9278807c9f640d24ee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 7 14:04:37 2020 +0200

    Make snapshot_get_branches return a TypedDict containing SnapshotBranch objects.
    
    Instead of untyped dictionaries.
    
    This makes snapshot_get longer and duplicates its code across backends;
    but snapshot_get should be removed soon.
    
    Related: T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/716/ for more details.

anlambert added inline comments.
swh/storage/interface.py
43

I am not a big fan of that naming. For small snapshots, the returned branches set will not be partial for instance. How about BranchesResultSet instead ?

swh/storage/interface.py
43

or BranchesPagedResult ?

swh/storage/interface.py
43

I couldn't find a better name. ResultSet name-clashes with cassandra (plus, it doesn't make it obvious it's only a partial view), and PagedResult with the class with that name.

swh/storage/interface.py
43

How about simply Branches then ?

def snapshot_get_branches(..) -> Optional[Branches]

Not a shocker to me.

swh/storage/interface.py
43

it still doesn't page it clear it's a partial view.

swh/storage/interface.py
43

I proposed in T645 that the method be typed [1]

def snapshot_get_branches(..) -> Optional[PagedResult[SnapshotBranch]]

Optional in case the snapshot is (Sha1Git) does not exist... (or raise to avoid the specific type...)
Paged.

The PagedResult implies the result can be partial...
To have a full snapshot, the client needs to call the algo.snapshot_get_full endpoint (<- use the real name which escapes me right now me ;)

wdyt?

[1] T645#47156

swh/storage/interface.py
43

PagedResult isn't great, because snapshot_get_branches doesn't use an opaque token for pagination.

swh/storage/interface.py
43

Optional in case the snapshot is (Sha1Git) does not exist... (or raise to avoid the specific type...)
Paged.

sorry for the typos, one must read...

Optional in case the snapshot id (Sha1Git) does not exist... (or raise to avoid the specific case of the inexisting snapshot)
43

PagedResult isn't great, because snapshot_get_branches doesn't use an opaque token for pagination.

What's preventing us from doing it that?
The snapshot branches order is not guaranteed, isn't it?

swh/storage/interface.py
43

doing that way*... (need some sleep, sorry again...)

swh/storage/interface.py
43

What's preventing us from doing it that?

Nothing, but there is no reason to switch to an opaque token.

The snapshot branches order is not guaranteed, isn't it?

It is; snapshot branches are sorted by name.

swh/storage/interface.py
43

The fact that the returned branches set might be partial is clearly indicated in the docstring.

If you want to keep the Partial prefix, at least remove the Dict suffix as it is not really needed.

This revision is now accepted and ready to land.Aug 7 2020, 3:54 PM

Build is green

Patch application report for D3740 (id=13179)

Rebasing onto d9ff3912d5...

Current branch diff-target is up to date.
Changes applied before test
commit 4918759fc84376b16ebf82c60da9d2c17d265821
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 7 14:04:37 2020 +0200

    Make snapshot_get_branches return a TypedDict containing SnapshotBranch objects.
    
    Instead of untyped dictionaries.
    
    This makes snapshot_get longer and duplicates its code across backends;
    but snapshot_get should be removed soon.
    
    Related: T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/717/ for more details.