Page MenuHomeSoftware Heritage

interface: Add new method origin_visit_get_with_statuses
ClosedPublic

Authored by anlambert on Mar 28 2022, 2:37 PM.

Details

Summary

It enables to retrieve in an efficient and paginated way the list
of visits and all their statuses for a given origin.

Previously, it was required to call origin_visit_status_get
on each visit of the origin to get such list.

Related to T4090

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D7442 (id=26946)

Rebasing onto 835feb6842...

Current branch diff-target is up to date.
Changes applied before test
commit 5d702270471617a491e67b66e7779aa6dfca96d3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Mar 28 14:33:53 2022 +0200

    interface: Add new method origin_visit_status_latest_get_all
    
    It enables to retrieve in an efficient and paginated way the list
    of latest visit statuses for a given origin.
    
    Previously, it was required to call origin_visit_status_get_latest
    on each visit of the origin to get such list.
    
    Related to T4090

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1582/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1582/console

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 28 2022, 2:47 PM
Harbormaster failed remote builds in B27863: Diff 26946!

Build is green

Patch application report for D7442 (id=26947)

Rebasing onto 835feb6842...

Current branch diff-target is up to date.
Changes applied before test
commit 3c992d250ee47e484fee6e47a6d677b467ba88c5
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Mar 28 14:33:53 2022 +0200

    interface: Add new method origin_visit_status_latest_get_all
    
    It enables to retrieve in an efficient and paginated way the list
    of latest visit statuses for a given origin.
    
    Previously, it was required to call origin_visit_status_get_latest
    on each visit of the origin to get such list.
    
    Related to T4090

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1583/ for more details.

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/storage/cassandra/cql.py
1118–1120
swh/storage/cassandra/storage.py
1252–1294
swh/storage/interface.py
1070

incorrect, the limit applies to the number of visit objects, not statuses

This revision now requires changes to proceed.Mar 28 2022, 3:52 PM

Update:

anlambert retitled this revision from interface: Add new method origin_visit_status_latest_get_all to interface: Add new method origin_visit_get_with_statuses.Mar 29 2022, 5:35 PM
anlambert edited the summary of this revision. (Show Details)

Build is green

Patch application report for D7442 (id=27039)

Rebasing onto 835feb6842...

Current branch diff-target is up to date.
Changes applied before test
commit 336c36d0af4a1feeebcbb148e491a0d1cd2a8f04
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Mar 28 14:33:53 2022 +0200

    interface: Add new method origin_visit_get_with_statuses
    
    It enables to retrieve in an efficient and paginated way the list
    of visits and all their statuses for a given origin.
    
    Previously, it was required to call origin_visit_status_get on each
    visit of the origin to get such list.
    
    Related to T4090

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1584/ for more details.

ardumont added inline comments.
swh/storage/api/serializers.py
41

(i'm ok with the general changes but out of curiosity...)

Since you open those serializer for the composite type, what could have prevented to do that same thing with the Tuple[OriginVisit, OriginVisitStatus] (for your first implementation) [1]

[1] in the context of msgpack does not like serializing tuple (or something).

olasd added inline comments.
swh/storage/api/serializers.py
41

msgpack transparently serializes python tuples as msgpack lists. There's no simple way to know if an inbound msgpack list was initially a python tuple or not.

So, one would have to override the RPC client function for origin_visit_get_with_statuses to always convert the returned data, within the result composite object, to a tuple; which is doable, but a bit messy.

See the tuple conversion functions in swh.model to see what the mess can look like.

(generally, explicit types are much better than tuples anyway in terms of expressivity)

swh/storage/api/serializers.py
41

Oh right, i had forgotten that it's not that it does not like tuple, it's that that becomes a list after the serialization...

I do prefer the new version.

Build is green

Patch application report for D7442 (id=27277)

Rebasing onto 812590c25f...

Current branch diff-target is up to date.
Changes applied before test
commit de21e2f59a3c4791d202b8dc97f5eb4f56541ec3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Mar 28 14:33:53 2022 +0200

    interface: Add new method origin_visit_get_with_statuses
    
    It enables to retrieve in an efficient and paginated way the list
    of visits and all their statuses for a given origin.
    
    Previously, it was required to call origin_visit_status_get on each
    visit of the origin to get such list.
    
    Related to T4090

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1588/ for more details.

vlorentz added inline comments.
swh/storage/postgresql/storage.py
1192–1198
This revision is now accepted and ready to land.Apr 7 2022, 4:05 PM
swh/storage/postgresql/storage.py
1192–1198

Ah right, my first implementation was returning a dict merging visit fields and visit status ones (renaming the date field to status_date) so I could not use that syntax, thanks for spotting !

Build is green

Patch application report for D7442 (id=27301)

Rebasing onto 812590c25f...

Current branch diff-target is up to date.
Changes applied before test
commit 567c8e47ae56737301f4a42dccd2087b41d16d47
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Mar 28 14:33:53 2022 +0200

    interface: Add new method origin_visit_get_with_statuses
    
    It enables to retrieve in an efficient and paginated way the list
    of visits and all their statuses for a given origin.
    
    Previously, it was required to call origin_visit_status_get on each
    visit of the origin to get such list.
    
    Related to T4090

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1589/ for more details.