
interface: Add origin_snapshot_get_all method
Closed, Public

Authored by anlambert on Oct 27 2021, 6:45 PM.

Details

Summary

This method efficiently returns the list of unique snapshot
identifiers resulting from the visits of an origin.

Previously, it was required to query all visits of an origin, then query
all visit statuses for each visit, to extract such information.

The new method extracts an origin's snapshot information in
a single database query.
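
To illustrate the difference described above, here is a minimal sketch using a toy in-memory store. It is not the swh.storage implementation: `ToyStorage`, `VisitStatus`, `visits_of`, `statuses_of` and `snapshots_the_old_way` are hypothetical names, the `queries` counter only simulates database round trips, and the sorted-list return type is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class VisitStatus:
    """Hypothetical, simplified stand-in for a visit status record."""

    origin: str
    visit: int
    snapshot: Optional[bytes]  # None while a visit has produced no snapshot


class ToyStorage:
    """Hypothetical in-memory store; not the real storage backend."""

    def __init__(self, statuses: List[VisitStatus]) -> None:
        self._statuses = statuses
        self.queries = 0  # counts simulated round trips

    def visits_of(self, origin: str) -> List[int]:
        self.queries += 1
        return sorted({s.visit for s in self._statuses if s.origin == origin})

    def statuses_of(self, origin: str, visit: int) -> List[VisitStatus]:
        self.queries += 1
        return [
            s for s in self._statuses if s.origin == origin and s.visit == visit
        ]

    def origin_snapshot_get_all(self, origin: str) -> List[bytes]:
        # One pass over the statuses, standing in for a single query.
        self.queries += 1
        return sorted(
            {
                s.snapshot
                for s in self._statuses
                if s.origin == origin and s.snapshot is not None
            }
        )


def snapshots_the_old_way(storage: ToyStorage, origin: str) -> List[bytes]:
    """One query for the visits, then one more query per visit."""
    found: Set[bytes] = set()
    for visit in storage.visits_of(origin):
        for status in storage.statuses_of(origin, visit):
            if status.snapshot is not None:
                found.add(status.snapshot)
    return sorted(found)
```

With N visits, the old approach in this sketch issues 1 + N simulated queries, while `origin_snapshot_get_all` issues one and returns the same identifiers.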

Related to T3631

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6572 (id=23875)

Rebasing onto 49a932c989...

Current branch diff-target is up to date.
Changes applied before test
commit 384e2e0cb25c4aebdeff2e077dce96aea9fcac6a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 27 18:26:20 2021 +0200

    interface: Add origin_snapshot_get method
    
    It enables to return in an efficient way the list of unique snapshot
    identifiers resulting from the visits of an origin.
    
    Previously it was required to query all visits of an origin then query
    all visit statuses for each visit to extract such information.
    
    Introduced method enables to extract origin snaphots information in
    a single datase query.
    
    Related to T3631

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1466/ for more details.

vlorentz added a subscriber: vlorentz.

I'd rather name it origin_snapshot_get_all, what do you think?

Small nitpick: you could use set comprehensions ({s.snapshot for s in ...}) instead of a set constructor + generator.
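
As an illustration of the nitpick above, the two spellings build the same set. This is a sketch only: `Status` and the sample values are hypothetical stand-ins, not the objects used in the diff.

```python
from collections import namedtuple

# Hypothetical stand-in for a visit status; only `snapshot` matters here.
Status = namedtuple("Status", ["visit", "snapshot"])

statuses = [Status(1, b"snp-a"), Status(2, b"snp-a"), Status(3, b"snp-b")]

# Set constructor fed by a generator expression.
with_constructor = set(s.snapshot for s in statuses)

# Equivalent set comprehension, as suggested in the review.
with_comprehension = {s.snapshot for s in statuses}

assert with_constructor == with_comprehension
```

Both forms deduplicate the snapshot identifiers; the comprehension is simply the more direct syntax.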

swh/storage/in_memory.py
594

to be consistent with the other ones

This revision is now accepted and ready to land. Oct 28 2021, 12:12 PM

I'd rather name it origin_snapshot_get_all, what do you think?

I also hesitated to use that name. I used origin_snapshot_get to match the naming of other methods in the storage interface, but I agree origin_snapshot_get_all is more explicit about what the method does, so I will rename it.

Small nitpick: you could use set comprehensions ({s.snapshot for s in ...}) instead of a set constructor + generator.

Ack, will update.

anlambert retitled this revision from interface: Add origin_snapshot_get method to interface: Add origin_snapshot_get_all method. Oct 28 2021, 2:29 PM
anlambert added inline comments.
swh/storage/in_memory.py
594

I kept it that way, as this is the CQL runner implementation.

Build is green

Patch application report for D6572 (id=23905)

Rebasing onto 49a932c989...

Current branch diff-target is up to date.
Changes applied before test
commit a5bfe5b5145b2e1da95deb6a975bb9e48ed6adec
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 27 18:26:20 2021 +0200

    interface: Add origin_snapshot_get_all method
    
    It enables to return in an efficient way the list of unique snapshot
    identifiers resulting from the visits of an origin.
    
    Previously it was required to query all visits of an origin then query
    all visit statuses for each visit to extract such information.
    
    Introduced method enables to extract origin snaphots information in
    a single datase query.
    
    Related to T3631

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1470/ for more details.