Software Heritage

Use origin URLs for skipped_content['origin'] instead of origin ids.
Closed, Public

Authored by vlorentz on Mon, Sep 30, 11:17 AM.

Details

Summary

This commit uses origin URLs *instead of* origin IDs, not in addition to them.
Supporting IDs should no longer be needed.
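To make the idea concrete, here is a minimal sketch of how a storage backend could accept an origin URL in skipped_content['origin'] and resolve it to an internal row id at insert time. It is not the swh-storage implementation. It uses the standard-library sqlite3 module so that it runs anywhere, and the `origin` and `skipped_content` table layouts and the `skipped_content_add` helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema, for illustration only: the real swh-storage
# tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE origin (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
    CREATE TABLE skipped_content (
        sha1 BLOB,
        reason TEXT,
        origin INTEGER REFERENCES origin(id)
    );
""")
conn.execute("INSERT INTO origin (url) VALUES (?)", ("https://example.org/repo.git",))


def skipped_content_add(conn, contents):
    """Insert skipped contents whose 'origin' key holds an origin URL (or None)."""
    for c in contents:
        # Resolve the URL to the internal id inside the INSERT itself;
        # an unknown URL (or None) makes the subquery yield NULL.
        conn.execute(
            "INSERT INTO skipped_content (sha1, reason, origin) "
            "VALUES (?, ?, (SELECT id FROM origin WHERE url = ?))",
            (c["sha1"], c["reason"], c.get("origin")),
        )


skipped_content_add(conn, [
    {"sha1": b"\x00" * 20, "reason": "Content too large",
     "origin": "https://example.org/repo.git"},
])
print(conn.execute("SELECT origin FROM skipped_content").fetchall())  # [(1,)]
```

In this sketch a URL that matches no origin simply stores NULL; whether that should be an error instead is a design decision the real backend may handle differently.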

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision. Mon, Sep 30, 11:17 AM
olasd requested changes to this revision. Mon, Sep 30, 11:38 AM
olasd added a subscriber: olasd.

I guess you only really need the hunk in the postgres storage? What is the in-memory storage change trying to achieve?

swh/storage/in_memory.py
Line 76

the argument should probably be renamed content_and_origins

Line 140

contents_and_origins :P

Line 143

Surely that only works because we only add contents from a single origin at a time: after filtering, the result of skipped_content_missing and origins no longer have the same length. You really need to pass the full content to skipped_content_missing, and only then split contents from origins.

That, on top of the double zipping, makes me wonder whether this is really the right way to go at all.
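The alignment problem described above can be reproduced in a few lines. In this sketch, `skipped_content_missing`, `add_buggy` and `add_fixed` are hypothetical stand-ins, because the actual in-memory storage code is not shown in this review. The fix follows the reviewer's suggestion: filter the full contents first, and only then separate contents from origins, using a plain for loop instead of a double zip.

```python
# Hypothetical stand-ins: the real storage API is not shown in this review.
STORED_SHA1S = {b"known"}


def skipped_content_missing(contents):
    """Return only the contents whose sha1 is not already stored."""
    return [c for c in contents if c["sha1"] not in STORED_SHA1S]


def add_buggy(contents, origins):
    # Filtering shortens the content list but not `origins`, so zip() pairs
    # each remaining content with the origin of a different input item.
    missing = skipped_content_missing(contents)
    return [(c["sha1"], o) for c, o in zip(missing, origins)]


def add_fixed(contents_and_origins):
    # Keep each origin attached to its content, filter on the full content,
    # then split the pairs back apart in a plain for loop.
    contents = [dict(c, origin=o) for c, o in contents_and_origins]
    result = []
    for c in skipped_content_missing(contents):
        result.append((c["sha1"], c["origin"]))
    return result


pairs = [({"sha1": b"known"}, "https://a.example"),
         ({"sha1": b"new"}, "https://b.example")]
print(add_buggy([c for c, _ in pairs], [o for _, o in pairs]))
# [(b'new', 'https://a.example')], paired with the wrong origin
print(add_fixed(pairs))
# [(b'new', 'https://b.example')]
```

Because `add_buggy` filters one list and not the other, it only returns correct pairs when every item shares the same origin, which is the single-origin case the comment points out.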

Lines 156–159

Could you turn this into a for loop? This isn't very readable.

Line 190

content_and_origins?

This revision now requires changes to proceed. Mon, Sep 30, 11:38 AM
In D2040#47240, @olasd wrote:

I guess you only really need the hunk in the postgres storage? What is the in-memory storage change trying to achieve?

You're right. We don't need to store it in the in-mem storage for now.

vlorentz updated this revision to Diff 6857. Mon, Sep 30, 11:44 AM

remove most of the changes from the in-mem storage; we don't need to store those.

olasd accepted this revision. Mon, Sep 30, 12:01 PM
This revision is now accepted and ready to land. Mon, Sep 30, 12:01 PM