Use origin URLs for skipped_content['origin'] instead of origin ids.
ClosedPublic

Authored by vlorentz on Sep 30 2019, 11:17 AM.

Details

Summary

This commit uses URLs *instead of* IDs, not in addition to them.
Supporting IDs should no longer be needed.

Diff Detail

Repository
rDSTO Storage manager
Branch
skipped-content-origin-url
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7998
Build 11528: tox-on-jenkins (Jenkins)
Build 11527: arc lint + arc unit

Event Timeline

olasd requested changes to this revision. Sep 30 2019, 11:38 AM
olasd added a subscriber: olasd.

I guess you only really need the hunk in the postgres storage? What is the in-memory storage change trying to achieve?

swh/storage/in_memory.py
76

the argument should probably be renamed content_and_origins

140

contents_and_origins :P

143

Surely that only works because we only add contents from a single origin at a time; after the filtering, skipped_content_missing and origins aren't the same length any more. You really need to pass the full content to skipped_content_missing, then do the content/origin splitting.

Which, in addition to the double-zipping, makes me wonder if that's really the right way to go at all.
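To illustrate the concern above, here is a minimal sketch of how zipping after filtering can pair a content with the wrong origin, and how filtering the full content first avoids it. All names below (the stand-in `skipped_content_missing`, the `sha1` key, the example URLs) are assumptions made for illustration and are not taken from the actual diff.

```python
# Hypothetical stand-in for the storage method; the real one is not shown here.
def skipped_content_missing(contents, already_stored):
    """Return the contents whose sha1 is not already stored."""
    return [c for c in contents if c["sha1"] not in already_stored]


# Illustrative data: three contents, each coming from a different origin.
contents_and_origins = [
    ({"sha1": "aaa"}, "https://example.org/repo1"),
    ({"sha1": "bbb"}, "https://example.org/repo2"),
    ({"sha1": "ccc"}, "https://example.org/repo3"),
]
already_stored = {"aaa"}

# Fragile: split first, filter the contents, then zip with the unfiltered origins.
contents = [c for c, _ in contents_and_origins]
origins = [o for _, o in contents_and_origins]
missing = skipped_content_missing(contents, already_stored)
# The two lists now differ in length, so zip() silently shifts the pairing.
misaligned = list(zip(missing, origins))

# Safer: keep the origin on each content, filter, and only then split.
full_contents = [dict(c, origin=o) for c, o in contents_and_origins]
missing_full = skipped_content_missing(full_contents, already_stored)
aligned = [(c["sha1"], c["origin"]) for c in missing_full]
```

With a single origin per call the misalignment is invisible, since every origin value is the same; with several origins, "bbb" ends up attributed to repo1 in the fragile version.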

155–158

Could you turn this into a for loop? This isn't very readable.

190

content_and_origins?

This revision now requires changes to proceed. Sep 30 2019, 11:38 AM

In D2040#47240, @olasd wrote:

I guess you only really need the hunk in the postgres storage? What is the in-memory storage change trying to achieve?

You're right. We don't need to store it in the in-mem storage for now.

Remove most of the changes from the in-mem storage; we don't need to store those.

This revision is now accepted and ready to land. Sep 30 2019, 12:01 PM