
Split 'content_add' method into 'content_add' and 'skipped_content_add'.
Closed, Public

Authored by vlorentz on Feb 5 2020, 12:18 PM.

Details

Summary

The two methods add present content and skipped content, respectively.

This simplifies the logic of both methods, and is a necessary step to
typing / using swh-model objects everywhere, as contents have quite
different attributes depending on whether they are present or missing.

For the moment, this diff only implements the split for the in-memory
backend, not for the PostgreSQL (pg) or Cassandra (cql) backends.
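
To illustrate the split described above, here is a minimal sketch of an in-memory backend exposing two separate methods. The class name `InMemoryStorageSketch`, the dict fields (`sha1`, `status`, `data`, `reason`) and the returned summary keys are assumptions made for illustration; this is not the actual swh-storage code.

```python
from typing import Any, Dict, Iterable


class InMemoryStorageSketch:
    """Hypothetical, simplified storage; not the real swh.storage backend."""

    def __init__(self) -> None:
        # Present and skipped contents live in separate containers,
        # since they carry different attributes.
        self._contents: Dict[bytes, Dict[str, Any]] = {}
        self._skipped_contents: Dict[bytes, Dict[str, Any]] = {}

    def content_add(self, contents: Iterable[Dict[str, Any]]) -> Dict[str, int]:
        """Add present contents (assumed to carry their data)."""
        added = 0
        for c in contents:
            if c.get("status", "visible") == "absent":
                raise ValueError("absent content belongs in skipped_content_add")
            if c["sha1"] not in self._contents:
                self._contents[c["sha1"]] = dict(c)
                added += 1
        return {"content:add": added}

    def skipped_content_add(self, contents: Iterable[Dict[str, Any]]) -> Dict[str, int]:
        """Add skipped contents (assumed to have no data, only a reason)."""
        added = 0
        for c in contents:
            if c.get("status") != "absent":
                raise ValueError("present content belongs in content_add")
            if c["sha1"] not in self._skipped_contents:
                self._skipped_contents[c["sha1"]] = dict(c)
                added += 1
        return {"skipped_content:add": added}
```

With the two paths separated, neither method needs to branch on the content status beyond a sanity check, which is the simplification the summary refers to.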

Diff Detail

Repository
rDSTO Storage manager
Branch
split-content-add
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10450
Build 15566: tox-on-jenkins (Jenkins)
Build 15565: arc lint + arc unit

Event Timeline

BUILD has failed

expected as this needs a new swh.model release

This revision is now accepted and ready to land. Feb 5 2020, 12:27 PM

implement for all backends and proxies

vlorentz retitled this revision from [WIP] Split 'content_add' method into 'content_add' and 'skipped_content_add'. to Split 'content_add' method into 'content_add' and 'skipped_content_add'..
vlorentz added a reviewer: Reviewers.

add 'origin' column to cassandra

I haven't finished the review, but I'm side-tracked by other stuff (indexer storage failing, for one), so I'm submitting my remarks for now. Only small ones.

swh/storage/in_memory.py
293

why not directly iterate over skipped_content_missing (removing the explicit call to list in previous line)?
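
A minimal sketch of this suggestion, assuming `skipped_content_missing` yields its results lazily. The stand-in function `skipped_content_missing_sketch` and the sample data below are hypothetical and only mimic that shape:

```python
from typing import Any, Dict, Iterable, Iterator, Set


def skipped_content_missing_sketch(
    contents: Iterable[Dict[str, Any]], known: Set[bytes]
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in: lazily yield contents whose sha1 is not known."""
    for c in contents:
        if c["sha1"] not in known:
            yield c


known = {b"a"}
candidates = [{"sha1": b"a"}, {"sha1": b"b"}, {"sha1": b"c"}]

# Materializing first: builds an intermediate list, then iterates over it.
missing = list(skipped_content_missing_sketch(candidates, known))
via_list = [c["sha1"] for c in missing]

# Iterating directly: same result, without the intermediate list.
direct = [c["sha1"] for c in skipped_content_missing_sketch(candidates, known)]
```

Keeping the `list(...)` call is mainly useful when the result has to be inspected or reused more than once.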

swh/storage/interface.py
304

by the backend.

306

some contents

Oh yeah, I accepted the review on an earlier draft version...

swh/storage/in_memory.py
293

probably because I wanted to print it while debugging

ardumont added inline comments.
swh/storage/in_memory.py
293

heh, happens to me as well.

swh/storage/storage.py
445

can't we avoid one loop?

now = datetime.datetime.now(tz=datetime.timezone.utc)
content = [dict(c.items(), ctime=now) for c in content]

This revision is now accepted and ready to land. Feb 6 2020, 12:33 PM
swh/storage/storage.py
445

I find my code simpler. Python does not have to be purely functional :)

Fix postgresql error + unskip test for cassandra

Implement skipped_content_missing for cassandra

ardumont added inline comments.
swh/storage/storage.py
445

It's more to avoid one extra iteration over the list...
but it's skipped content, so there are usually very few items.

swh/storage/storage.py
445

a list comprehension is a loop too...
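
For reference, the two forms discussed in this thread can be sketched side by side. The explicit-loop variant is an assumption about the general shape of the code under review, not a copy of it, and the sample `content` list is made up. Each variant makes a single pass over the input and both produce the same result.

```python
import datetime

now = datetime.datetime.now(tz=datetime.timezone.utc)
content = [{"sha1": b"a", "length": 1}, {"sha1": b"b", "length": 2}]

# Variant 1: explicit loop (assumed shape of the original code).
with_loop = []
for c in content:
    c = dict(c)  # copy, so that the caller's dict is not mutated
    c["ctime"] = now
    with_loop.append(c)

# Variant 2: the list comprehension suggested in the review.
# dict(pairs, ctime=now) builds a new dict from the pairs plus the extra key.
with_comprehension = [dict(c.items(), ctime=now) for c in content]
```

Which form to prefer is a readability choice; the comprehension only saves a pass if the surrounding code would otherwise iterate over the list a second time.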