Software Heritage

Split 'content_add' method into 'content_add' and 'skipped_content_add'.
Closed, Public

Authored by vlorentz on Feb 5 2020, 12:18 PM.

Details

Summary

The two methods respectively add present content and skipped content.

This simplifies the logic of both methods, and is a necessary step to
typing / using swh-model objects everywhere, as contents have quite
different attributes depending on whether they are present or missing.

For the moment, this diff only implements the split for the in-memory backend, not for the PostgreSQL (pg) or Cassandra (cql) backends.
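To make the split concrete, here is a minimal Python sketch of what the two methods could look like in a simplified in-memory backend. The class name InMemoryStorageSketch, the dict-based content representation, the HASH_KEYS tuple and the returned summary keys are assumptions for illustration, not the actual swh-storage code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical names throughout: HASH_KEYS, InMemoryStorageSketch and the
# summary keys returned below are illustrative assumptions.
HASH_KEYS = ("sha1", "sha1_git", "sha256", "blake2s256")


class InMemoryStorageSketch:
    def __init__(self) -> None:
        # Present contents are indexed by sha1, which they always have.
        self._contents: Dict[bytes, dict] = {}
        # Skipped contents may lack some hashes, so index by the full tuple.
        self._skipped: Dict[Tuple[Optional[bytes], ...], dict] = {}

    def content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        """Add present contents: every hash and the raw 'data' are required."""
        added = 0
        for c in content:
            if "data" not in c or any(c.get(k) is None for k in HASH_KEYS):
                raise ValueError("present content needs all hashes and 'data'")
            if c["sha1"] not in self._contents:
                self._contents[c["sha1"]] = c
                added += 1
        return {"content:add": added}

    def skipped_content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        """Add skipped contents: a 'reason' is required, hashes may be None."""
        added = 0
        for c in content:
            if "reason" not in c:
                raise ValueError("skipped content needs a 'reason'")
            key = tuple(c.get(k) for k in HASH_KEYS)
            if key not in self._skipped:
                self._skipped[key] = c
                added += 1
        return {"skipped_content:add": added}
```

In this sketch, each method validates only the attributes its kind of content requires, instead of a single method branching on whether each item is present or missing.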

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision (Feb 5 2020, 12:18 PM).

BUILD has failed

Expected, as this needs a new swh.model release.

ardumont accepted this revision (Feb 5 2020, 12:27 PM).
This revision is now accepted and ready to land (Feb 5 2020, 12:27 PM).
vlorentz updated this revision to Diff 9390 (Feb 5 2020, 3:43 PM).

implement for all backends and proxies
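As a rough illustration of why the proxies also had to change, here is a hypothetical Python sketch of a proxy wrapping a backend: after the split, it has to forward skipped_content_add as well as content_add. RecordingBackend, DedupProxy and the in-batch deduplication by sha1 are invented for this example; they are not the real swh-storage proxies.

```python
from typing import Dict, Iterable, List


class RecordingBackend:
    """Hypothetical stand-in backend that just records what it receives."""

    def __init__(self) -> None:
        self.contents: List[dict] = []
        self.skipped: List[dict] = []

    def content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        batch = list(content)
        self.contents.extend(batch)
        return {"content:add": len(batch)}

    def skipped_content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        batch = list(content)
        self.skipped.extend(batch)
        return {"skipped_content:add": len(batch)}


class DedupProxy:
    """Hypothetical proxy: drops in-batch duplicates, then delegates.

    With the split API it must implement both methods; forwarding only
    content_add would silently break callers adding skipped content.
    """

    def __init__(self, storage: RecordingBackend) -> None:
        self.storage = storage

    @staticmethod
    def _dedup(content: Iterable[dict]) -> List[dict]:
        # Simplification: assumes every item carries a 'sha1' key.
        seen = set()
        out = []
        for c in content:
            if c["sha1"] not in seen:
                seen.add(c["sha1"])
                out.append(c)
        return out

    def content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        return self.storage.content_add(self._dedup(content))

    def skipped_content_add(self, content: Iterable[dict]) -> Dict[str, int]:
        return self.storage.skipped_content_add(self._dedup(content))
```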

vlorentz requested review of this revision (Feb 5 2020, 3:43 PM).
vlorentz retitled this revision from "[WIP] Split 'content_add' method into 'content_add' and 'skipped_content_add'." to "Split 'content_add' method into 'content_add' and 'skipped_content_add'."
vlorentz added a reviewer: Reviewers.
vlorentz updated this revision to Diff 9391 (Feb 5 2020, 3:52 PM).

add 'origin' column to cassandra

vlorentz updated this revision to Diff 9392 (Feb 5 2020, 3:54 PM).

remove debug prints

Harbormaster failed remote builds in B10452: Diff 9392!
vlorentz updated this revision to Diff 9395 (Feb 5 2020, 5:53 PM).

fix failing test

ardumont added a comment (edited; Feb 5 2020, 5:55 PM).

I haven't finished the review, but I'm side-tracked by other things (the indexer storage failing, for one), so I'm submitting my remarks for now; they are only small ones.

swh/storage/in_memory.py, line 294

Why not iterate directly over skipped_content_missing (removing the explicit call to list on the previous line)?
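The suggestion can be illustrated with a hypothetical sketch. The actual in_memory.py code is not shown here, so the generator skipped_content_missing below, its `stored` set of sha1s, and the helper add_skipped are stand-ins, not the real implementation.

```python
from typing import Iterable, Iterator, Set


def skipped_content_missing(content: Iterable[dict], stored: Set[bytes]) -> Iterator[dict]:
    """Hypothetical stand-in: lazily yield items whose sha1 is not stored yet."""
    for c in content:
        if c["sha1"] not in stored:
            yield c


def add_skipped(content: Iterable[dict], stored: Set[bytes]) -> int:
    """Hypothetical helper: store the missing items and count them."""
    added = 0
    # Iterate over the generator directly instead of first building
    # missing = list(skipped_content_missing(...)); no intermediate list.
    # Because the generator is lazy, each membership check sees the sha1s
    # added by earlier iterations, so duplicates within 'content' are
    # stored only once. Materializing the list first would check every
    # item against the initial 'stored' set and count duplicates twice.
    for c in skipped_content_missing(content, stored):
        stored.add(c["sha1"])
        added += 1
    return added
```

Dropping list() saves an intermediate copy. As the comment notes, though, it can also change behavior when the loop body updates the same structure the generator reads from, so this is worth checking in the real code.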

swh/storage/interface.py, line 305

Suggested wording: "by the backend."

Line 307

Suggested wording: "some contents"

Oh yeah, I had accepted the revision back when it was only a first draft...

vlorentz updated this revision to Diff 9401 (Feb 6 2020, 11:19 AM).

apply comments

vlorentz added inline comments (Feb 6 2020, 11:25 AM).
swh/storage/in_memory.py, line 294

Probably because I wanted to print it while debugging.

ardumont accepted this revision (Feb 6 2020, 12:33 PM).
ardumont added inline comments.
swh/storage/in_memory.py, line 294

Heh, that happens to me as well.

swh/storage/storage.py, line 442

Can't we avoid one loop? For example:

now = datetime.datetime.now(tz=datetime.timezone.utc)
content = [dict(c.items(), ctime=now) for c in content]

This revision is now accepted and ready to land (Feb 6 2020, 12:33 PM).
vlorentz added inline comments (Feb 6 2020, 1:28 PM).
swh/storage/storage.py, line 442

I find my code simpler. Python does not have to be purely functional :)

vlorentz updated this revision to Diff 9407 (Feb 6 2020, 2:15 PM).

Fix postgresql error + unskip test for cassandra

vlorentz updated this revision to Diff 9409 (Feb 6 2020, 2:29 PM).

Implement skipped_content_missing for cassandra

ardumont accepted this revision (Feb 6 2020, 2:33 PM).
ardumont added inline comments.
swh/storage/storage.py, line 442

It's more to avoid one extra iteration over the list...
but it's skipped content, so there is usually very little of it.

vlorentz added inline comments (Feb 6 2020, 2:34 PM).
swh/storage/storage.py, line 442

A list comprehension is a loop too...
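The thread does not show the author's code at storage.py line 442, so the explicit-loop variant below is a hypothetical reconstruction. The comprehension is the reviewer's suggestion wrapped in a function; both function names are invented for this sketch. Each variant makes exactly one pass over `content`, which is the author's point. In CPython the comprehension typically just avoids the repeated `result.append` calls.

```python
import datetime
from typing import List


def with_ctime_loop(content: List[dict]) -> List[dict]:
    # Hypothetical explicit-loop variant: one pass, building shallow copies.
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    result = []
    for c in content:
        result.append(dict(c.items(), ctime=now))
    return result


def with_ctime_comprehension(content: List[dict]) -> List[dict]:
    # The reviewer's suggestion: also exactly one pass over 'content'.
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    return [dict(c.items(), ctime=now) for c in content]
```

Because `dict(c.items(), ctime=now)` builds a new dict, both variants leave the caller's input dicts unmodified, and every item in a batch shares the same `ctime`.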