Page MenuHomeSoftware Heritage

storages: Refactor journal operations with a dedicated writer collaborator
ClosedPublic

Authored by ardumont on Feb 7 2020, 2:49 AM.

Details

Summary

Prior to this commit, the code was triplicated across the storage backends. Now
all storages use the same collaborator whose concern is writing to the storage.

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10485
Build 15624: tox-on-jenkinsJenkins
Build 15623: arc lint + arc unit

Event Timeline

I don't like the name writer, it's less explicit. Maybe keep journal_writer?

Also, you could make the journal writer support model objects, so you don't have to use .to_dict() every time?

swh/storage/in_memory.py
83

is the evolve still needed?

316–319

Filter before the writer...

325

... and then you can remove this

(same for other methods below)

This revision now requires changes to proceed.Feb 7 2020, 11:14 AM
swh/storage/in_memory.py
83

I don't know, i kept the existing behavior.

325

I was aiming at avoiding conflicts from your other incoming diffs...

swh/storage/in_memory.py
83

And i thought to address this in your incoming 'validation' diff as well.

  • storages: Keep journal_writer as collaborator's name
  • Rebase on latest master

Also, you could make the journal writer support model objects, so you don't have to use .to_dict() every time?

hmmm, i wanted to do that after your diffs to decrease the chances of conflicts.

swh/storage/in_memory.py
83

after* your incoming 'validation' diff D2640

Also, yes that attr is needed to trigger the validation check (in the current code context).
In your D2640, it's refactored, so i guess the future rebase i'll do will conflict and i'll adapt it at that moment.

Also, you could make the journal writer support model objects, so you don't have to use .to_dict() every time?

hmmm, i wanted to do that after your diffs to decrease the chances of conflicts.

Reflecting a bit again on this, i stand by my statement.
Out storage backend are not yet consistent everywhere in term of dict/model.model.BaseModel objects use.
So that will be a tad annoying to do so right now.

When i do a diff, i try to fix 1 perimeter at a time, not too many at once...
So the code change impacted can be more understandable (also reverted easily if need be).

So here that means, refactor the journal writer related code and that's it.
And i did not change any other logic (so no conversion step, no filtering logic adaptation, etc...).
It's all the more important that i plan to merge this after the validation diff (D2640) take place (which will alter some logic already).
That reduces the surface of code that could conflict.

Cheers,

  • writer: Keep the signature to only what's needed
  • writer: Make the writer able to deal with BaseModel objects

Challenging my own opinion ;)
(i'll be able to tell if i was right or not, i still think i was but let's challenge it)

  • Rebase on latest master (includes D2640)
  • Squash commits
ardumont retitled this revision from storages: Refactor journal operations with a dedicated writer collab to storages: Refactor journal operations with a dedicated writer collaborator.Feb 10 2020, 5:30 PM
ardumont edited the summary of this revision. (Show Details)
vlorentz added inline comments.
swh/storage/writer.py
16–28

You don't need this, swh-journal already handles swh-model objects.

This revision now requires changes to proceed.Feb 10 2020, 6:14 PM
ardumont edited the summary of this revision. (Show Details)
  • Add missing types on remaining collaborator method signatures
  • Remove unneeded to_dict conversion as the journal is already swh.model.model.BaseModel aware.
swh/storage/writer.py
16–28

Neat.
Thanks.
Adapted.

Remove unnecessary copy instruction (already converting from content object to
a new dict to ease the data key removal)

This revision is now accepted and ready to land.Feb 11 2020, 10:25 AM

Rebase and plug to master branch