Page MenuHomeSoftware Heritage

Add a StreamJournalWriter backend
ClosedPublic

Authored by douardda on Jun 17 2021, 7:39 PM.

Details

Summary

may be used to generate a on-disk representation of a Storage, for
example to produce test datasets, etc.

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5890 (id=21098)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit f8bc12e243bdff47f4b37f8bbdd129edc262b8a5
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWrtier backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/164/ for more details.

Build is green

Patch application report for D5890 (id=21099)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit a8c9a3bbe3936cefa5a947b8fc810319303c5c2f
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/165/ for more details.

douardda retitled this revision from Add a StreamJournalWrtier backend to Add a StreamJournalWriter backend.Jun 17 2021, 8:39 PM

update the copyright in __init__.py

Build is green

Patch application report for D5890 (id=21103)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit 128ea9da64afcb0961eb5d44161c1e3f36ce2878
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/166/ for more details.

lgtm couple of remarks, suggestions inline.

swh/journal/writer/__init__.py
58 ↗(On Diff #21103)

You are missing an update in the get_journal_writer factory tests for this.

swh/journal/writer/stream.py
20 ↗(On Diff #21103)

Maybe mention what you said in the diff description, that the use case for this would be for example to generate test data.

40 ↗(On Diff #21103)

I guess you are forced to override this because of the slight change in the signature with the TValue type, right?

(Asking because it looks like the original implementation)

swh/journal/writer/__init__.py
58 ↗(On Diff #21103)

indeed

swh/journal/writer/stream.py
20 ↗(On Diff #21103)

thx

40 ↗(On Diff #21103)

I guess I've just copy/pasted the in_memory implementation and did not look at the relevance of keeping this after adaptation :-)

swh/journal/writer/__init__.py
58 ↗(On Diff #21103)

You are missing an update in the get_journal_writer factory tests for this.

actually, what test are you talking about here?

swh/journal/writer/stream.py
40 ↗(On Diff #21103)

I guess you are forced to override this because of the slight change in the signature with the TValue type, right?

(Asking because it looks like the original implementation)

Actually I don't understand what "original implementation" we are talking about here.

fix docstring, thx ardumont

swh/journal/writer/__init__.py
58 ↗(On Diff #21103)

It's too bad that there aren't any in there. We got some in other modules [1], i guess we'll add them later then

[1] https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/tests/test_init.py

Build is green

Patch application report for D5890 (id=21104)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit ee64b911b06da244750ad37708b4bd9a0d8f707f
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/167/ for more details.

swh/journal/writer/stream.py
40 ↗(On Diff #21103)

It looks a lot like the main implementation [1] but there is nothing about protocol or implementation so i guess it's normal to somehow repeat this kind of code.

[1] https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/kafka.py$266-273

use get_journal_writer in test_stream

so we have a slightly better test coverage for free ;-)

vlorentz added inline comments.
swh/journal/writer/stream.py
19–24 ↗(On Diff #21104)

I see you embraced Generic :)

Can you document the meaning of TValue?

26 ↗(On Diff #21104)

small simplification in the docstring

Build is green

Patch application report for D5890 (id=21105)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit df98ebc89f11385c0cb5fcc4119e74e248776dc6
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/168/ for more details.

swh/journal/writer/stream.py
26 ↗(On Diff #21104)

I was sure I wrote the annotation of the output_stream... apparently I did not... oh well, thanks

Build is green

Patch application report for D5890 (id=21106)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit 6d40ad9eec003b5de681db3b0c626e350aa8f3d3
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/169/ for more details.

Add a revision to fix the annotation of InMemory's value_sanitizer

and fix it also for the StreamJWriter

swh/journal/writer/stream.py
19–24 ↗(On Diff #21104)

Can you document the meaning of TValue?

If you don't mind, I'd rather see this the purpose of a dedicated refactoring diff (possibly introducing a JournalWriterProtocol for these backends).

Build is green

Patch application report for D5890 (id=21110)

Rebasing onto 8358298c4e...

Current branch diff-target is up to date.
Changes applied before test
commit a06bab98b11582419e7d85357feb6ffe665083b0
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 18:52:17 2021 +0200

    Add a StreamJournalWriter backend
    
    may be used to generate a on-disk representation of a Storage, for
    example to produce test datasets, etc.

commit a4ae96d12d2c938c7543fedb045ee44f532435d0
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 18 10:45:50 2021 +0200

    Better annotation for InMemoryJournalWriter's value_sanitizer
    
    make it consistent with the KafkaJournalWriter.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/170/ for more details.

This revision is now accepted and ready to land.Jun 18 2021, 10:53 AM