Page MenuHomeSoftware Heritage

Remove 'journal_type' argument from the CLI
ClosedPublic

Authored by vlorentz on Apr 22 2021, 4:30 PM.

Details

Summary

Motivation:

  • This simplifies the CLI, and stops leaking implementation details (what field the implementation reads, what keys contains, that keys exist at all, etc.) through this interface.
  • It also removes the need to have two clients, both reading revisions and releases.

This preserves the optimization of not deserializing unneeded values,
by passing raw message values to the worker functions; which can
deserialize it themselves if needed.

Additionally, this commit fixes some mistakes in type annotations.

Test Plan

Two tests hang. I don't understand why

Diff Detail

Repository
rDCNT Archive counters
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 22 2021, 4:36 PM
Harbormaster failed remote builds in B20934: Diff 19929!

found the bug, it was variable shadowing in handle_messages.

Build was aborted

Patch application report for D5576 (id=19928)

Rebasing onto 5cae9b7cfe...

Current branch diff-target is up to date.
Changes applied before test
commit a4115a6091373d564a5ec4ef7333bb5126a4ed91
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 16:28:48 2021 +0200

    Remove 'journal_type' argument from the CLI
    
    Motivation:
    
    * This simplifies the CLI, and stops leaking implementation details
      (what field the implementation reads, what keys contains, that
      keys exist at all, etc.) through this interface.
    
    * It also removes the need to have two clients, both reading revisions
      and releases.
    
    This preserves the optimization of not deserializing unneeded values,
    by passing raw message values to the worker functions; which can
    deserialize it themselves if needed.
    
    Additionally, this commit fixes some mistakes in type annotations.

Link to build: https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/45/
See console output for more information: https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/45/console

Build was aborted

Patch application report for D5576 (id=19929)

Rebasing onto 5cae9b7cfe...

Current branch diff-target is up to date.
Changes applied before test
commit 9aba347bbdf7914fd0c3924770ddd6fa01f91d74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 16:28:48 2021 +0200

    Remove 'journal_type' argument from the CLI
    
    Motivation:
    
    * This simplifies the CLI, and stops leaking implementation details
      (what field the implementation reads, what keys contains, that
      keys exist at all, etc.) through this interface.
    
    * It also removes the need to have two clients, both reading revisions
      and releases.
    
    This preserves the optimization of not deserializing unneeded values,
    by passing raw message values to the worker functions; which can
    deserialize it themselves if needed.
    
    Additionally, this commit fixes some mistakes in type annotations.

Link to build: https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/46/
See console output for more information: https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/46/console

Build is green

Patch application report for D5576 (id=19930)

Rebasing onto 5cae9b7cfe...

Current branch diff-target is up to date.
Changes applied before test
commit e1c8e4d8533e978e067bbc891b345f742c01e874
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 16:28:48 2021 +0200

    Remove 'journal_type' argument from the CLI
    
    Motivation:
    
    * This simplifies the CLI, and stops leaking implementation details
      (what field the implementation reads, what keys contains, that
      keys exist at all, etc.) through this interface.
    
    * It also removes the need to have two clients, both reading revisions
      and releases.
    
    This preserves the optimization of not deserializing unneeded values,
    by passing raw message values to the worker functions; which can
    deserialize it themselves if needed.
    
    Additionally, this commit fixes some mistakes in type annotations.

See https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/47/ for more details.

I suppose this implementation will be less effective in term of memory consumption as we will keep a copy of the message contents on the dict (I suppose the message.value() is returning a copy of the content)
It also makes the module aware of how the objects are serialized on kafka, which looks quite low level.

It's more soft considerations than real concerns no objections to have only one entry point (there is also one other pros as it will avoid to update the puppet configuration to activate the person's couter)

This revision is now accepted and ready to land.Apr 22 2021, 6:46 PM

It also makes the module aware of how the objects are serialized on kafka, which looks quite low level.

Yeah :/

But swh.counters is already breaking the journal abstraction by accessing keys, so I figured I might as well do this too

This revision was landed with ongoing or failed builds.Apr 22 2021, 8:21 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5576 (id=19940)

Rebasing onto 85cac362ca...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-48-D5576.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DCNT/job/tests-on-diff/48/ for more details.

I suppose the message.value() is returning a copy of the content

It does not: https://github.com/confluentinc/confluent-kafka-python/blob/b7f8dce998ec254f54c15e514850d7404c6a71a3/src/confluent_kafka/src/confluent_kafka.c#L439-L445

(Py_INCREF only increments the reference counter)

thanks for the pointer