Page MenuHomeSoftware Heritage

Add a few statsd metrics in the kafka journal client
ClosedPublic

Authored by douardda on Thu, Jan 13, 4:05 PM.

Details

Summary

namely add 2 metrics:

  • swh_journal_client_handle_message_total: (counter) the total number of kafka messages that have been handled by the client,
  • swh_journal_client_status: (gauge) report the current status of the kafka consumer (waiting or processing).

Since these metrics will be created by any kafka consumer (any swh.journal.client), tags should be added to the statsd metrics (eg. via the STATS_TAGS env var) to distinguish between the different types of consumers (e.g. hostname, consumer type/role, etc.)

Especially for the swh_journal_client_status to work properly
in an elastic execution environment (docker swarm, k8s, ...) it is
mandatory to add a consumer-specific tag (e.g. the hostname) and make
sure the TTL for the metrics in prometherus-statsd-exporter is pretty
short (e.g. 1h), so that restarted consumers are not kept for too long/forever by statsd-exporter.

Depends on D6884

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D6944 (id=25143)

Could not rebase; Attempt merge onto 0d115993e0...

Updating 0d11599..0d509dc
Fast-forward
 swh/journal/client.py | 85 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 66 insertions(+), 19 deletions(-)
Changes applied before test
commit 0d509dc909c0d129e7641e7f0f302dd613a57c5f
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 13 15:53:12 2022 +0100

    Add a few statsd metrics in the kafka journal client
    
    namely add 2 metrics:
    - swh_journal_client_handle_message_total: (counter) the total number  of kafka messages that have been handled by the client,
    - swh_journal_client_status: (gauge) report the current status of the
      kafka consumer (waiting or processing).
    
    Since these metrics will be created by any kafka consumer (any swh.journal.client), tags should be added to the statsd metrics (eg. via the STATS_TAGS env var) to distinguish between the different types of consumers (e.g. hostname, consumer type/role, etc.)
    
    Especially for the swh_journal_client_status to work properly
    in an elastic execution environment (docker swarm, k8s, ...) it is
    mandatory to add a consumer-specific tag (e.g. the hostname) and make
    sure the TTL for the metrics in prometherus-statsd-exporter is pretty
    short (e.g. 1h), so that restarted consumers are not kept for too long/forever by statsd-exporter.

commit 5b17c50d32809c1cb4f3505846d23572c5650496
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 6 12:07:50 2022 +0100

    Add support for the rdkafka 'stats_cb' config option in get_journal_client
    
    this options allows to define a callback which will be called once every
    'statistics.interval.ms' ms by rdkafka with a bunch of statistics (as a
    json string).
    
    See https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md
    
    This allows to define this callback as a string of the form:
    
      "path.to.module:function"

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/192/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/192/console

Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Jan 13, 4:06 PM
Harbormaster failed remote builds in B26012: Diff 25143!

fix requirements-swh for mypy

Build has FAILED

Patch application report for D6944 (id=25161)

Could not rebase; Attempt merge onto 0d115993e0...

Updating 0d11599..a0391a7
Fast-forward
 requirements-swh.txt  |  1 +
 swh/journal/client.py | 85 +++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 67 insertions(+), 19 deletions(-)
Changes applied before test
commit a0391a7cccaa7ab41367eca0e73af694b29fddd4
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 13 15:53:12 2022 +0100

    Add a few statsd metrics in the kafka journal client
    
    namely add 2 metrics:
    - swh_journal_client_handle_message_total: (counter) the total number  of kafka messages that have been handled by the client,
    - swh_journal_client_status: (gauge) report the current status of the
      kafka consumer (waiting or processing).
    
    Since these metrics will be created by any kafka consumer (any swh.journal.client), tags should be added to the statsd metrics (eg. via the STATS_TAGS env var) to distinguish between the different types of consumers (e.g. hostname, consumer type/role, etc.)
    
    Especially for the swh_journal_client_status to work properly
    in an elastic execution environment (docker swarm, k8s, ...) it is
    mandatory to add a consumer-specific tag (e.g. the hostname) and make
    sure the TTL for the metrics in prometherus-statsd-exporter is pretty
    short (e.g. 1h), so that restarted consumers are not kept for too long/forever by statsd-exporter.

commit 5b17c50d32809c1cb4f3505846d23572c5650496
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 6 12:07:50 2022 +0100

    Add support for the rdkafka 'stats_cb' config option in get_journal_client
    
    this options allows to define a callback which will be called once every
    'statistics.interval.ms' ms by rdkafka with a bunch of statistics (as a
    json string).
    
    See https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md
    
    This allows to define this callback as a string of the form:
    
      "path.to.module:function"

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/193/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/193/console

Harbormaster returned this revision to the author for changes because remote builds failed.Fri, Jan 14, 10:36 AM
Harbormaster failed remote builds in B26027: Diff 25161!

Build is green

Patch application report for D6944 (id=25163)

Could not rebase; Attempt merge onto 0d115993e0...

Updating 0d11599..8b36f0e
Fast-forward
 requirements-swh.txt  |  1 +
 requirements-test.txt |  1 +
 swh/journal/client.py | 85 +++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 68 insertions(+), 19 deletions(-)
Changes applied before test
commit 8b36f0e40ed1d9b56e33e2c0c6e28535e552cb79
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 13 15:53:12 2022 +0100

    Add a few statsd metrics in the kafka journal client
    
    namely add 2 metrics:
    - swh_journal_client_handle_message_total: (counter) the total number  of kafka messages that have been handled by the client,
    - swh_journal_client_status: (gauge) report the current status of the
      kafka consumer (waiting or processing).
    
    Since these metrics will be created by any kafka consumer (any swh.journal.client), tags should be added to the statsd metrics (eg. via the STATS_TAGS env var) to distinguish between the different types of consumers (e.g. hostname, consumer type/role, etc.)
    
    Especially for the swh_journal_client_status to work properly
    in an elastic execution environment (docker swarm, k8s, ...) it is
    mandatory to add a consumer-specific tag (e.g. the hostname) and make
    sure the TTL for the metrics in prometherus-statsd-exporter is pretty
    short (e.g. 1h), so that restarted consumers are not kept for too long/forever by statsd-exporter.

commit 5b17c50d32809c1cb4f3505846d23572c5650496
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 6 12:07:50 2022 +0100

    Add support for the rdkafka 'stats_cb' config option in get_journal_client
    
    this options allows to define a callback which will be called once every
    'statistics.interval.ms' ms by rdkafka with a bunch of statistics (as a
    json string).
    
    See https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md
    
    This allows to define this callback as a string of the form:
    
      "path.to.module:function"

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/194/ for more details.

The code for the gauges feels like something that would be usefully handled with a context manager.

Something like (untested)

class StatsdStatusGauges:
    def __init__(self, metric_name: str, statuses: Collection[str], common_tags: Optional[Dict[str, str]] = None):
        self.metric_name = metric_name
        self.statuses = set(statuses)
        self.common_metrics = common_tags or {}
        self.current_status: Optional[str] = None

    def reset_gauges(self):
        self.current_status = None
        for status in self.statuses:
            statsd.gauge(self.metric_name, 0, {**self.common_tags, "status": status})

    def send_current_gauge(self, value: int):
        if self.current_status is not None:
            statsd.gauge(self.metric_name, value, {**self.common_tags, "status": self.current_status})

    def set(self, new_status: str):
        if new_status not in self.statuses:
            raise ValueError(f'{new_status} not in {self.statuses}')

        # May not be needed; May even be counter-productive if we want to send the gauges to keep them around in the statsd exporter
        if new_status == self.current_status:
            return

        self.send_current_gauge(0)
        self.current_status = new_status
        self.send_current_gauge(1)

    def __enter__(self):
        self.reset_gauges()
        return self

    def __exit__(self, *exc):
        self.reset_gauges()
        return False

Which would be used like:

with StatsdStatusGauges(JOURNAL_STATUS_METRIC, {"processing", "waiting"}) as status_gauge:
    [...]
    status_gauge.set("waiting")
    [...]
    status_gauge.set("processing")
swh/journal/client.py
354

looks like a spurious change

In D6944#181139, @olasd wrote:

The code for the gauges feels like something that would be usefully handled with a context manager.

Something like (untested)

class StatsdStatusGauges:
    def __init__(self, metric_name: str, statuses: Collection[str], common_tags: Optional[Dict[str, str]] = None):
        self.metric_name = metric_name
        self.statuses = set(statuses)
        self.common_metrics = common_tags or {}
        self.current_status: Optional[str] = None

    def reset_gauges(self):
        self.current_status = None
        for status in self.statuses:
            statsd.gauge(self.metric_name, 0, {**self.common_tags, "status": status})

    def send_current_gauge(self, value: int):
        if self.current_status is not None:
            statsd.gauge(self.metric_name, value, {**self.common_tags, "status": self.current_status})

    def set(self, new_status: str):
        if new_status not in self.statuses:
            raise ValueError(f'{new_status} not in {self.statuses}')

        # May not be needed; May even be counter-productive if we want to send the gauges to keep them around in the statsd exporter
        if new_status == self.current_status:
            return

        self.send_current_gauge(0)
        self.current_status = new_status
        self.send_current_gauge(1)

    def __enter__(self):
        self.reset_gauges()
        return self

    def __exit__(self, *exc):
        self.reset_gauges()
        return False

Which would be used like:

with StatsdStatusGauges(JOURNAL_STATUS_METRIC, {"processing", "waiting"}) as status_gauge:
    [...]
    status_gauge.set("waiting")
    [...]
    status_gauge.set("processing")

wouldn't we prefer this class to be in swh-core then?

update using new statsd.status_gauge() context manager (in swh.core 1.1)

Build is green

Patch application report for D6944 (id=25350)

Could not rebase; Attempt merge onto 0d115993e0...

Updating 0d11599..5a26dae
Fast-forward
 requirements-swh.txt  |  1 +
 requirements-test.txt |  1 +
 swh/journal/client.py | 80 +++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 63 insertions(+), 19 deletions(-)
Changes applied before test
commit 5a26dae22928eac7992cce702bed3e0acd9af885
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 13 15:53:12 2022 +0100

    Add a few statsd metrics in the kafka journal client
    
    namely add 2 metrics:
    - swh_journal_client_handle_message_total: (counter) the total number  of kafka messages that have been handled by the client,
    - swh_journal_client_status: (gauge) report the current status of the
      kafka consumer (waiting or processing).
    
    Since these metrics will be created by any kafka consumer (any swh.journal.client), tags should be added to the statsd metrics (eg. via the STATS_TAGS env var) to distinguish between the different types of consumers (e.g. hostname, consumer type/role, etc.)
    
    Especially for the swh_journal_client_status to work properly
    in an elastic execution environment (docker swarm, k8s, ...) it is
    mandatory to add a consumer-specific tag (e.g. the hostname) and make
    sure the TTL for the metrics in prometherus-statsd-exporter is pretty
    short (e.g. 1h), so that restarted consumers are not kept for too long/forever by statsd-exporter.

commit 5b17c50d32809c1cb4f3505846d23572c5650496
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jan 6 12:07:50 2022 +0100

    Add support for the rdkafka 'stats_cb' config option in get_journal_client
    
    this options allows to define a callback which will be called once every
    'statistics.interval.ms' ms by rdkafka with a bunch of statistics (as a
    json string).
    
    See https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md
    
    This allows to define this callback as a string of the form:
    
      "path.to.module:function"

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/195/ for more details.

This revision is now accepted and ready to land.Fri, Jan 21, 11:28 AM