Page MenuHomeSoftware Heritage

swh-storage: Install counter metrics
ClosedPublic

Authored by ardumont on Mar 29 2019, 6:05 PM.

Details

Reviewers
olasd
douardda
vlorentz
Group Reviewers
Reviewers
Commits
rDSTOC1da7b500140f: api.server: Rename send_metric as we really send only 1 metric
rDSTOCe9a3198a3c20: api.server: Add tests around send_metric function
rDSTOC1a401c0db7d9: api.server: Make send_metric be in charge of the metric parsing
rDSTOC7d6d894ad6df: api.server: Use constants for all metrics
rDSTOC9b35b2d63e52: server: Add metrics on origin_metadata_add & metadata_provider_add
rDSTOC2f1e763db7e2: api.server: Refactor repetition into a send_metrics function
rDSTOC04792f87f6ca: server: Add metrics on tool_add endpoint
rDSTOC6064a316fd13: api.{client,server}: Remove dead code about entity
rDSTOC6dee5f5de633: api.server: Fix wrong api call
rDSTOCff055909bcf7: api.server: Rename constant
rDSTOCee934b6dfaca: storage: Make tool_add endpoint return a list instead of a generator
rDSTOCe357a3862973: api.server: Include the encoding step when timing
rDSTOC4518363e7b7f: api.server.process_metrics: Skip processing when value is 0
rDSTOC907c4adb9104: api.server: Add metrics on origin_add and origin_add_one
rDSTOCcb7b9eea5f7b: api.server: Rename to process_metrics function name
rDSTOC529c4bef7a94: storage: Fix typo in variable name
rDSTOCbb19ce34c31e: server.api: Fix metric names respecting prometheus conventions [1]
rDSTOCb7d9896fa3fe: api.server: Add metrics to origin_visit_add
rDSTOCce999b589770: api.server: Exploit new endpoint to compute metrics
rDSTOC01b8a9ae22f7: storage.in_memory: Fix missing filtering on release_add
rDSTOCddd1835c584d: api.server: Send computed metrics
rDSTOC007f3de44b0b: storage*.snapshot_add: Return summary
rDSTOCab7f0088d4c0: storage*.release_add: Return summary
rDSTOCf18af309f9f4: swh.storage: Adapt and unify _add endpoints
rDSTOC25da407238a0: directory_add: Normalize metrics keys
rDSTOCb3414465ed62: api.server.directory_add: Add counter metrics
rDSTOCf36644637be6: storage*.revision_add: Return summary
rDSTOCa284ec0d5515: storage: Update tests to check the summary result act as expected
rDSTOCecbb6111509a: storage*.directory_add: Return summary
rDSTOCcd7836491f31: content_add: Normalize the summary keys
rDSTOCfb4be03d66a9: storage: Align implementations to compute correct metrics
rDSTOC71bd5bf250c5: storage.in_memory.content_add: Adapt implementation to send summary
rDSTOC3e45179a1937: Install counter metrics
rDSTOCee62f46fca46: api.server: Remove encode step from metrics
rDSTOC253985814e8a: api.server: Remove timed metric on the get_storage function
rDSTOC7f798bb8ecb9: api.server: Update docstrings on content_add to explicit the output
R65:e9a3198a3c20: api.server: Add tests around send_metric function
R65:1a401c0db7d9: api.server: Make send_metric be in charge of the metric parsing
R65:ff055909bcf7: api.server: Rename constant
R65:1da7b500140f: api.server: Rename send_metric as we really send only 1 metric
R65:7d6d894ad6df: api.server: Use constants for all metrics
R65:6dee5f5de633: api.server: Fix wrong api call
R65:9b35b2d63e52: server: Add metrics on origin_metadata_add & metadata_provider_add
R65:04792f87f6ca: server: Add metrics on tool_add endpoint
R65:6064a316fd13: api.{client,server}: Remove dead code about entity
R65:ee934b6dfaca: storage: Make tool_add endpoint return a list instead of a generator
R65:907c4adb9104: api.server: Add metrics on origin_add and origin_add_one
R65:2f1e763db7e2: api.server: Refactor repetition into a send_metrics function
R65:ce999b589770: api.server: Exploit new endpoint to compute metrics
R65:529c4bef7a94: storage: Fix typo in variable name
R65:cb7b9eea5f7b: api.server: Rename to process_metrics function name
R65:f18af309f9f4: swh.storage: Adapt and unify _add endpoints
R65:b7d9896fa3fe: api.server: Add metrics to origin_visit_add
R65:4518363e7b7f: api.server.process_metrics: Skip processing when value is 0
R65:e357a3862973: api.server: Include the encoding step when timing
R65:bb19ce34c31e: server.api: Fix metric names respecting prometheus conventions [1]
R65:25da407238a0: directory_add: Normalize metrics keys
R65:ab7f0088d4c0: storage*.release_add: Return summary
R65:01b8a9ae22f7: storage.in_memory: Fix missing filtering on release_add
R65:007f3de44b0b: storage*.snapshot_add: Return summary
R65:f36644637be6: storage*.revision_add: Return summary
R65:ddd1835c584d: api.server: Send computed metrics
R65:fb4be03d66a9: storage: Align implementations to compute correct metrics
R65:ecbb6111509a: storage*.directory_add: Return summary
R65:b3414465ed62: api.server.directory_add: Add counter metrics
R65:cd7836491f31: content_add: Normalize the summary keys
R65:71bd5bf250c5: storage.in_memory.content_add: Adapt implementation to send summary
R65:a284ec0d5515: storage: Update tests to check the summary result act as expected
R65:7f798bb8ecb9: api.server: Update docstrings on content_add to explicit the output
R65:ee62f46fca46: api.server: Remove encode step from metrics
R65:3e45179a1937: Install counter metrics
R65:253985814e8a: api.server: Remove timed metric on the get_storage function
rDSTO1a401c0db7d9: api.server: Make send_metric be in charge of the metric parsing
rDSTOe9a3198a3c20: api.server: Add tests around send_metric function
rDSTO7d6d894ad6df: api.server: Use constants for all metrics
rDSTOff055909bcf7: api.server: Rename constant
rDSTO1da7b500140f: api.server: Rename send_metric as we really send only 1 metric
rDSTO04792f87f6ca: server: Add metrics on tool_add endpoint
rDSTO2f1e763db7e2: api.server: Refactor repetition into a send_metrics function
rDSTO6dee5f5de633: api.server: Fix wrong api call
rDSTO9b35b2d63e52: server: Add metrics on origin_metadata_add & metadata_provider_add
rDSTO6064a316fd13: api.{client,server}: Remove dead code about entity
rDSTOee934b6dfaca: storage: Make tool_add endpoint return a list instead of a generator
rDSTO907c4adb9104: api.server: Add metrics on origin_add and origin_add_one
rDSTOb7d9896fa3fe: api.server: Add metrics to origin_visit_add
rDSTOe357a3862973: api.server: Include the encoding step when timing
rDSTObb19ce34c31e: server.api: Fix metric names respecting prometheus conventions [1]
rDSTO4518363e7b7f: api.server.process_metrics: Skip processing when value is 0
rDSTOce999b589770: api.server: Exploit new endpoint to compute metrics
rDSTO529c4bef7a94: storage: Fix typo in variable name
rDSTOcb7b9eea5f7b: api.server: Rename to process_metrics function name
rDSTO007f3de44b0b: storage*.snapshot_add: Return summary
rDSTOddd1835c584d: api.server: Send computed metrics
rDSTOf18af309f9f4: swh.storage: Adapt and unify _add endpoints
rDSTOab7f0088d4c0: storage*.release_add: Return summary
rDSTO01b8a9ae22f7: storage.in_memory: Fix missing filtering on release_add
rDSTOf36644637be6: storage*.revision_add: Return summary
rDSTOcd7836491f31: content_add: Normalize the summary keys
rDSTOfb4be03d66a9: storage: Align implementations to compute correct metrics
rDSTO25da407238a0: directory_add: Normalize metrics keys
rDSTOb3414465ed62: api.server.directory_add: Add counter metrics
rDSTOecbb6111509a: storage*.directory_add: Return summary
rDSTO71bd5bf250c5: storage.in_memory.content_add: Adapt implementation to send summary
rDSTOa284ec0d5515: storage: Update tests to check the summary result act as expected
rDSTOee62f46fca46: api.server: Remove encode step from metrics
rDSTO3e45179a1937: Install counter metrics
rDSTO7f798bb8ecb9: api.server: Update docstrings on content_add to explicit the output
rDSTO253985814e8a: api.server: Remove timed metric on the get_storage function
Summary

Add metrics on _add endpoints

Remain one last bit to check.

Also:

  • api.server: Remove timed metric on the get_storage function

Related T1598#29930

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
install-detailed-metrics
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 5055
Build 6798: tox-on-jenkinsJenkins
Build 6797: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
swh/storage/api/server.py
24

The idea here is to have a mapping per endpoint with the interesting key we want to use as metric.

That might be also used in the storage.py module.

The commented list is because we could need more information that just one bare key.
As i'm already unsure of the initial implementation, i'm going for a simple one first.

40

Implementation detail to have access of the data prior to any Response wrapping the encode_data does.
Again with the discussion about the grpc [1] which could ease our life.

[1] https://grpc.io/docs/guides/#overview

ardumont retitled this revision from POC: Install counter metrics to Install counter metrics: Implementation up for discussion.Mar 29 2019, 6:24 PM
  • api.server: Update docstrings on content_add to explicit the output
  • api.server: Remove encode step from metrics
  • storage: Update tests to check the summary result act as expected
  • storage.in_memory.content_add: Adapt implementation to send summary
  • storage*.directory_add: Return summary
  • api.server.directory_add: Add counter metrics
  • content_add: Normalize the summary keys
  • directory_add: Normalize metrics keys
  • storage: Align implementations to compute correct metrics
  • storage*.revision_add: Return summary
  • storage.in_memory: Fix missing filtering on release_add
  • storage*.release_add: Return summary
  • storage*.snapshot_add: Return summary

Remains to actually exploit the summary appropriately in the
swh.storage.api.server module (in-progress)

Rebase on latest master

Fix flake8 warnings that broke the ci...

sql/upgrades/131.sql
10 ↗(On Diff #4235)

What?

Update my local master who were late. So the previous diff version
contains too much data

ardumont retitled this revision from Install counter metrics: Implementation up for discussion to Install counter metrics.Apr 1 2019, 11:17 PM
ardumont edited the summary of this revision. (Show Details)
swh/storage/api/server.py
57

Right now, that gives something like:

U 127.0.0.1:47068 -> 127.0.0.1:8125
  swh_storage_request_object_count:{'snapshot_added': 1}|c|#endpoint:snapshot_add
  • swh.storage: Adapt and unify _add endpoints
  • api.server: Exploit new endpoint to compute metrics

Adapt according to our latest exchange

Metrics now looks like:

U 127.0.0.1:55331 -> 127.0.0.1:8125
  swh_storage_content_add:3681|c|#endpoint:content_add,object_type:content,operation:add
#
U 127.0.0.1:57664 -> 127.0.0.1:8125
  swh_storage_skipped_content_add:0|c|#endpoint:content_add,object_type:skipped_content,operation:add
#
U 127.0.0.1:33871 -> 127.0.0.1:8125
  swh_storage_revision_add:2267|c|#endpoint:revision_add,object_type:revision,operation:add
ardumont retitled this revision from Install counter metrics to swh-storage: Install counter metrics.Apr 2 2019, 11:24 PM
ardumont added a project: Storage manager.
olasd requested changes to this revision.Apr 2 2019, 11:42 PM

Looks good, thanks! Just a few more changes and this should be all good to merge.

It might make sense to rebase this on top of D1329 when it lands

swh/storage/api/server.py
42

hehe, thanks

49

maybe this would be better if called process_metrics or send_metrics_from_retval?

56

We should skip all of this processing if value is 0

61

The metric name should be constant with respect to what we're putting in the labels : swh_storage_operations_total.

65

In this case we do want the unit in the metric name: "swh_storage_operations_%s_total" % unit. (reference: https://prometheus.io/docs/practices/naming/)

158–159

we should really @time the @encodeing process by swapping the decorators.

swh/storage/storage.py
193–195

There's all kinds of fun considerations when considering e.g. the multiplexing objstorage here...

This revision now requires changes to proceed.Apr 2 2019, 11:42 PM
ardumont added inline comments.
swh/storage/api/server.py
158–159

Ok, i thought we did not want to.
But thinking about it more, that makes sense to indeed.

swh/storage/storage.py
193–195

Well, sure but then, separation of concern, that's the objstorage's job, ain't it?

It's fine to keep that as a readme for now ;)

ardumont marked an inline comment as done.

Adapt according to olasd's review:

  • api.server: Include the encoding step when timing
  • api.server: Rename to process_metrics function name
  • api.server.process_metrics: Skip processing when value is 0
  • server.api: Fix metric names respecting prometheus conventions [1]

Improvments:

  • api.server: Add metrics to origin_visit_add
  • api.server: Add metrics on origin_add and origin_add_one
  • server: Add metrics on origin_metadata_add & metadata_provider_add
  • storage: Make tool_add endpoint return a list instead of a generator
  • server: Add metrics on tool_add endpoint

Unrelated (too lazy to open another diff, tell me if you really want me too):

  • storage: Fix typo in variable name
  • api.{client,server}: Remove dead code about entity

Heads up, will need to be rebased on D1329 when it lands.
This will need adaptation for the snapshot_add endpoints will take a list of snapshots as input (instead of 1).

olasd requested changes to this revision.Apr 3 2019, 3:30 PM

If we're making metric names constants, this should happen for all of them so that they are defined in the same place.

Maybe something like

OPERATIONS_METRIC = "swh_storage_operations_total"
OPERATIONS_UNIT_METRIC = "swh_storage_operations_{unit}_total"
DURATION_METRIC = "swh_storage_request_duration_seconds"

Then fill the pattern when using the metric which uses the unit.

I'm not entirely convinced by the introduction of the send_metrics indirection as is, using the statsd API itself would be much clearer.

To make it more explicit:

  • its name should be singular (it sends a single metric)
  • all its callers should use keyword arguments
  • it would probably make sense to move the dict key parsing/metric name stuff inside that function, which will help if we ever end up adding an "extrinsic" metric that would use a unit instead of a count.

So the calls would end up looking like:

send_metric(metric='origin:add', count=len(origins), method_name='origin_add')

instead of

send_metrics(len(origins), 'origin_add', 'origin', 'add')
This revision now requires changes to proceed.Apr 3 2019, 3:30 PM

If we're making metric names constants, this should happen for all of them so that they are defined in the same place.
... fill the pattern when using the metric which uses the unit.

right, done.

...
So the calls would end up looking like:
send_metric(metric='origin:add', count=len(origins), method_name='origin_add')

heading there

  • api.server: Rename constant
  • api.server: Rename send_metric as we really send only 1 metric
  • api.server: Use constants for all metrics
  • api.server: Make send_metric be in charge of the metric parsing
  • api.server: Start adding test (incomplete)
  • api.server: Add tests around send_metric function
This revision is now accepted and ready to land.Apr 3 2019, 10:12 PM
This revision was automatically updated to reflect the committed changes.