Software Heritage

loader.core: Add statsd metrics on collected metadata
Closed, Public

Authored by vlorentz on May 2 2022, 3:22 PM.

Details

Summary

Will be used to check that we actually detect all GitHub forks in production.

Depends on D7726.

Event Timeline

Build is green

Patch application report for D7727 (id=27937)

Could not rebase; Attempt merge onto 82e1bfad5c...

Updating 82e1bfa..2a8559e
Fast-forward
 swh/loader/core/loader.py            | 73 +++++++++++++++++++++++++++++++-----
 swh/loader/core/metadata_fetchers.py |  1 +
 swh/loader/core/tests/test_loader.py | 14 ++++---
 3 files changed, 72 insertions(+), 16 deletions(-)
Changes applied before test
commit 2a8559e198f8a0fce10e3cb4eb5fd05c9a3ec564
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit 619f8aef0fe93725a254b8978220b328b559a3a3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/765/ for more details.

Build is green

Patch application report for D7727 (id=27950)

Could not rebase; Attempt merge onto c06305e78a...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            | 89 ++++++++++++++++++++++++++++++++----
 swh/loader/core/metadata_fetchers.py |  1 +
 swh/loader/core/tests/test_loader.py | 87 ++++++++++++++++++++++++++++++++---
 3 files changed, 160 insertions(+), 17 deletions(-)
Changes applied before test
commit d4eb9310b9208c3123f8934a91e8bdb60b0fe870
Merge: c06305e 97628cb
Author: Jenkins user <jenkins@localhost>
Date:   Tue May 3 11:45:41 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 97628cbdb11faa0d59031bf81799d9116182aa6a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit af55cf32db9db5a9d4e6d8c0019a8e329697778b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/770/ for more details.

Build has FAILED

Patch application report for D7727 (id=27952)

Could not rebase; Attempt merge onto c06305e78a...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  89 +++++++++++++++++++++++----
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 113 ++++++++++++++++++++++++++++++++---
 3 files changed, 186 insertions(+), 17 deletions(-)
Changes applied before test
commit e3a7ca1336098de2ee5c26c99ddd51f7160247f1
Merge: c06305e f599982
Author: Jenkins user <jenkins@localhost>
Date:   Tue May 3 12:20:23 2022 +0000

    Merge branch 'diff-target' into HEAD

commit f5999823d0f2d4c1d46342a0e0e4a1e8cb868241
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit af55cf32db9db5a9d4e6d8c0019a8e329697778b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/771/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/771/console

Build is green

Patch application report for D7727 (id=27975)

Could not rebase; Attempt merge onto a097a946c2...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  89 +++++++++++++++++++++++----
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 113 ++++++++++++++++++++++++++++++++---
 3 files changed, 186 insertions(+), 17 deletions(-)
Changes applied before test
commit b68eeb207c1db048f47c574f9c5aa9d0b35c2006
Merge: a097a94 638d52b
Author: Jenkins user <jenkins@localhost>
Date:   Wed May 4 09:43:25 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 638d52b94e7b946d26c61119f3926474872c2060
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit af55cf32db9db5a9d4e6d8c0019a8e329697778b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/773/ for more details.

olasd added inline comments.
swh/loader/core/loader.py
471–475

For all these histograms, we will want to change the statsd exporter configuration for these metrics so that it generates histograms with relevant buckets. Our default buckets are only relevant for timings and would be really wasteful here (these metrics will pretty much always be 0 or 1).

https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/data/common/common.yaml$3124-3151

https://github.com/prometheus/statsd_exporter#statsd-timers-and-distributions

Something like:

mappings:
 - match: whatever this metric name will end up being
   observer_type: histogram
   histogram_options:
     buckets: [0, 1, 2, 3, 4, 5]

(You will want to test that in Docker as well.)
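To make the bucket argument concrete, here is a minimal Python sketch (not the exporter's code) of how a Prometheus-style histogram counts observations: each bucket is cumulative and counts values less than or equal to its upper bound, and an implicit `+Inf` bucket counts everything. The bucket list `[0, 1, 2, 3, 4, 5]` is the one suggested above; the observations are made-up values, assumed to be mostly 0 or 1.

```python
import math
from typing import Dict, Iterable, List


def cumulative_buckets(
    observations: Iterable[float], upper_bounds: List[float]
) -> Dict[str, int]:
    """Return Prometheus-style cumulative bucket counts keyed by the `le` label.

    Each bucket counts observations <= its upper bound; the implicit "+Inf"
    bucket counts every observation.
    """
    bounds = sorted(upper_bounds) + [math.inf]
    counts = {("+Inf" if b == math.inf else str(b)): 0 for b in bounds}
    for value in observations:
        for b in bounds:
            if value <= b:
                counts["+Inf" if b == math.inf else str(b)] += 1
    return counts


# Hypothetical per-load values: most loads find 0 or 1 metadata objects.
print(cumulative_buckets([0, 1, 1, 0, 0, 2], [0, 1, 2, 3, 4, 5]))
# {'0': 3, '1': 5, '2': 6, '3': 6, '4': 6, '5': 6, '+Inf': 6}
```

For integer-valued data like this, bounds designed for timings (fractions of a second) would mostly repeat the same counts, which is why a few small integer buckets are enough.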

492–496

Do we really want a histogram, or just a running total? (Note that the histogram will also generate a running total.)

Build is green

Patch application report for D7727 (id=28001)

Could not rebase; Attempt merge onto a097a946c2...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  89 +++++++++++++++++++++++---
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 117 ++++++++++++++++++++++++++++++++---
 3 files changed, 190 insertions(+), 17 deletions(-)
Changes applied before test
commit 68afb29ca43dfc895e769b7d371bf78cb6a36870
Merge: a097a94 5e71c6c
Author: Jenkins user <jenkins@localhost>
Date:   Thu May 5 09:51:38 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 5e71c6c20bb2775350780939b6f52c65945d0ab3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit f21ade09a7c961e2cb6cf95aa8f9d350dc7a5700
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/775/ for more details.

swh/loader/core/loader.py
492–496

I don't know; I'm only interested in the average, and a histogram seemed like the only way to get it (ditto above).

swh/loader/core/loader.py
492–496

If you just want an average, you only need a pair of metrics: one that is incremented once per operation (and can be shared across all metrics), and one that is incremented by the number of objects you want to average. Dividing the second by the first gives you a running average.
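A minimal sketch of that counter-pair approach, with an in-memory stand-in for statsd counters (the `CounterPair` class is hypothetical, not the loader's actual code): one counter is bumped once per operation, the other by the number of objects found, and the running average is their ratio.

```python
from dataclasses import dataclass


@dataclass
class CounterPair:
    """Two monotonically increasing counters, as statsd would accumulate them."""

    operations: int = 0  # incremented by 1 per operation (e.g. per load)
    objects: int = 0  # incremented by the number of objects found in that operation

    def record(self, num_objects: int) -> None:
        self.operations += 1
        self.objects += num_objects

    def running_average(self) -> float:
        # Average number of objects per operation since the counters started;
        # 0.0 when nothing has been recorded yet, to avoid dividing by zero.
        if self.operations == 0:
            return 0.0
        return self.objects / self.operations


pair = CounterPair()
for n in [0, 1, 1, 0, 3]:  # hypothetical per-load object counts
    pair.record(n)
print(pair.running_average())  # 1.0
```

On the Prometheus side, an average over a time window would typically divide the `rate()` of the object counter by the `rate()` of the operation counter, rather than the raw totals.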

The Prometheus statsd exporter generates these series automatically for histograms, so that they can be used with the Prometheus histogram functions: there is a <foo>_count series for the number of times a value was measured, a <foo>_sum series for the sum of all measured <foo> values, and the "bucket count" series (one series per bucket limit, counting the values less than or equal to that limit).

Either way, in terms of storage, as long as we pinpoint a small set of relevant buckets, it shouldn't matter too much. If you think that we're never going to need the fine-grained metrics (I'd probably agree), please use counters directly.
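To gauge that storage trade-off, this sketch lists the series a histogram exposes for one label combination under the naming described above (one `<foo>_bucket` per limit plus `+Inf`, `<foo>_sum`, `<foo>_count`), compared with the two plain counters. The metric name `foo` is a placeholder, and the exact formatting of the `le` values is an assumption.

```python
from typing import List


def histogram_series(name: str, upper_bounds: List[float]) -> List[str]:
    """Series a Prometheus histogram exposes for one label combination."""
    series = [f'{name}_bucket{{le="{b}"}}' for b in sorted(upper_bounds)]
    series.append(f'{name}_bucket{{le="+Inf"}}')  # implicit catch-all bucket
    return series + [f"{name}_sum", f"{name}_count"]


def counter_series(name: str) -> List[str]:
    """The two plain counters suggested instead of a histogram."""
    return [f"{name}_sum", f"{name}_count"]


small = histogram_series("foo", [0, 1, 2, 3, 4, 5])
print(len(small))  # 9
print(len(counter_series("foo")))  # 2
```

With a handful of buckets the overhead is only a few extra series per metric; the counters stay the smaller option if the buckets are never queried.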

Export *_sum and *_count instead of histograms.

Build has FAILED

Patch application report for D7727 (id=28004)

Could not rebase; Attempt merge onto a097a946c2...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  94 +++++++++++++++++++++---
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 137 +++++++++++++++++++++++++++++++++--
 3 files changed, 215 insertions(+), 17 deletions(-)
Changes applied before test
commit 0c7f954fb4cd90af2e332020569ea80487f98273
Merge: a097a94 c0b9263
Author: Jenkins user <jenkins@localhost>
Date:   Thu May 5 11:34:43 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c0b92637fabb3ec3db72762a4ec153c5825e6099
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit da8793f99d60a148cc39721b8697dfc6a5d81cea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/777/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/777/console

swh/loader/core/loader.py
492–496

Done. I kept {name} inside the metric name instead of using tags because, IMO, it doesn't make sense to aggregate them (like in the other diff).
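A minimal sketch of the difference, using the plain-text statsd line protocol (`<metric>:<value>|c` for counters) and DogStatsD-style `|#key:value` tags, which the Prometheus statsd exporter can parse. The `loader_metadata_` prefix, the example name `github`, and the helper functions are hypothetical; they are not the names used in the diff.

```python
from typing import Dict, List, Optional


def counter_line(metric: str, value: int, tags: Optional[Dict[str, str]] = None) -> str:
    """Format one statsd counter increment, with optional DogStatsD-style tags."""
    line = f"{metric}:{value}|c"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def name_in_metric(name: str, num_objects: int) -> List[str]:
    # {name} is part of the metric name: each name becomes a separate metric,
    # so values for unrelated names are never summed together.
    return [
        counter_line(f"loader_metadata_{name}_sum", num_objects),
        counter_line(f"loader_metadata_{name}_count", 1),
    ]


def name_as_tag(name: str, num_objects: int) -> List[str]:
    # {name} is a tag: a single metric with a label, which makes
    # aggregating across names the easy default.
    return [
        counter_line("loader_metadata_sum", num_objects, {"name": name}),
        counter_line("loader_metadata_count", 1, {"name": name}),
    ]


print(name_in_metric("github", 2))
# ['loader_metadata_github_sum:2|c', 'loader_metadata_github_count:1|c']
print(name_as_tag("github", 2))
# ['loader_metadata_sum:2|c|#name:github', 'loader_metadata_count:1|c|#name:github']
```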

Build has FAILED

Patch application report for D7727 (id=28004)

Could not rebase; Attempt merge onto a097a946c2...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  94 +++++++++++++++++++++---
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 137 +++++++++++++++++++++++++++++++++--
 3 files changed, 215 insertions(+), 17 deletions(-)
Changes applied before test
commit a105648e2442d03115d6bdfca11f7f4c9fb3044e
Merge: a097a94 c0b9263
Author: Jenkins user <jenkins@localhost>
Date:   Thu May 5 11:36:10 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c0b92637fabb3ec3db72762a4ec153c5825e6099
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit da8793f99d60a148cc39721b8697dfc6a5d81cea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/778/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/778/console

Build is green

Patch application report for D7727 (id=28004)

Could not rebase; Attempt merge onto a097a946c2...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py            |  94 +++++++++++++++++++++---
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 137 +++++++++++++++++++++++++++++++++--
 3 files changed, 215 insertions(+), 17 deletions(-)
Changes applied before test
commit f0c185b240b96030893ed03e836454bde01b4f0e
Merge: a097a94 c0b9263
Author: Jenkins user <jenkins@localhost>
Date:   Thu May 5 11:45:27 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c0b92637fabb3ec3db72762a4ec153c5825e6099
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit da8793f99d60a148cc39721b8697dfc6a5d81cea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/779/ for more details.

Thanks!

swh/loader/core/loader.py
492–496

Yeah, that's perfect!

This revision is now accepted and ready to land. May 5 2022, 5:48 PM

Build is green

Patch application report for D7727 (id=28038)

Could not rebase; Attempt merge onto a097a946c2...

Updating a097a94..c4b1119
Fast-forward
 swh/loader/core/loader.py            |  94 +++++++++++++++++++++---
 swh/loader/core/metadata_fetchers.py |   1 +
 swh/loader/core/tests/test_loader.py | 137 +++++++++++++++++++++++++++++++++--
 3 files changed, 215 insertions(+), 17 deletions(-)
Changes applied before test
commit c4b1119763eff3e8316cad701a3203004dcba26a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:21:36 2022 +0200

    loader.core: Add statsd metrics on collected metadata

commit 6ca6d5cf9cefb595ee8f1ad3fcf5decf4008aafb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 2 15:20:52 2022 +0200

    loader.core: Add statsd timing metrics

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/783/ for more details.