Page MenuHomeSoftware Heritage

Add statsd metrics on incremental loading
ClosedPublic

Authored by vlorentz on May 3 2022, 6:45 PM.

Details

Summary

It will be used to debug why fork detection does not have the expected
speed-up in production.

Diff Detail

Repository
rDLDG Git loader
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29085
Build 45466: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 45465: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D7732 (id=27959)

Rebasing onto 9b44237005...

First, rewinding head to replay your work on top of it...
Applying: Add statsd metrics on incremental loading
Changes applied before test
commit c92085799fea5fb9ac9addc308fe2a6749bce83d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/199/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/199/console

Harbormaster returned this revision to the author for changes because remote builds failed.May 3 2022, 6:48 PM
Harbormaster failed remote builds in B29038: Diff 27959!

wtf? it works on my machine, even in tox

Build has FAILED

Patch application report for D7732 (id=27974)

Rebasing onto 9b44237005...

First, rewinding head to replay your work on top of it...
Applying: Add statsd metrics on incremental loading
Changes applied before test
commit 0492a45ca14250a69456d3649c6c7148fd5ab2b6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/200/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/200/console

tests: fix py3.7 support again

Build has FAILED

Patch application report for D7732 (id=27979)

Rebasing onto 9b44237005...

First, rewinding head to replay your work on top of it...
Applying: Add statsd metrics on incremental loading
Changes applied before test
commit fa95573e302370be1c860cbdf110c6e08dd0123c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/201/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/201/console

Build has FAILED

Patch application report for D7732 (id=27980)

Rebasing onto 9b44237005...

First, rewinding head to replay your work on top of it...
Applying: Add statsd metrics on incremental loading
Changes applied before test
commit ae6a359f0311b6936fae6bc345881a7472d4c3ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/202/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/202/console

tests: fix py3.7 support again

Build is green

Patch application report for D7732 (id=27981)

Rebasing onto 9b44237005...

First, rewinding head to replay your work on top of it...
Applying: Add statsd metrics on incremental loading
Changes applied before test
commit 65e8627ea0dd5f20e2ec125fa60037f9e37ea018
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/203/ for more details.

Make metric names consistent with the core loader's

Build is green

Patch application report for D7732 (id=28006)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 8bfb630e631e212a9cf5e9efc2621cbd6ca1681d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/204/ for more details.

olasd added inline comments.
swh/loader/git/loader.py
39
102–111

We will need these to be configured with specific buckets as well (same as D7727)

125–146

f-strings can't be used as docstrings.

294

As you may have guessed from my comment on D7726, I think this should be set as a tag on a common metric, rather than generate a set of distinct metrics.

And this should be pushed as a "constant" tag to used on further calls to statsd in this loader, without having to bother with passing the list of tags around on all calls, e.g. the ones for the refs ratios.

I also wonder if this should be a couple of tags rather than a single one:

  • incremental_enabled = "True"/"False"
  • (when incremental_enabled == "True") incremental_snapshot_origin="self"/"parent"/"none"

Is it worth distinguishing "has parent, no snapshot found" no_previous_snapshot from "doesn't have parent, no snapshot found" no_parent_origin? If so, maybe we can add a has_parent tag?)

swh/loader/git/loader.py
39

actually not relevant anymore, as I'll use the core's self.statsd

102–111

already fixed, and renamed to _percent as recommended in prometheus' documentation

125–146

aw, sad. anyway, I don't need them anymore now that I'm removing the STATSD_PREFIX.

294

already fixed too, but in a different way. I can do it your way too, if you prefer.

Is it worth distinguishing "has parent, no snapshot found" no_previous_snapshot from "doesn't have parent, no snapshot found" no_parent_origin?

I don't know, but there is no harm in storing it, is there?

re-use self.statsd set by the core

Build has FAILED

Patch application report for D7732 (id=28009)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 4bbe278176c0c56f304a5b3bfab0cf0e374dae42
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/205/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/205/console

bump swh-loader-core requirement

Build has FAILED

Patch application report for D7732 (id=28010)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 68fe91204bef874edc1c78c109881c8be17626ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/206/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/206/console

remove unused STATSD_PREFIX

Build has FAILED

Patch application report for D7732 (id=28011)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 16b23d9df5066a118ceaccdba6905a73ac49b981
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/207/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/207/console

Build is green

Patch application report for D7732 (id=28011)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 16b23d9df5066a118ceaccdba6905a73ac49b981
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/208/ for more details.

Reorganize tags as suggested; but renamed 'has_parent' to 'has_parent_origins' to be more explicit

Build is green

Patch application report for D7732 (id=28046)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit b4581b13eb84c55ddfaa94fc795d5fa63c277c2a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/209/ for more details.

Awesome, thanks! (With one last stroke of paint on the bikeshed :D)

swh/loader/git/loader.py
298

This should end with _total to conform with prom naming conventions

This revision is now accepted and ready to land.May 6 2022, 11:40 AM

Build is green

Patch application report for D7732 (id=28052)

Rebasing onto 9b44237005...

Current branch diff-target is up to date.
Changes applied before test
commit 2d4bd789adc49b41d62b97531f2d249033bd7f03
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 3 18:45:15 2022 +0200

    Add statsd metrics on incremental loading
    
    It will be used to debug why fork detection does not have the expected
    speed-up in production.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/210/ for more details.