
Add metrics in store_data on ratios of objects already stored
Closed · Public

Authored by vlorentz on May 20 2022, 1:47 PM.

Details

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D7871 (id=28414)

Could not rebase; Attempt merge onto 85a4794094...

Updating 85a4794..78f51af
Fast-forward
 swh/loader/git/base.py              | 123 ++++++++++++++++++++++++++++++++++++
 swh/loader/git/from_disk.py         |   4 +-
 swh/loader/git/loader.py            |   4 +-
 swh/loader/git/tests/test_loader.py |  79 ++++++++++++++++++++++-
 4 files changed, 204 insertions(+), 6 deletions(-)
 create mode 100644 swh/loader/git/base.py
Changes applied before test
commit 78f51af62ad049e1883adca19b91ca100d63e06d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 13:46:58 2022 +0200

    Add metrics in store_date on ratios of objects already stored

commit 083e1aa18e24cd3311162e259b15c1867b313060
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 12:16:23 2022 +0200

    Move store_data from DVCSLoader to a new BaseGitLoader
    
    In preparation for the removal of DVCSLoader from swh.loader.core,
    as the git loader is the only one to use it anymore.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/216/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/216/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 20 2022, 1:49 PM
Harbormaster failed remote builds in B29505: Diff 28414!

Build has FAILED

Patch application report for D7871 (id=28415)

Could not rebase; Attempt merge onto 85a4794094...

Updating 85a4794..c145fcb
Fast-forward
 requirements-swh.txt                |   2 +-
 swh/loader/git/base.py              | 123 ++++++++++++++++++++++++++++++++++++
 swh/loader/git/from_disk.py         |   4 +-
 swh/loader/git/loader.py            |   4 +-
 swh/loader/git/tests/test_loader.py |  79 ++++++++++++++++++++++-
 5 files changed, 205 insertions(+), 7 deletions(-)
 create mode 100644 swh/loader/git/base.py
Changes applied before test
commit c145fcb345af490f8dbc4067921b06b839eb4548
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 13:46:58 2022 +0200

    Add metrics in store_date on ratios of objects already stored

commit 083e1aa18e24cd3311162e259b15c1867b313060
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 12:16:23 2022 +0200

    Move store_data from DVCSLoader to a new BaseGitLoader
    
    In preparation for the removal of DVCSLoader from swh.loader.core,
    as the git loader is the only one to use it anymore.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/217/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/217/console

anlambert added a subscriber: anlambert.

Looks good to me. There is a typo in the commit message, though: s/store_date/store_data/.

This revision is now accepted and ready to land. May 20 2022, 2:11 PM
vlorentz retitled this revision from "Add metrics in store_date on ratios of objects already stored" to "Add metrics in store_data on ratios of objects already stored". May 20 2022, 2:17 PM

change title + retrigger CI

Build has FAILED

Patch application report for D7871 (id=28418)

Could not rebase; Attempt merge onto 85a4794094...

Updating 85a4794..f45ca1c
Fast-forward
 requirements-swh.txt                |   2 +-
 swh/loader/git/base.py              | 123 ++++++++++++++++++++++++++++++++++++
 swh/loader/git/from_disk.py         |   4 +-
 swh/loader/git/loader.py            |   4 +-
 swh/loader/git/tests/test_loader.py |  79 ++++++++++++++++++++++-
 5 files changed, 205 insertions(+), 7 deletions(-)
 create mode 100644 swh/loader/git/base.py
Changes applied before test
commit f45ca1c2c0fac2f57cd1b43af984251078c89169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 13:46:58 2022 +0200

    Add metrics in store_data on ratios of objects already stored

commit 083e1aa18e24cd3311162e259b15c1867b313060
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 12:16:23 2022 +0200

    Move store_data from DVCSLoader to a new BaseGitLoader
    
    In preparation for the removal of DVCSLoader from swh.loader.core,
    as the git loader is the only one to use it anymore.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/218/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/218/console

Build is green

Patch application report for D7871 (id=28418)

Could not rebase; Attempt merge onto 85a4794094...

Updating 85a4794..f45ca1c
Fast-forward
 requirements-swh.txt                |   2 +-
 swh/loader/git/base.py              | 123 ++++++++++++++++++++++++++++++++++++
 swh/loader/git/from_disk.py         |   4 +-
 swh/loader/git/loader.py            |   4 +-
 swh/loader/git/tests/test_loader.py |  79 ++++++++++++++++++++++-
 5 files changed, 205 insertions(+), 7 deletions(-)
 create mode 100644 swh/loader/git/base.py
Changes applied before test
commit f45ca1c2c0fac2f57cd1b43af984251078c89169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 13:46:58 2022 +0200

    Add metrics in store_data on ratios of objects already stored

commit 083e1aa18e24cd3311162e259b15c1867b313060
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 20 12:16:23 2022 +0200

    Move store_data from DVCSLoader to a new BaseGitLoader
    
    In preparation for the removal of DVCSLoader from swh.loader.core,
    as the git loader is the only one to use it anymore.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/219/ for more details.

Thanks for working on this, I think getting these insights will be useful.

Don't take this as a blocking suggestion, but rather as an opportunity for generalization: do you think it would be possible to implement these "filtered objects" metrics inside the swh.storage filter proxy, rather than hardcoding them only in the git loader? That way, all loaders would be able to leverage them.

Even if the statsd poking only happens in the loader (so that it can use the tags accumulated on that statsd instance), I think we could push the collection of the cumulative counts down one layer.
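
To make the suggestion concrete, here is a minimal sketch of what collecting the cumulative counts in a filtering layer could look like. It is only an illustration under stated assumptions: `InMemoryStorage`, `CountingFilterProxy`, and their `missing`/`add`/`filtered_ratios` methods are hypothetical names, not the actual swh.storage API, and the statsd reporting is left to the caller.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


class InMemoryStorage:
    """Hypothetical stand-in for a storage backend (not the swh.storage API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Set[str]] = {}

    def missing(self, object_type: str, ids: Iterable[str]) -> List[str]:
        """Return the ids that are not stored yet for this object type."""
        known = self._objects.get(object_type, set())
        return [id_ for id_ in ids if id_ not in known]

    def add(self, object_type: str, ids: Iterable[str]) -> None:
        self._objects.setdefault(object_type, set()).update(ids)


class CountingFilterProxy:
    """Hypothetical proxy: skips already-stored objects and counts them."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self.storage = storage
        self.seen: Counter = Counter()      # objects submitted, per type
        self.filtered: Counter = Counter()  # objects already stored, per type

    def add(self, object_type: str, ids: Iterable[str]) -> None:
        unique_ids = list(dict.fromkeys(ids))  # deduplicate, keep order
        missing = self.storage.missing(object_type, unique_ids)
        self.seen[object_type] += len(unique_ids)
        self.filtered[object_type] += len(unique_ids) - len(missing)
        self.storage.add(object_type, missing)

    def filtered_ratios(self) -> Dict[str, float]:
        """Ratio of already-stored objects per type, for the caller to report."""
        return {t: self.filtered[t] / n for t, n in self.seen.items() if n}


# A loader would wrap its storage once, then read the counts when it is done.
proxy = CountingFilterProxy(InMemoryStorage())
proxy.add("revision", ["a", "b"])
proxy.add("revision", ["b", "c", "d"])  # "b" is already stored
ratios = proxy.filtered_ratios()
```

With this split, the proxy only accumulates counters, and the loader remains free to send `ratios` to its own statsd instance with whatever tags it has gathered.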