
Monitoring · Tag
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

Related to grafana, prometheus, icinga, sentry, ...

Recent Activity

Yesterday

rdicosmo moved T2912: Next generation archive counters from Pending validation to Done on the Roadmap 2021 board.
Sat, May 8, 11:13 AM · Roadmap 2021, System administration, Monitoring, Web app

Thu, Apr 29

ardumont closed T3280: Separate save code now request updates from the save code now ui as Resolved.
Thu, Apr 29, 11:16 AM · Monitoring, SaveCodeNow
ardumont added a revision to T3280: Separate save code now request updates from the save code now ui: D5641: docker-dev: Refresh save code now statuses periodically.
Thu, Apr 29, 10:52 AM · Monitoring, SaveCodeNow

Wed, Apr 28

rdicosmo added a comment to T2912: Next generation archive counters.

> I also recall now that vincent added a graph [1] recently enough.

This is to try and compare the counter approaches with each other a bit.

So that's still using the old plumbing at least for that part.

[1] https://grafana.softwareheritage.org/goto/BlkwHorMz

Wed, Apr 28, 5:23 PM · Roadmap 2021, System administration, Monitoring, Web app
ardumont added a comment to T2912: Next generation archive counters.

What about the old counter pipeline? Has it been decommissioned already?

I don't think so as I do not recall seeing diffs about clean up.

In any case, it's not part of what's currently deployed (so no risk of
data mangling, if that's part of the concern).

Wed, Apr 28, 5:12 PM · Roadmap 2021, System administration, Monitoring, Web app

Tue, Apr 27

ardumont added a comment to T3280: Separate save code now request updates from the save code now ui.

I'll keep it open until the docker env is ok as well (see the diff D5615).

Tue, Apr 27, 5:38 PM · Monitoring, SaveCodeNow
ardumont added a comment to T3280: Separate save code now request updates from the save code now ui.

Deployed.

Tue, Apr 27, 3:22 PM · Monitoring, SaveCodeNow
moranegg changed the status of T3128: Improve deposit integration, management and display from Open to Work in Progress.
Tue, Apr 27, 2:54 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
vlorentz removed a project from T3173: Create profiles in keycloack for the deposit-client to view dedicated moderation page: Roadmap 2021.
Tue, Apr 27, 2:12 PM · Monitoring, SWORD deposit, Web app
vlorentz removed a project from T3174: Filter deposit-admin view by deposit client on moderation page: Roadmap 2021.
Tue, Apr 27, 2:12 PM · Monitoring, SWORD deposit, Web app

Mon, Apr 26

ardumont added a revision to T3280: Separate save code now request updates from the save code now ui: D5615: docker: Install save code now refresh status cron.
Mon, Apr 26, 6:18 PM · Monitoring, SaveCodeNow
ardumont added a comment to T2912: Next generation archive counters.

What about the old counter pipeline? Has it been decommissioned already?

Mon, Apr 26, 2:29 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo added a comment to T2912: Next generation archive counters.

Last bits deployed on archive.s.o (including the author counters).

Mon, Apr 26, 1:33 PM · Roadmap 2021, System administration, Monitoring, Web app
ardumont added a comment to T2912: Next generation archive counters.

Last bits deployed on archive.s.o (including the author counters).

Mon, Apr 26, 12:00 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo moved T2912: Next generation archive counters from Work in progress to Pending validation on the Roadmap 2021 board.
Mon, Apr 26, 10:50 AM · Roadmap 2021, System administration, Monitoring, Web app

Sat, Apr 24

ardumont added a comment to T3251: Count authors from revisions and releases.

Hear hear, it has caught up now:

ardumont@counters1:~% date;redis-cli pfcount person
Sat 24 Apr 2021 05:31:18 PM UTC
(integer) 42190221
Sat, Apr 24, 7:33 PM · Monitoring, Web app

Fri, Apr 23

vsellier added a revision to T2912: Next generation archive counters: D5588: Activate swh-counters on all the webapps.
Fri, Apr 23, 4:26 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier claimed T3129: Reliable monitoring of services: for users and for admins.
Fri, Apr 23, 3:13 PM · Roadmap 2021, Monitoring, meta-task
vsellier closed T3251: Count authors from revisions and releases, a subtask of T2912: Next generation archive counters, as Resolved.
Fri, Apr 23, 1:03 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3251: Count authors from revisions and releases as Resolved.

and the authors are now displayed on staging and production (webapp1)

Fri, Apr 23, 1:03 PM · Monitoring, Web app
vsellier added a comment to T3251: Count authors from revisions and releases.

The lag for the production can be followed here: https://grafana.softwareheritage.org/goto/Di2H3z9Gk
(staging has already recovered)

Fri, Apr 23, 12:57 PM · Monitoring, Web app
vsellier added a comment to T3251: Count authors from revisions and releases.

swh-counters is deployed in production too:

  • upgrade swh-counters package and restart swh-counters backend and journal
root@counters1:~# apt dist-upgrade
...
Setting up python3-swh.counters (0.7.0-1~swh1~bpo10+1) ...
root@counters1:~# systemctl stop swh-counters-journal-client.service 
root@counters1:~# systemctl restart gunicorn-swh-counters.service 
root@counters1:~# systemctl start swh-counters-journal-client.service 
root@counters1:~# redis-cli pfcount person
(integer) 7

The person count has already started.

  • stopping the journal-client to be able to reset the releases and revisions offsets
root@counters1:~# systemctl stop swh-counters-journal-client.service
  • reset the offsets
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.counters.journal_client 2>&1 > ~/counters_journal_client_offsets.csv
# revision reset
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.revision
# release reset
vsellier@kafka1 ~ %  /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.release 
# checks
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.counters.journal_client 2>&1 > ~/counters_journal_client_offsets-backfill.csv 
vsellier@kafka1 ~ % diff ~/counters_journal_client_offsets.csv ~/counters_journal_client_offsets-backfill.csv | less 
1c1
< "swh.journal.objects.revision",25,8275180
---
> "swh.journal.objects.revision",25,0
8c8
< "swh.journal.objects.release",128,78484
---
> "swh.journal.objects.release",128,0
16c16
...
  • journal client restarted
root@counters1:~# systemctl start swh-counters-journal-client.service
  • the person counter is growing fast
root@counters1:~# date;redis-cli pfcount person
Fri 23 Apr 2021 10:55:54 AM UTC
(integer) 72358
root@counters1:~# date;redis-cli pfcount person
Fri 23 Apr 2021 10:55:57 AM UTC
(integer) 80618
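
The two samples above are enough for a rough estimate of the backfill rate. A minimal sketch, assuming the counts and the elapsed time are copied by hand from the transcript; the function name `rate_per_second` is hypothetical and not part of swh-counters. Note that `pfcount` itself returns an approximate cardinality, so the result is only indicative.

```shell
#!/bin/sh
# Hypothetical helper: average growth per second between two pfcount samples.
# Arguments: first count, second count, elapsed seconds between the samples.
rate_per_second() {
    first=$1
    second=$2
    elapsed=$3
    if [ "$elapsed" -le 0 ]; then
        echo "elapsed must be positive" >&2
        return 1
    fi
    # Integer division is precise enough for a rough estimate.
    echo $(( (second - first) / elapsed ))
}

# Values taken from the two samples above (10:55:54 and 10:55:57 UTC).
rate_per_second 72358 80618 3   # prints 2753
```

With these numbers the counter gains roughly 2750 distinct persons per second while the journal client replays the revision and release topics.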
Fri, Apr 23, 12:56 PM · Monitoring, Web app
vsellier added a revision to T3251: Count authors from revisions and releases: D5586: Activate the person's counter on the home page with swh-counters.
Fri, Apr 23, 12:03 PM · Monitoring, Web app
ardumont added a comment to T3251: Count authors from revisions and releases.

Also [1] to follow through the journal client consumption (it has data now ;)

Fri, Apr 23, 11:23 AM · Monitoring, Web app
ardumont added a comment to T3251: Count authors from revisions and releases.

I think you can close D5573 which is obsolete now with the latest change.

Fri, Apr 23, 11:22 AM · Monitoring, Web app
vsellier added a comment to T3251: Count authors from revisions and releases.
  • version 0.7.0 released with the latest improvement (D5576) from vlorentz (thanks)
  • deployment done in staging
  • the person counting has started on the live messages:
root@counters0:~# redis-cli
127.0.0.1:6379> pfcount person
(integer) 7
  • now let's reset the consumer offsets for the release and revision topics to backfill the person counter:
# offsets backup
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.counters.journal_client 2>&1 > ~/counters_journal_client_offsets.csv
# revision reset
 /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.revision
# release reset
 /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.release
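
The steps above follow a backup-then-reset pattern: export the current offsets first, then rewind only the topics that need to be replayed. A minimal sketch that prints the commands for review instead of running them; the function name `print_backfill_commands` and the broker address are assumptions, while the flags, group name and topics are the ones used in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the backup and reset commands used
# above, one reset per topic, so they can be reviewed before execution.
KAFKA_CG=/opt/kafka/bin/kafka-consumer-groups.sh
GROUP=swh.counters.journal_client

print_backfill_commands() {
    server=$1
    shift
    # Back up the current offsets of every topic before touching anything.
    echo "$KAFKA_CG --bootstrap-server $server --reset-offsets --all-topics --to-current --dry-run --export --group $GROUP > ~/counters_journal_client_offsets.csv"
    # Rewind each requested topic to its earliest offset.
    for topic in "$@"; do
        echo "$KAFKA_CG --bootstrap-server $server --reset-offsets --group $GROUP --to-earliest --execute --topic $topic"
    done
}

# kafka.example.org:9092 is a placeholder, not the real broker address.
print_backfill_commands kafka.example.org:9092 \
    swh.journal.objects.revision swh.journal.objects.release
```

Printing first keeps the destructive `--execute` step explicit. As in the production run, the journal client has to be stopped before the reset commands are actually executed.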
Fri, Apr 23, 11:16 AM · Monitoring, Web app
ardumont added a revision to T3280: Separate save code now request updates from the save code now ui: D5583: Separate save code now status refresh routine from the listing ui.
Fri, Apr 23, 10:28 AM · Monitoring, SaveCodeNow

Thu, Apr 22

ardumont updated the task description for T3280: Separate save code now request updates from the save code now ui.
Thu, Apr 22, 5:10 PM · Monitoring, SaveCodeNow
ardumont claimed T3280: Separate save code now request updates from the save code now ui.
Thu, Apr 22, 5:07 PM · Monitoring, SaveCodeNow
vsellier removed a project from T3165: Generate historical data from the new counters series: Roadmap 2021.
Thu, Apr 22, 4:25 PM · System administration, Monitoring
vsellier added a revision to T3251: Count authors from revisions and releases: D5573: Update the counters' journal clients configuration to count the persons.
Thu, Apr 22, 12:08 PM · Monitoring, Web app
vsellier added a revision to T3251: Count authors from revisions and releases: D5572: Implement the jounal client counting an internal property of an object.
Thu, Apr 22, 10:36 AM · Monitoring, Web app
ardumont moved T2117: Save Code Now: End to End monitoring from Backlog to Done on the Roadmap 2021 board.
Thu, Apr 22, 9:20 AM · System administration, Monitoring, Roadmap 2021

Wed, Apr 21

ardumont moved T2727: Investigate end-to-end monitoring which no longer reports issues from deployed/landed to done on the System administration board.
Wed, Apr 21, 6:58 PM · Monitoring, System administration
ardumont moved T2770: Fix all icinga checks on staging webapp from deployed/landed to done on the System administration board.
Wed, Apr 21, 6:57 PM · Monitoring, System administration, Staging environment
ardumont moved T2117: Save Code Now: End to End monitoring from deployed/landed to done on the System administration board.
Wed, Apr 21, 6:57 PM · System administration, Monitoring, Roadmap 2021
ardumont closed T2117: Save Code Now: End to End monitoring as Resolved.

All checks are green, both for production/staging and for hg/svn/git.

Wed, Apr 21, 6:57 PM · System administration, Monitoring, Roadmap 2021
ardumont added a comment to T3263: Save code now report error for svn type.

Nevertheless, that error should have been detected on the first loading so some adaptation is needed in the svn loader.

Wed, Apr 21, 10:08 AM · SaveCodeNow, Monitoring, SVN Loader
ardumont added a comment to T3263: Save code now report error for svn type.

In any case, independently of this, for the monitoring, I was planning to switch
the svn origin currently used to another one not hosted on GitHub.

Wed, Apr 21, 10:08 AM · SaveCodeNow, Monitoring, SVN Loader
ardumont added a comment to T2117: Save Code Now: End to End monitoring.

On a related note, it may be useful to regularly report requests that did not complete (either as success or failure) in a reasonable amount of time after being scheduled.

Wed, Apr 21, 9:40 AM · System administration, Monitoring, Roadmap 2021
ardumont added a revision to T2117: Save Code Now: End to End monitoring: D5568: Update save code now monitoring checks.
Wed, Apr 21, 9:28 AM · System administration, Monitoring, Roadmap 2021
ardumont added a revision to T3263: Save code now report error for svn type: D5568: Update save code now monitoring checks.
Wed, Apr 21, 9:28 AM · SaveCodeNow, Monitoring, SVN Loader

Tue, Apr 20

ardumont renamed T3280: Separate save code now request updates from the save code now ui from Uncorellate save code now request updates from the save code now ui to Separate save code now request updates from the save code now ui.
Tue, Apr 20, 3:10 PM · Monitoring, SaveCodeNow
ardumont triaged T3280: Separate save code now request updates from the save code now ui as Normal priority.
Tue, Apr 20, 2:54 PM · Monitoring, SaveCodeNow

Mon, Apr 19

vsellier changed the status of T3251: Count authors from revisions and releases, a subtask of T2912: Next generation archive counters, from Open to Work in Progress.
Mon, Apr 19, 3:52 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T3251: Count authors from revisions and releases from Open to Work in Progress.
Mon, Apr 19, 3:52 PM · Monitoring, Web app
ardumont added a comment to T3263: Save code now report error for svn type.

Thanks for the heads up ;)

Mon, Apr 19, 12:32 PM · SaveCodeNow, Monitoring, SVN Loader
anlambert added a comment to T3263: Save code now report error for svn type.

The error is related to that GitHub origin erroneously submitted with an svn visit type.

Mon, Apr 19, 12:12 PM · SaveCodeNow, Monitoring, SVN Loader
ardumont added a project to T3263: Save code now report error for svn type: SaveCodeNow.
Mon, Apr 19, 9:37 AM · SaveCodeNow, Monitoring, SVN Loader
ardumont updated the task description for T3263: Save code now report error for svn type.
Mon, Apr 19, 9:22 AM · SaveCodeNow, Monitoring, SVN Loader