Software Heritage

Web app
Active · Public

Members

  • This project does not have any members.

Recent Activity

Today

olasd added a comment to T2828: Archive counters are no longer updated in production.

Thanks for the clarification.
I missed those counters; I was only focused on the sql_swh_archive_object_count metrics. Could you give some pointers or information on how it's called? I could only find the stored procedure declaration in storage [1].

My understanding of the "Objects added by time period" dashboard is that it uses the sql_swh_archive_object_count Prometheus metrics.

Tue, Dec 1, 2:59 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Thanks for the clarification.
I missed those counters; I was only focused on the sql_swh_archive_object_count metrics. Could you give some pointers or information on how it's called? I could only find the stored procedure declaration in storage [1].

Tue, Dec 1, 2:47 PM · Monitoring, Web app, System administration
olasd added a comment to T2828: Archive counters are no longer updated in production.

Erratum: the counters are not yet visible on the "Object added by time period" dashboard due to the aggregation per day.

Tue, Dec 1, 1:10 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

All stopped workers have been restarted:

vsellier@pergamon ~ % sudo clush -b -w @swh-workers16 'puppet agent --enable; cd /etc/systemd/system/; for unit in swh-worker@*.service; do systemctl enable $unit; done; systemctl start swh-worker@*'
Tue, Dec 1, 12:48 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Erratum: the counters are not yet visible on the "Object added by time period" dashboard due to the aggregation per day.

Tue, Dec 1, 12:26 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Thanks for looking into this. It would be great to make sure that statistics are collected only once per period (every X hours) and cached, so as to avoid rerunning expensive queries regularly.

Tue, Dec 1, 12:22 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

The PostgreSQL statistics are back online [1].
The "Object added by time period" dashboard [2] also has data to display again.

Tue, Dec 1, 12:06 PM · Monitoring, Web app, System administration
rdicosmo added a comment to T2828: Archive counters are no longer updated in production.

Thanks for looking into this. It would be great to make sure that statistics are collected only once per period (every X hours) and cached, so as to avoid rerunning expensive queries regularly.

Tue, Dec 1, 11:55 AM · Monitoring, Web app, System administration
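
The caching idea suggested in the comment above could look roughly like the following. This is a minimal sketch, not the fix that was deployed: `CachedStatistics`, its `compute` callable, and the one-hour period in the usage note are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Any, Callable, Optional


class CachedStatistics:
    """Keep the result of an expensive statistics query for a fixed period.

    Hypothetical helper: `compute` stands in for whatever runs the real
    query; it is only called again once `ttl_seconds` have elapsed.
    """

    def __init__(
        self,
        compute: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._computed_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, recomputing it only when it has expired."""
        now = self._clock()
        expired = (
            self._computed_at is None
            or now - self._computed_at >= self._ttl
        )
        if expired:
            self._value = self._compute()
            self._computed_at = now
        return self._value
```

With something like `CachedStatistics(run_query, ttl_seconds=3600)`, every metrics request within the hour would reuse the stored value instead of hitting the database again. The injectable `clock` is there only to make the expiry logic easy to test.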
vsellier added a comment to T2828: Archive counters are no longer updated in production.

D4635 has landed.

Tue, Dec 1, 11:48 AM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

As the slowness of the monitoring requests doesn't seem to be directly related to the load on the database, the indexers were restarted:

vsellier@pergamon ~ % sudo clush -b -w @azure-workers 'cd /etc/systemd/system/multi-user.target.wants; for unit in "swh-worker@indexer_origin_intrinsic_metadata.service swh-worker@indexer_fossology_license.service swh-worker@indexer_content_mimetype.service"; do systemctl enable $unit; done; systemctl start swh-worker@indexer_origin_intrinsic_metadata.service swh-worker@indexer_fossology_license.service swh-worker@indexer_content_mimetype.service; puppet agent --enable'
Tue, Dec 1, 9:37 AM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

D4636 is a proposal to solve the performance issues with the statistics queries.

Tue, Dec 1, 9:22 AM · Monitoring, Web app, System administration
vsellier added a revision to T2828: Archive counters are no longer updated in production: D4635: exclude temporary schemas from the statistics.
Tue, Dec 1, 9:21 AM · Monitoring, Web app, System administration

Yesterday

vsellier updated subscribers of T2828: Archive counters are no longer updated in production.

@olasd has stopped the backfill with:

pkill -2 -u swhstorage -f revision

(signal 2, SIGINT, allows the process to flush its logs before exiting)

Mon, Nov 30, 7:49 PM · Monitoring, Web app, System administration
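
Why SIGINT rather than a hard kill helps here can be illustrated with a small sketch. It assumes a long-running Python job that processes work in batches; `StopFlag`, `process`, and `install` are hypothetical names, and nothing here is taken from the actual backfill script.

```python
import signal
from typing import Callable, Iterable


class StopFlag:
    """Records that a stop was requested instead of exiting immediately."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:
        # Signal handler: only set a flag, the main loop does the cleanup.
        self.requested = True


def process(
    batches: Iterable[Callable[[], None]],
    stop: StopFlag,
    flush: Callable[[], None],
) -> int:
    """Run batches until a stop is requested, then flush and return the count."""
    done = 0
    for batch in batches:
        if stop.requested:
            break
        batch()
        done += 1
    # Always flush buffered logs, whether we finished or were interrupted.
    flush()
    return done


def install(stop: StopFlag) -> None:
    """Route SIGINT (signal 2, as sent by `pkill -2`) to the stop flag."""
    signal.signal(signal.SIGINT, stop.handle)
```

The design choice is that the handler does no work itself: the current batch finishes, the loop notices the flag, and the flush runs before the process exits.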
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Half of the workers were stopped:

root@pergamon:~# sudo clush -b -w @swh-workers16 'puppet agent --disable "Reduce load of belvedere"; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop swh-worker@*'
worker12: Removed /etc/systemd/system/multi-user.target.wants/swh-worker@checker_deposit.service.
worker11: Removed /etc/systemd/system/multi-user.target.wants/swh-worker@checker_deposit.service.
worker10: Removed /etc/systemd/system/multi-user.target.wants/swh-worker@checker_deposit.service.
worker09: Removed /etc/systemd/system/multi-user.target.wants/swh-worker@checker_deposit.service.
worker12: Removed /etc/systemd/system/multi-user.target.wants/swh-worker@lister.service.
...
Mon, Nov 30, 6:09 PM · Monitoring, Web app, System administration
anlambert added a revision to T2402: make status.s.o status discoverable from archive.s.o: D4634: templates/layout: Add status widget in top bar.
Mon, Nov 30, 6:01 PM · System administration, Web app
vsellier added a comment to T2828: Archive counters are no longer updated in production.

It seems there is no other solution than reducing the load on belvedere.
There is an aggressive backfill in progress from getty (192.168.100.102):

postgres=# select client_addr, count(datid) from pg_stat_activity where state != 'idle' group by client_addr;
   client_addr   | count 
-----------------+-------
                 |     3
 192.168.100.18  |     0
 ::1             |     1
 192.168.100.210 |    60
 192.168.100.102 |    64
(5 rows)

I don't want to kill the job, which has been running for several days (since 2020-11-27), to avoid losing any work. The temporary solution is to reduce the number of workers to relieve the load on belvedere.

Mon, Nov 30, 5:36 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Hmmm... there is definitely no need to update the counters more than once a day

Mon, Nov 30, 5:19 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

Let's try a temporary workaround:

root@belvedere:/etc/prometheus-sql-exporter# puppet agent --disable "Diagnose prometheus-exporter timeout" 
root@belvedere:/etc/prometheus-sql-exporter# mv swh-scheduler.yml ~
root@belvedere:/etc/prometheus-sql-exporter# systemctl restart prometheus-sql-exporter
Mon, Nov 30, 4:18 PM · Monitoring, Web app, System administration
rdicosmo added a comment to T2828: Archive counters are no longer updated in production.

Hmmm... there is definitely no need to update the counters more than once a day

Mon, Nov 30, 4:14 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

It seems that queries are executed on the database each time the metrics are requested.
This one takes too long (on the swh-scheduler instance):

Mon, Nov 30, 4:04 PM · Monitoring, Web app, System administration
zack renamed T2828: Archive counters are no longer updated in production from Production counters not up to date to Archive counters are no longer updated in production.
Mon, Nov 30, 4:02 PM · Monitoring, Web app, System administration
zack raised the priority of T2828: Archive counters are no longer updated in production from High to Unbreak Now!.
Mon, Nov 30, 4:02 PM · Monitoring, Web app, System administration
vsellier added a comment to T2828: Archive counters are no longer updated in production.

After retracing the counter computation pipeline, it seems the counters are computed from the values stored in Prometheus.

Mon, Nov 30, 3:34 PM · Monitoring, Web app, System administration
vsellier changed the status of T2828: Archive counters are no longer updated in production from Open to Work in Progress.
Mon, Nov 30, 3:20 PM · Monitoring, Web app, System administration

Thu, Nov 26

anlambert added a revision to T2782: add a "Filter Pull Requests" checkbox (or similar) in the Branches view of an origin in the web UI : D4616: common/archive: Add branch names filtering support in lookup_snapshot.
Thu, Nov 26, 6:18 PM · Web app
anlambert added a revision to T2782: add a "Filter Pull Requests" checkbox (or similar) in the Branches view of an origin in the web UI : D4615: storage: Add branch names filtering support in snapshot_get_branches.
Thu, Nov 26, 6:16 PM · Web app

Wed, Nov 25

anlambert closed T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish as Resolved by committing rDWAPPS8492a4c688db: api: Fix endpoint responses that must not be cached.
Wed, Nov 25, 4:24 PM · Web app, System administration
anlambert added a comment to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish.

There is clearly a regression at the Varnish level as vault cooking progress was correctly reported in the Web UI before.

I am wondering if it could be a side effect of the recent setting of CORS headers for the api paths.

Wed, Nov 25, 4:10 PM · Web app, System administration
anlambert added a revision to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish: D4595: api: Fix endpoint responses that must not be cached.
Wed, Nov 25, 4:08 PM · Web app, System administration
anlambert added a comment to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish.

There is clearly a regression at the Varnish level as vault cooking progress was correctly reported in the Web UI before.

Wed, Nov 25, 11:52 AM · Web app, System administration
anlambert added a comment to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish.

This may be better suited to a fix in the web API, via proper cache config headers, as @olasd mentioned on IRC (probably via https://docs.djangoproject.com/en/3.1/topics/cache/#downstream-caches )

Wed, Nov 25, 11:07 AM · Web app, System administration
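
The idea of fixing this through cache headers, as mentioned in the comment above, can be sketched as follows. This is only an illustration of marking selected responses as uncacheable for downstream caches such as Varnish; it is not the content of D4595. The `NO_CACHE_PREFIXES` value and the `add_cache_headers` helper are hypothetical. In a Django application, the `never_cache` decorator from `django.views.decorators.cache` serves a similar purpose.

```python
from typing import Dict, Iterable

# Hypothetical path prefixes whose responses must never be cached,
# for example endpoints that report the progress of a running task.
NO_CACHE_PREFIXES = ("/api/1/vault/",)

# Directive set telling browsers and intermediate caches not to store
# or reuse the response.
NO_CACHE_VALUE = "max-age=0, no-cache, no-store, must-revalidate, private"


def add_cache_headers(
    path: str,
    headers: Dict[str, str],
    no_cache_prefixes: Iterable[str] = NO_CACHE_PREFIXES,
) -> Dict[str, str]:
    """Return a copy of `headers`, adding Cache-Control for uncacheable paths."""
    result = dict(headers)
    if path.startswith(tuple(no_cache_prefixes)):
        result["Cache-Control"] = NO_CACHE_VALUE
    return result
```

Setting the header in the application keeps the caching policy next to the endpoint it describes, so the proxy configuration does not need a special case per URL.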
douardda updated subscribers of T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish.

This may be better suited to a fix in the web API, via proper cache config headers, as @olasd mentioned on IRC (probably via https://docs.djangoproject.com/en/3.1/topics/cache/#downstream-caches )

Wed, Nov 25, 10:42 AM · Web app, System administration
douardda added a project to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish: Web app.
Wed, Nov 25, 10:41 AM · Web app, System administration

Fri, Nov 20

seirl triaged T2801: Wrong <title> on snapshot pages as Normal priority.
Fri, Nov 20, 9:01 PM · Web app, Easy hack

Tue, Nov 17

anlambert added a comment to T1725: Software Heritage name not displayed completely in web app.

Related to T2457

Tue, Nov 17, 5:04 PM · Web app
anlambert closed T1477: Improve swh-web design for mobile browsing as Resolved.

Implemented during GSOC 2019, closing this.

Tue, Nov 17, 5:03 PM · GSoC 2019, Web app
anlambert closed T1768: Add end to end tests for the frontend part of swh-web as Resolved.

Let's close that task when we reach 80% of code coverage

Tue, Nov 17, 5:01 PM · GSoC 2019, Web app
anlambert closed T2192: UX improvements as Resolved.

UX improvements from the audit conducted by Juliette have been implemented and deployed so closing this as resolved.

Tue, Nov 17, 3:15 PM · Web app, Restricted Project
anlambert closed T2192: UX improvements, a subtask of T2190: Archive Navigation (Web UI), as Resolved.
Tue, Nov 17, 3:15 PM · Web app, meta-task, Restricted Project
anlambert removed a subtask for T2192: UX improvements: T2327: Review UX proposition for specific use-cases.
Tue, Nov 17, 3:13 PM · Web app, Restricted Project
anlambert added a subtask for T2190: Archive Navigation (Web UI): T2327: Review UX proposition for specific use-cases.
Tue, Nov 17, 3:13 PM · Web app, meta-task, Restricted Project
anlambert edited parent tasks for T2327: Review UX proposition for specific use-cases, added: T2190: Archive Navigation (Web UI); removed: T2192: UX improvements.
Tue, Nov 17, 3:13 PM · Web app
anlambert removed a project from T2327: Review UX proposition for specific use-cases: Restricted Project.
Tue, Nov 17, 3:13 PM · Web app
anlambert closed T2283: "Vault Status" page doesn't get updated when asking for email notifications as Invalid.

Closing this as the observed issue could not be reproduced. I just checked again and everything works as expected plus the vault UI is fully tested with cypress.

Tue, Nov 17, 3:12 PM · Web app
anlambert closed T1927: Web app: rate limiting based on per-client API tokens as Resolved.

This is now implemented, deployed and documented so closing this as resolved.

Tue, Nov 17, 2:56 PM · Web app
anlambert closed T1927: Web app: rate limiting based on per-client API tokens, a subtask of T1982: Add user authentication and permissions to swh-web, as Resolved.
Tue, Nov 17, 2:56 PM · Web app
anlambert added a comment to T2786: UI: wrong usage of the "go to origin" icon next to the origin URL that redirect to SWH.

Another point I forgot to mention here: the origin URL link at the top of the origin context views lets users quickly get back to the HEAD branch root directory for the last visit performed by SWH on the origin.

Tue, Nov 17, 2:52 PM · Web app
anlambert added a comment to T2786: UI: wrong usage of the "go to origin" icon next to the origin URL that redirect to SWH.

For the record, we have the same issue in the "Save code now" requests table.
Originally, the link to the archived origin was available from the "Status" column, but I was asked to change it as it was not prominent enough.

Tue, Nov 17, 12:15 PM · Web app
douardda added a comment to T2786: UI: wrong usage of the "go to origin" icon next to the origin URL that redirect to SWH.

We need to think about a better UI for this, but I have no solution for now.

Tue, Nov 17, 11:53 AM · Web app
anlambert added a comment to T2786: UI: wrong usage of the "go to origin" icon next to the origin URL that redirect to SWH.

That link is not really needed anymore as we can get back to the root directory of the master branch by clicking on the snapshot date link.

Tue, Nov 17, 11:50 AM · Web app