Software Heritage

Apr 9 2021

vsellier closed T3165: Generate historical data from the new counters series, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 9 2021, 7:02 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3165: Generate historical data from the new counters series as Resolved.

Everything is released correctly and deployed on staging

Apr 9 2021, 7:02 PM · System administration, Monitoring
vsellier closed T3215: Deploy the new counters in staging, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 9 2021, 6:56 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3215: Deploy the new counters in staging as Resolved.

I finally found why the graphs look weird: https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/misc/urls.py$31
With a dirty patch on the server, it's way better:

Apr 9 2021, 6:56 PM · System administration, Monitoring, Web app
vlorentz added a comment to T3215: Deploy the new counters in staging.

Apr 9 2021, 2:41 PM · System administration, Monitoring, Web app
ardumont added a comment to T3215: Deploy the new counters in staging.

\o/

Apr 9 2021, 2:34 PM · System administration, Monitoring, Web app
vsellier added a comment to T3215: Deploy the new counters in staging.

The pipeline is deployed in staging.
It's working, but it seems the graphs need some initial values in staging to render correctly:

Apr 9 2021, 12:48 PM · System administration, Monitoring, Web app
vsellier added a revision to T3215: Deploy the new counters in staging: D5470: staging: configure counters history pipeline.
Apr 9 2021, 9:47 AM · System administration, Monitoring, Web app

Apr 8 2021

vsellier added a revision to T3165: Generate historical data from the new counters series: D5468: Let flask manage json response by itself.
Apr 8 2021, 7:24 PM · System administration, Monitoring
vlorentz closed T2540: support the loading of metadata-only deposits in the metadata storage, a subtask of T3128: Improve deposit integration, management and display, as Resolved.
Apr 8 2021, 10:57 AM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
vsellier added a revision to T3165: Generate historical data from the new counters series: D5447: attempt to fix the stable debian build.
Apr 8 2021, 9:07 AM · System administration, Monitoring

Apr 7 2021

vsellier reopened T3165: Generate historical data from the new counters series, a subtask of T2912: Next generation archive counters, as Work in Progress.
Apr 7 2021, 5:42 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier reopened T3165: Generate historical data from the new counters series as "Work in Progress".

Reopening as the release is not working on the stable branch

Apr 7 2021, 5:42 PM · System administration, Monitoring
vsellier changed the status of T3215: Deploy the new counters in staging from Open to Work in Progress.
Apr 7 2021, 5:14 PM · System administration, Monitoring, Web app
vsellier closed T3165: Generate historical data from the new counters series as Resolved.
Apr 7 2021, 5:13 PM · System administration, Monitoring
vsellier closed T3165: Generate historical data from the new counters series, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 7 2021, 5:13 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a revision to T3165: Generate historical data from the new counters series: D5444: Use an intermediate temporary file to generate the historical data.
Apr 7 2021, 4:52 PM · System administration, Monitoring
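D5444 only says that an intermediate temporary file is used to generate the historical data. One common way to do this, shown here as a minimal sketch and not as the swh-counters code, is to write the file next to its target and then rename it over the target with os.replace(), so that a reader such as the webapp fetching the history file never sees a half-written file. The function name and the file prefix are hypothetical.

```python
import contextlib
import json
import os
import tempfile
from typing import Any


def write_json_atomically(path: str, data: Any) -> None:
    """Write data as JSON to path without ever exposing a partial file.

    Hypothetical helper: the temporary file is created in the target's
    directory so that os.replace() stays a same-filesystem rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".history-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(data, tmp_file)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Readers keep seeing the previous complete file until the rename happens, then the new complete one.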
vsellier added a revision to T3165: Generate historical data from the new counters series: D5442: Allow the webapp to retrieve the history file via a GET endpoint.
Apr 7 2021, 3:10 PM · System administration, Monitoring

Apr 6 2021

vsellier added a revision to T3165: Generate historical data from the new counters series: D5429: Manage and expose the historical data.
Apr 6 2021, 4:25 PM · System administration, Monitoring
vsellier added a revision to T3165: Generate historical data from the new counters series: D5428: Allow to use several backends with a RPCServerApp.
Apr 6 2021, 4:09 PM · System administration, Monitoring

Apr 5 2021

rdicosmo assigned T3128: Improve deposit integration, management and display to moranegg.
Apr 5 2021, 12:13 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
rdicosmo updated subscribers of T3128: Improve deposit integration, management and display.
Apr 5 2021, 12:13 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
rdicosmo added subtasks for T3128: Improve deposit integration, management and display: T2344: Build a connector for software deposit via Zenodo/InvenioRDM, T2540: support the loading of metadata-only deposits in the metadata storage.
Apr 5 2021, 12:13 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
rdicosmo renamed T3128: Improve deposit integration, management and display from Improve deposit management and display to Improve deposit integration, management and display.
Apr 5 2021, 12:12 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app

Apr 1 2021

vsellier closed T3190: counters: Error during directory topic ingestion, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 1 2021, 2:18 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3190: counters: Error during directory topic ingestion as Resolved.
Apr 1 2021, 2:18 PM · System administration, Monitoring
vsellier added a revision to T3190: counters: Error during directory topic ingestion: D5399: counters: allow to consume big messages of the directory topic.
Apr 1 2021, 12:37 PM · System administration, Monitoring
vsellier added a comment to T3190: counters: Error during directory topic ingestion.

An improvement of the journal client is necessary to support this configuration, as is already done for the producer:

Do you need such an improvement, though? According to the code you linked, you could pass a
producer_config dict with that key and value.

Apr 1 2021, 12:31 PM · System administration, Monitoring
vlorentz added a comment to T3190: counters: Error during directory topic ingestion.

@ardumont the linked code is for producer, the bug is in consumers

Apr 1 2021, 11:52 AM · System administration, Monitoring
ardumont added a comment to T3190: counters: Error during directory topic ingestion.

An improvement of the journal client is necessary to support this configuration, as is already done for the producer:

Apr 1 2021, 11:49 AM · System administration, Monitoring
vsellier added a comment to T3190: counters: Error during directory topic ingestion.

It seems the problem no longer occurs with a higher max message size ('500 * 1024 * 1024').
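A minimal sketch of what such a consumer configuration could look like as a librdkafka-style dict (the format confluent_kafka.Consumer accepts), with the 500 MiB value above and an optional dict of raw overrides, as discussed for the producer. The helper, the broker address and the group id are hypothetical; only the property names come from librdkafka, and whether message.max.bytes is the exact property that matters here should be checked against librdkafka's configuration reference.

```python
from typing import Any, Dict, Optional

# 500 MiB: the max message size reported above as making the errors go away.
MAX_MESSAGE_SIZE = 500 * 1024 * 1024


def build_consumer_config(
    brokers: str,
    group_id: str,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a librdkafka-style consumer configuration (hypothetical helper).

    Settings in `overrides` win over the defaults, mirroring the idea of
    passing a dict of raw Kafka settings through to the client.
    """
    config: Dict[str, Any] = {
        "bootstrap.servers": brokers,
        "group.id": group_id,
        # Maximum message size accepted by the client; per librdkafka's docs,
        # fetch.max.bytes is adjusted upwards to at least this value.
        "message.max.bytes": MAX_MESSAGE_SIZE,
    }
    config.update(overrides or {})
    return config
```

With confluent_kafka installed, the result would be passed as `Consumer(build_consumer_config("broker1:9092", "swh-counters"))`.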

Apr 1 2021, 11:35 AM · System administration, Monitoring
vlorentz added a comment to T3190: counters: Error during directory topic ingestion.

What version of zstd is installed on the system? That bug was introduced in v1.4.5 (https://github.com/facebook/zstd/commit/718f00ff6fe42db7e6ba09a7f7992b3e85283f77) and fixed in v1.4.7 (https://github.com/facebook/zstd/pull/2272/commits/1302f8d67691356b2f0aec8c62c1a7af2886a7cc).
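To answer the version question mechanically, a sketch that checks whether a zstd version string falls in the affected range quoted above (introduced in v1.4.5, fixed in v1.4.7). The function names are hypothetical, and where the version string comes from (system package manager, library binding) is left open; the parser tolerates Debian-style suffixes and epochs.

```python
import re
from typing import Tuple

# Affected range quoted above: bug introduced in v1.4.5, fixed in v1.4.7.
BUG_INTRODUCED = (1, 4, 5)
BUG_FIXED = (1, 4, 7)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from '1.4.8', '1.4.8+dfsg-2' or '1:1.4.5'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized zstd version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_zstd_bug(version: str) -> bool:
    """True if the version lies in [1.4.5, 1.4.7), i.e. before the fix."""
    return BUG_INTRODUCED <= parse_version(version) < BUG_FIXED
```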

Apr 1 2021, 10:45 AM · System administration, Monitoring
vlorentz added a comment to T3190: counters: Error during directory topic ingestion.

This looks like https://github.com/edenhill/librdkafka/issues/2672 , which is caused by a bug in zstd: https://github.com/facebook/zstd/issues/2222

Apr 1 2021, 10:41 AM · System administration, Monitoring
vsellier added a comment to T3190: counters: Error during directory topic ingestion.

For the record, increasing the message.max.bytes property to 100 * 1024 * 1024 in the consumer configuration does not solve the problem.

Apr 1 2021, 10:32 AM · System administration, Monitoring
vsellier added a comment to T3190: counters: Error during directory topic ingestion.

The same problem occurred during the PoC; these messages were ignored using the consumer configuration "errors.tolerance": 'all' [1].
I will try to find if there is a more elegant way to deal with this issue ;)

Apr 1 2021, 10:03 AM · System administration, Monitoring
vsellier updated the task description for T3190: counters: Error during directory topic ingestion.
Apr 1 2021, 9:46 AM · System administration, Monitoring
vsellier changed the status of T3190: counters: Error during directory topic ingestion from Open to Work in Progress.
Apr 1 2021, 9:38 AM · System administration, Monitoring

Mar 31 2021

zack moved T3175: Prepare production environment from Backlog to Done on the Roadmap 2021 board.
Mar 31 2021, 11:05 AM · Roadmap 2021, System administration, Monitoring

Mar 26 2021

vsellier added a comment to T3165: Generate historical data from the new counters series.

With this improvement, the final counters architecture looks like this:

Mar 26 2021, 12:38 PM · System administration, Monitoring
ardumont added a comment to T3165: Generate historical data from the new counters series.

Great idea.

Mar 26 2021, 11:42 AM · System administration, Monitoring
vsellier added a comment to T3165: Generate historical data from the new counters series.

An idea for improvement came to me during the refactoring: the script can be split up and integrated into the 'swh-counters' codebase.

Mar 26 2021, 11:40 AM · System administration, Monitoring

Mar 25 2021

vsellier closed T3175: Prepare production environment as Resolved.

The node counters1.internal.softwareheritage.org is deployed by terraform, and the inventory section was created accordingly [1].
The journal_client is running.

Mar 25 2021, 5:36 PM · Roadmap 2021, System administration, Monitoring
vsellier closed T3175: Prepare production environment, a subtask of T2912: Next generation archive counters, as Resolved.
Mar 25 2021, 5:36 PM · Roadmap 2021, System administration, Monitoring, Web app
moranegg added a subtask for T3128: Improve deposit integration, management and display: T3176: Add query box on deposit origin on https://deposit.softwareheritage.org/.
Mar 25 2021, 4:21 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app
vsellier added a revision to T3175: Prepare production environment: D5338: counters: Declare production node.
Mar 25 2021, 3:33 PM · Roadmap 2021, System administration, Monitoring
vsellier changed the status of T3175: Prepare production environment from Open to Work in Progress.
Mar 25 2021, 2:52 PM · Roadmap 2021, System administration, Monitoring
vsellier changed the status of T3165: Generate historical data from the new counters series, a subtask of T2912: Next generation archive counters, from Open to Work in Progress.
Mar 25 2021, 2:38 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T3165: Generate historical data from the new counters series from Open to Work in Progress.
Mar 25 2021, 2:38 PM · System administration, Monitoring
vsellier closed T3164: Expose counters in prometheus format as Resolved.
Mar 25 2021, 2:38 PM · System administration, Monitoring
vsellier closed T3164: Expose counters in prometheus format, a subtask of T2912: Next generation archive counters, as Resolved.
Mar 25 2021, 2:38 PM · Roadmap 2021, System administration, Monitoring, Web app
moranegg triaged T3174: Filter deposit-admin view by deposit client on admin (moderation) page as Normal priority.
Mar 25 2021, 1:06 PM · Monitoring, SWORD deposit, Web app
moranegg triaged T3173: Provide access to deposit-clients to view dedicated moderation page as Normal priority.
Mar 25 2021, 1:04 PM · Monitoring, SWORD deposit, Web app
vsellier added a comment to T3164: Expose counters in prometheus format.

The counters are now exposed through a /metrics endpoint and ingested by prometheus.
They are tagged per environment, so we will be able to isolate the counters of each one:
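For readers unfamiliar with what a /metrics endpoint returns, a minimal sketch of the Prometheus text exposition format for such counters, using the swh_archive_object_count name proposed in the Mar 24 comment. The label names (environment, object_type) and the gauge type are assumptions; the feed does not say whether the environment tag is emitted by the service or attached by the Prometheus scrape configuration, and it is put in the exporter here only for illustration.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(
    counts: Dict[str, int],
    environment: str,
    metric: str = "swh_archive_object_count",
) -> str:
    """Render object counts in Prometheus text format, one sample per type."""
    lines = [
        f"# HELP {metric} Number of archived objects, per object type.",
        f"# TYPE {metric} gauge",
    ]
    env = escape_label_value(environment)
    for object_type in sorted(counts):
        labels = f'environment="{env}",object_type="{escape_label_value(object_type)}"'
        lines.append(f"{metric}{{{labels}}} {counts[object_type]}")
    return "\n".join(lines) + "\n"
```

A query such as `swh_archive_object_count{environment="staging"}` then selects one environment's samples.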

Mar 25 2021, 12:27 PM · System administration, Monitoring
vsellier added a revision to T3164: Expose counters in prometheus format: D5332: counters: count objects from more topics.
Mar 25 2021, 12:14 PM · System administration, Monitoring

Mar 24 2021

earthizflat added a revision to T2912: Next generation archive counters: D5326: Add a --version option to all the CLI commands.
Mar 24 2021, 7:27 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a revision to T3164: Expose counters in prometheus format: D5324: counters: add a prometheus job to read the new metrics end-point.
Mar 24 2021, 6:42 PM · System administration, Monitoring
vsellier added a revision to T3164: Expose counters in prometheus format: D5322: docker: Configure prometheus to retrieve swh-counters metrics.
Mar 24 2021, 5:45 PM · System administration, Monitoring
vsellier added a revision to T3164: Expose counters in prometheus format: D5321: Allow prometheus to retrieve the counter values.
Mar 24 2021, 5:32 PM · System administration, Monitoring
vsellier added a comment to T3164: Expose counters in prometheus format.

The current series where the counters are stored is named sql_swh_archive_object_count; the series for swh-counters could be swh_archive_object_count.

Mar 24 2021, 10:28 AM · System administration, Monitoring
vsellier changed the status of T3164: Expose counters in prometheus format, a subtask of T2912: Next generation archive counters, from Open to Work in Progress.
Mar 24 2021, 10:08 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T3164: Expose counters in prometheus format from Open to Work in Progress.
Mar 24 2021, 10:08 AM · System administration, Monitoring

Mar 22 2021

vsellier removed a project from T3165: Generate historical data from the new counters series: Web app.
Mar 22 2021, 6:31 PM · System administration, Monitoring
vsellier triaged T3165: Generate historical data from the new counters series as Normal priority.
Mar 22 2021, 6:31 PM · System administration, Monitoring
vsellier triaged T3164: Expose counters in prometheus format as Normal priority.
Mar 22 2021, 5:50 PM · System administration, Monitoring
vsellier closed T3159: Deploy swh-counters:v0.1.0 in staging, a subtask of T2912: Next generation archive counters, as Resolved.
Mar 22 2021, 5:34 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3159: Deploy swh-counters:v0.1.0 in staging as Resolved.

A new VM, counters0.internal.staging.swh.network, is deployed and hosts redis, swh-counters and its journal-client.
The lag in staging should be caught up within a couple of hours.

Mar 22 2021, 5:34 PM · Staging environment, System administration, Monitoring
vsellier added a revision to T3159: Deploy swh-counters:v0.1.0 in staging: D5297: staging: Add counters0 vm.
Mar 22 2021, 3:40 PM · Staging environment, System administration, Monitoring
vsellier added a revision to T3159: Deploy swh-counters:v0.1.0 in staging: D5296: Add swh-counters deployment configuration.
Mar 22 2021, 8:32 AM · Staging environment, System administration, Monitoring

Mar 19 2021

vsellier moved T3159: Deploy swh-counters:v0.1.0 in staging from Backlog to in-progress on the System administration board.
Mar 19 2021, 12:39 PM · Staging environment, System administration, Monitoring
vsellier changed the status of T3159: Deploy swh-counters:v0.1.0 in staging from Open to Work in Progress.
Mar 19 2021, 12:39 PM · Staging environment, System administration, Monitoring

Mar 17 2021

vsellier closed T3147: Package swh-counters module as a debian package, a subtask of T2912: Next generation archive counters, as Resolved.
Mar 17 2021, 6:18 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T3147: Package swh-counters module as a debian package, a subtask of T2912: Next generation archive counters, from Open to Work in Progress.
Mar 17 2021, 4:24 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3146: Add pytest-redis package on the swh repository, a subtask of T2912: Next generation archive counters, as Resolved.
Mar 17 2021, 4:24 PM · Roadmap 2021, System administration, Monitoring, Web app

Mar 15 2021

rdicosmo moved T2912: Next generation archive counters from Backlog to Work in progress on the Roadmap 2021 board.
Mar 15 2021, 9:09 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a revision to T2912: Next generation archive counters: D5253: Implement remote service.
Mar 15 2021, 8:04 PM · Roadmap 2021, System administration, Monitoring, Web app
vlorentz triaged T3129: Reliable monitoring of services: for users and for admins as Normal priority.
Mar 15 2021, 12:29 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vlorentz triaged T3128: Improve deposit integration, management and display as Normal priority.
Mar 15 2021, 12:29 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app

Mar 14 2021

rdicosmo created T3129: Reliable monitoring of services: for users and for admins .
Mar 14 2021, 8:17 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
rdicosmo created T3128: Improve deposit integration, management and display.
Mar 14 2021, 8:08 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app

Mar 12 2021

vsellier added a revision to T2912: Next generation archive counters: D5240: docker-compose: declare swh counters containers.
Mar 12 2021, 5:00 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a revision to T2912: Next generation archive counters: D5236: Add redis to the base build image.
Mar 12 2021, 12:49 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a revision to T2912: Next generation archive counters: D5232: Implement counters pipeline.
Mar 12 2021, 10:08 AM · Roadmap 2021, System administration, Monitoring, Web app

Mar 11 2021

rdicosmo added a project to T2912: Next generation archive counters: Roadmap 2021.
Mar 11 2021, 8:25 PM · Roadmap 2021, System administration, Monitoring, Web app

Mar 10 2021

vsellier added a revision to T2912: Next generation archive counters: D5229: swh-counters: Implement the cli skeleton.
Mar 10 2021, 4:30 PM · Roadmap 2021, System administration, Monitoring, Web app

Mar 5 2021

vsellier added a comment to T2912: Next generation archive counters.
Mar 5 2021, 12:12 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T2912: Next generation archive counters from Open to Work in Progress.

Let's start the subject ;)

Mar 5 2021, 11:07 AM · Roadmap 2021, System administration, Monitoring, Web app

Feb 18 2021

vsellier moved T2774: Fix vault end-to-end check from in-progress to deployed/landed/monitoring on the System administration board.
Feb 18 2021, 9:27 AM · Vault, System administration, Monitoring
vsellier closed T2774: Fix vault end-to-end check as Resolved.
Feb 18 2021, 9:27 AM · Vault, System administration, Monitoring
vsellier added a comment to T2774: Fix vault end-to-end check.

Thanks @anlambert, the monitoring is back to green.

Feb 18 2021, 9:27 AM · Vault, System administration, Monitoring

Feb 17 2021

anlambert added a revision to T2774: Fix vault end-to-end check: D5094: api_route: Ensure never_cache is honored for all response status codes.
Feb 17 2021, 1:55 PM · Vault, System administration, Monitoring
ardumont updated subscribers of T2774: Fix vault end-to-end check.

@anlambert agrees with the previous hypothesis ^ and is working on a fix

Feb 17 2021, 12:52 PM · Vault, System administration, Monitoring
vsellier added a comment to T2774: Fix vault end-to-end check.

Using tcpdump, it seems swh-web does not add the no-cache headers to the response in case of a 404:

GET /api/1/vault/directory/a317baff051f68e83557d51e59539dac2ff55b34/ HTTP/1.1
Host: archive.softwareheritage.org
User-Agent: python-requests/2.21.0
Accept: */*
X-Forwarded-For: 128.93.166.14
X-Forwarded-Proto: https
Accept-Encoding: gzip
X-Varnish: 230399
Feb 17 2021, 12:46 PM · Vault, System administration, Monitoring
vsellier added a comment to T2774: Fix vault end-to-end check.

After digging, it seems requests with a 404 return code are cached by varnish.
When the test is launched, a first request returns a 404, then the POST is issued. When the check tries to get the status of the cooking, the initial 404 is returned by varnish.
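The sequence above can be replayed with a toy model. A minimal sketch, assuming a shared cache that, loosely like Varnish's built-in VCL, stores any response (404 included) unless its Cache-Control contains no-cache, no-store or private. The Response class, the cache and the Cache-Control value (similar to what Django's never_cache emits) are illustrative, not swh-web or Varnish code; the fix direction matches D5094, "Ensure never_cache is honored for all response status codes".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def add_never_cache_headers(response: Response) -> Response:
    """Mark a response as uncacheable, whatever its status code."""
    response.headers["Cache-Control"] = "max-age=0, no-cache, no-store, must-revalidate, private"
    return response


class ToyCache:
    """Shared cache keeping any response unless Cache-Control forbids it."""

    def __init__(self) -> None:
        self._store: Dict[str, Response] = {}

    def get(self, url: str, backend: Callable[[], Response]) -> Response:
        if url in self._store:
            return self._store[url]  # cache hit: the backend is not asked
        response = backend()
        cache_control = response.headers.get("Cache-Control", "").lower()
        if not any(d in cache_control for d in ("no-cache", "no-store", "private")):
            self._store[url] = response
        return response


def run_check(fixed: bool) -> int:
    """Replay the end-to-end check: GET (404), POST to cook, GET the status."""
    state = {"cooking": False}

    def vault_status() -> Response:
        response = Response(200 if state["cooking"] else 404)
        return add_never_cache_headers(response) if fixed else response

    cache = ToyCache()
    url = "/api/1/vault/directory/<id>/"
    cache.get(url, vault_status)  # first GET: nothing cooked yet, 404
    state["cooking"] = True  # the POST starts the cooking (not cached)
    return cache.get(url, vault_status).status  # status poll
```

Without the headers, `run_check(False)` keeps returning the stale 404; with them, `run_check(True)` reaches the backend and sees the cooking.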

Feb 17 2021, 12:40 PM · Vault, System administration, Monitoring
vsellier added a comment to T2774: Fix vault end-to-end check.

It seems the scheduler missed some updates. After upgrading the python3-swh-.* packages, the error is back to the initial one.

Feb 17 2021, 12:02 PM · Vault, System administration, Monitoring
vsellier changed the status of T2774: Fix vault end-to-end check from Open to Work in Progress.
Feb 17 2021, 11:51 AM · Vault, System administration, Monitoring
vsellier added a comment to T2774: Fix vault end-to-end check.

After an upgrade of the packages on pergamon and vangogh, the error is now:

Feb 17 10:49:38 vangogh python3[1990225]: 2021-02-17 10:49:38 [1990225] root:ERROR <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT:  COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
                                          Traceback (most recent call last):
                                            File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 71, in middleware_handler
                                              return await handler(request)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 178, in decorated_meth
                                              result = obj_meth(**kw)
                                            File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
                                              return meth(self, *args, db=db, cur=cur, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 220, in cook
                                              self.create_task(obj_type, obj_id, sticky)
                                            File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
                                              return meth(self, *args, db=db, cur=cur, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 163, in create_task
                                              task_id = self._send_task(obj_type, hex_id)
                                            File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 139, in _send_task
                                              added_tasks = self.scheduler.create_tasks([task])
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
                                              return self.post(meth._endpoint_path, post_data)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
                                              return self._decode_response(response)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
                                              self.raise_for_status(response)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
                                              raise exception from None
                                          swh.core.api.RemoteException: <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT:  COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
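The error shows the next_run value reaching PostgreSQL as the repr of a Timestamp object instead of a datetime. That repr has the shape of msgpack's Timestamp extension type, which would fit a serialization mismatch between package versions; this is an assumption consistent with the later comment that the scheduler had missed some updates, not something the traceback confirms. A minimal stdlib sketch of the conversion the column expects, with a local stand-in class (hypothetical, not msgpack's or swh's):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Timestamp:
    """Local stand-in mirroring the repr seen in the error message."""

    seconds: int
    nanoseconds: int


def to_datetime(ts: Timestamp) -> datetime:
    """Convert to an aware UTC datetime, truncating to microseconds."""
    return EPOCH + timedelta(seconds=ts.seconds, microseconds=ts.nanoseconds // 1000)
```

For the value in the log, `to_datetime(Timestamp(seconds=1613558977, nanoseconds=999614000))` is 2021-02-17 10:49:37.999614+00:00, consistent with the 10:49:38 journal timestamp.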
Feb 17 2021, 11:51 AM · Vault, System administration, Monitoring
vsellier added projects to T2774: Fix vault end-to-end check: System administration, Vault.
Feb 17 2021, 10:46 AM · Vault, System administration, Monitoring

Feb 16 2021

vsellier added a comment to T2912: Next generation archive counters.

I wrote a proposal for the next steps [1] so we could start the work on these counters. All comments/contributions are welcome.

Feb 16 2021, 5:25 PM · Roadmap 2021, System administration, Monitoring, Web app

Feb 5 2021

douardda added a comment to T2912: Next generation archive counters.

@vsellier nice. Note that if we draw these with a y-axis starting from 0, the step shape will be really negligible, so IMHO it's really not a problem.

Feb 5 2021, 10:59 AM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a comment to T2912: Next generation archive counters.

Nice, thanks for confirming this at the source.

Feb 5 2021, 10:03 AM · Roadmap 2021, System administration, Monitoring, Web app