Everything is released correctly and deployed on staging
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Apr 9 2021
I finally found why the graphs look weird: https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/misc/urls.py$31
With a dirty patch on the server, it's way better:
\o/
The pipeline is deployed in staging.
It's working, but it seems the graphs need some initial values in staging to render correctly:
Apr 8 2021
Apr 7 2021
Reopening as the release is not working on the stable branch
Apr 6 2021
Apr 5 2021
Apr 1 2021
In T3190#61917, @ardumont wrote: An improvement of the journal client is necessary to add support for this configuration, like for the producer:
Do you need such an improvement though? According to the code you linked, you could pass a
producer_config dict with that key and value.
@ardumont the linked code is for the producer; the bug is in the consumers
An improvement of the journal client is necessary to add support for this configuration, like for the producer:
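The change being asked for can be sketched as follows. This is an illustrative outline only, assuming a hypothetical merge of operator-supplied overrides into the consumer configuration before it is handed to librdkafka, mirroring what the producer side already allows; the names below are not the actual swh.journal API.

```python
# Sketch (assumption): merge operator-supplied Kafka overrides into the
# base consumer configuration, the way the producer side already allows.
# BASE_CONSUMER_CONFIG and build_consumer_config are hypothetical names.
BASE_CONSUMER_CONFIG = {
    "bootstrap.servers": "kafka1.internal.softwareheritage.org:9092",
    "group.id": "swh-counters-journal-client",
}


def build_consumer_config(overrides=None):
    """Return the final librdkafka consumer configuration dict."""
    config = dict(BASE_CONSUMER_CONFIG)
    # Overrides win over the defaults, e.g. a larger message.max.bytes.
    config.update(overrides or {})
    return config


config = build_consumer_config({"message.max.bytes": 500 * 1024 * 1024})
```

The resulting dict would then be passed to the Kafka consumer constructor instead of the hard-coded defaults.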
It seems the problem is no longer present with a higher max message size (500 * 1024 * 1024).
What version of zstd is installed on the system? that bug was introduced in v1.4.5 (https://github.com/facebook/zstd/commit/718f00ff6fe42db7e6ba09a7f7992b3e85283f77), and fixed in v1.4.7 (https://github.com/facebook/zstd/pull/2272/commits/1302f8d67691356b2f0aec8c62c1a7af2886a7cc)
This looks like https://github.com/edenhill/librdkafka/issues/2672 , which is caused by a bug in zstd: https://github.com/facebook/zstd/issues/2222
For the record, increasing the message.max.bytes property to 100 * 1024 * 1024 in the consumer configuration does not solve the problem.
The same problem occurred during the PoC; these messages were ignored by using the consumer configuration "errors.tolerance": "all" [1].
I will try to find a more elegant way to deal with this issue ;)
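One more explicit alternative to a blanket "errors.tolerance": "all" could look like the sketch below: skip messages that fail to decode, but log and count them instead of silently dropping every kind of error. This is an assumption about what "more elegant" might mean, not the actual swh journal client code.

```python
# Sketch (assumption): skip undecodable messages explicitly, counting
# and logging them, rather than tolerating all errors indiscriminately.
import logging


def consume(messages, decode):
    """Decode each raw message, skipping (and counting) the ones that fail."""
    results = []
    skipped = 0
    for raw in messages:
        try:
            results.append(decode(raw))
        except Exception:
            # Keep a trace of what was dropped instead of failing silently.
            logging.exception("skipping undecodable message")
            skipped += 1
    return results, skipped
```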
Mar 31 2021
Mar 26 2021
Great idea.
An improvement idea came to me during the refactoring: the script can be split up and integrated into the 'swh-counters' codebase.
Mar 25 2021
node counters1.internal.softwareheritage.org deployed by terraform. The inventory section is created accordingly [1].
The journal_client is running.
The counters are now exposed through a /metrics endpoint and ingested by Prometheus.
They are tagged per environment, so we will be able to isolate the counters for each one:
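To make the per-environment tagging concrete, here is a minimal sketch of rendering the counters in the Prometheus text exposition format with an environment label, so staging and production series stay separate. The metric and label names are illustrative assumptions, not necessarily the ones swh-counters exposes.

```python
# Sketch (assumption): render counters in the Prometheus text exposition
# format, with an "environment" label separating staging and production.
# Metric and label names here are illustrative.
def render_metrics(counters, environment):
    """Return a Prometheus text-format page for the given counters."""
    lines = ["# TYPE swh_archive_object_count gauge"]
    for object_type, count in sorted(counters.items()):
        lines.append(
            'swh_archive_object_count{environment="%s",object_type="%s"} %d'
            % (environment, object_type, count)
        )
    return "\n".join(lines)
```

A Prometheus scrape job pointed at the /metrics endpoint would then ingest each series with its environment label attached.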
Mar 24 2021
The current series where the counters are stored is named sql_swh_archive_object_count; the series for swh-counters could be swh_archive_object_count
Mar 22 2021
A new vm counters0.internal.staging.swh.network is deployed, hosting Redis, swh-counters and its journal client.
The lag in staging will be recovered in a couple of hours.
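For context, the counting scheme behind the Redis deployment can be sketched as below. This is an assumption about the design: in Redis one would use HyperLogLog commands (PFADD/PFCOUNT) to count distinct object ids in constant memory; the sketch substitutes an exact in-memory set so it runs without a Redis server, and the class name is illustrative.

```python
# Sketch (assumption): count distinct object ids per collection. In a
# Redis-backed deployment this maps to PFADD (add) and PFCOUNT
# (get_count) on HyperLogLog keys; here an exact set stands in so the
# example is self-contained.
class Counters:
    def __init__(self):
        self._keys = {}

    def add(self, collection, ids):
        """Record object ids seen by the journal client for a collection."""
        self._keys.setdefault(collection, set()).update(ids)

    def get_count(self, collection):
        """Return the number of distinct ids seen for a collection."""
        return len(self._keys.get(collection, set()))
```

The trade-off with real HyperLogLogs is a small counting error in exchange for bounded memory, which is acceptable for archive-wide object counters.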
Mar 19 2021
Mar 17 2021
Mar 15 2021
Mar 14 2021
Mar 12 2021
Mar 11 2021
Mar 10 2021
Mar 5 2021
- Repository created: https://forge.softwareheritage.org/source/swh-counters/
- Jenkins jobs configured: https://jenkins.softwareheritage.org/job/DCNT/
Let's start the subject ;)
Feb 18 2021
Thanks @anlambert, the monitoring is back to green
Feb 17 2021
@anlambert agrees with the previous hypothesis ^ and is working on a fix
With tcpdump, it seems swh-web doesn't add the headers that would prevent caching the response in the case of a 404:
GET /api/1/vault/directory/a317baff051f68e83557d51e59539dac2ff55b34/ HTTP/1.1
Host: archive.softwareheritage.org
User-Agent: python-requests/2.21.0
Accept: */*
X-Forwarded-For: 128.93.166.14
X-Forwarded-Proto: https
Accept-Encoding: gzip
X-Varnish: 230399
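The kind of fix this points at could be sketched as follows: make sure error responses carry headers telling Varnish not to cache them. The function and the tiny Response class are illustrative assumptions, not swh-web's actual code.

```python
# Sketch (assumption): mark error responses as uncacheable so Varnish
# does not serve a stale 404 after the cooking has been requested.
class Response:
    """Minimal stand-in for an HTTP response object."""

    def __init__(self, status_code):
        self.status_code = status_code
        self.headers = {}


def add_no_cache_headers(response):
    """Add no-cache headers to 4xx/5xx responses, leave others untouched."""
    if response.status_code >= 400:
        response.headers["Cache-Control"] = "no-store, no-cache, must-revalidate"
        response.headers["Pragma"] = "no-cache"
    return response
```

Applied as response middleware, this would keep the initial 404 from being replayed by the cache once the cooking task exists.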
After digging, it seems requests with a 404 return code are cached by Varnish.
When the test is launched, a first request is done, which returns a 404, and then the POST is issued. When the check tries to get the status of the cooking, the initial 404 is returned by Varnish.
It seems the scheduler has missed some updates. After an upgrade of the python3-swh-.* packages, the error is again the initial one.
After an upgrade of the packages on pergamon and vangogh, the error is now:
Feb 17 10:49:38 vangogh python3[1990225]: 2021-02-17 10:49:38 [1990225] root:ERROR <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT: COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 71, in middleware_handler
    return await handler(request)
  File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 178, in decorated_meth
    result = obj_meth(**kw)
  File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 220, in cook
    self.create_task(obj_type, obj_id, sticky)
  File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 163, in create_task
    task_id = self._send_task(obj_type, hex_id)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 139, in _send_task
    added_tasks = self.scheduler.create_tasks([task])
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
    self.raise_for_status(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
    raise exception from None
swh.core.api.RemoteException: <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT: COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
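The error above suggests a serialized Timestamp object (seconds + nanoseconds) reached PostgreSQL unconverted, so the COPY of next_run failed. A conversion like the sketch below, applied before the value is written, would yield something the database driver can serialize; this is an assumption about the fix, and the exact decode hook used by swh.core is not shown here.

```python
# Sketch (assumption): convert a (seconds, nanoseconds) Timestamp pair
# into an aware datetime that a PostgreSQL driver can serialize as
# "timestamp with time zone".
from datetime import datetime, timezone


def timestamp_to_datetime(seconds, nanoseconds):
    """Convert a seconds/nanoseconds pair into a UTC-aware datetime."""
    return datetime.fromtimestamp(seconds + nanoseconds / 1e9, tz=timezone.utc)


# The value from the traceback above: 2021-02-17 in UTC.
dt = timestamp_to_datetime(1613558977, 999614000)
```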
Feb 16 2021
I wrote a proposal for the next steps [1] so we could start the work on these counters. All comments/contributions are welcome.
Feb 5 2021
@vsellier nice. Note that if we draw these with a y-axis starting from 0, the step shape will be really negligible, so IMHO it's really not a problem.
Nice, thanks for confirming this at the source.