
scheduler's cron cleanup error when filtering tasks to archive
Started, Work in Progress, Normal, Public

Description

Found by checking mail on the saatchi node as the swhscheduler user (e.g. with mutt):

Date: Wed, 24 Jul 2019 08:52:05 +0000 (UTC)
From: Cron Daemon <root@saatchi.internal.softwareheritage.org>
To: swhscheduler@saatchi.internal.softwareheritage.org
Subject: Cron <swhscheduler@saatchi> /usr/bin/swh scheduler --config-file /etc/softwareheritage/backend/elastic.yml task archive

INFO:swh.core.config:Loading config file /etc/softwareheritage/global.ini
INFO:swh.core.config:Loading config file /etc/softwareheritage/worker.ini
INFO:swh.core.config:Loading config file /etc/softwareheritage/backend/elastic.yml
[INFO] swh.core.config -- Loading config file /etc/softwareheritage/backend/elastic.yml
Traceback (most recent call last):
  File "/usr/bin/swh", line 11, in <module>
    load_entry_point('swh.core==0.0.63', 'console_scripts', 'swh')()
  File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 56, in main
    return swh(auto_envvar_prefix='SWH')
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/scheduler/cli/task.py", line 553, in archive_tasks
    for task_ids in grouper(gen, n=batch_clean):
  File "/usr/lib/python3/dist-packages/swh/core/utils.py", line 48, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=stop_value):
  File "/usr/lib/python3/dist-packages/swh/scheduler/cli/task.py", line 538, in index_data
    after, before, last_id=last_id, limit=batch_index)
  File "/usr/lib/python3/dist-packages/swh/scheduler/api/client.py", line 102, in filter_task_to_archive
    'last_id': last_id,
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 198, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 235, in _decode_response
    response.content,
swh.core.api.RemoteException: Unexpected status code for API request: 504 (b'<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body bgcolor="white">\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx/1.10.3</center>\r\n</body>\r\n</html>\r\n')
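
For reference, the call chain in the traceback: the cron command "swh scheduler task archive" batches task ids with swh.core.utils.grouper() over the index_data() generator, and pulling each new batch triggers a filter_task_to_archive() call against the remote scheduler API, which is what comes back as the nginx 504. A minimal sketch of that batching pattern (not the actual swh.core.utils implementation):

# Sketch of the batching pattern visible in the traceback: grouper()
# lazily slices a generator into fixed-size batches, so every new batch
# pulls more data from index_data(), which in turn issues a remote
# filter_task_to_archive call. The 504 surfaces during that pull.
import itertools


def grouper(iterable, n):
    """Yield successive batches of n items (the last one may be shorter)."""
    stop_value = object()  # sentinel used as the zip_longest fill value
    args = [iter(iterable)] * n
    for batch in itertools.zip_longest(*args, fillvalue=stop_value):
        yield [item for item in batch if item is not stop_value]


def index_data():
    """Stand-in for the cli's index_data generator: each pull may trigger
    a remote scheduler call, which is where the timeout occurs."""
    yield from range(10)  # hypothetical task ids


for task_ids in grouper(index_data(), n=4):
    print(task_ids)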

Event Timeline

ardumont triaged this task as Normal priority.
ardumont updated the task description.
ftigeot changed the task status from Open to Work in Progress. Wed, Jul 24, 4:36 PM
ftigeot added a subscriber: ftigeot.

Fwiw, a manual connection to esnode1:9200 doesn't show this error

> Fwiw, a manual connection to esnode1:9200 doesn't show this error

Indeed (forgot to mention it here).

And the configuration mentioned in the log:

/etc/softwareheritage/backend/elastic.yml

---
scheduler:
  cls: remote
  args:
    url: http://saatchi.internal.softwareheritage.org:5008/
storage_nodes:
- host: esnode2.internal.softwareheritage.org
  port: 9200
- host: esnode3.internal.softwareheritage.org
  port: 9200
- host: esnode1.internal.softwareheritage.org
  port: 9200
client_options:
  sniff_on_start: false
  sniff_on_connection_fail: true
  http_compress: false
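
For context, these storage_nodes and client_options look like they are handed straight to the elasticsearch-py client. A hedged sketch of that (an assumption about how the archive backend consumes this config, not the actual swh code), which is also roughly what a manual check against esnode1:9200 exercises:

# Hedged sketch: feeding storage_nodes and client_options from elastic.yml
# to the elasticsearch-py client. This is an assumption about how the
# archive backend consumes the config, not the actual swh.scheduler code.
from elasticsearch import Elasticsearch

storage_nodes = [
    {'host': 'esnode2.internal.softwareheritage.org', 'port': 9200},
    {'host': 'esnode3.internal.softwareheritage.org', 'port': 9200},
    {'host': 'esnode1.internal.softwareheritage.org', 'port': 9200},
]
client_options = {
    'sniff_on_start': False,
    'sniff_on_connection_fail': True,
    'http_compress': False,
}

es = Elasticsearch(storage_nodes, **client_options)
print(es.ping())  # manual connectivity check, as done against esnode1:9200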

Oh, now I'm wondering if it's not the scheduler API that times out... ;)

> Oh, now I'm wondering if it's not the scheduler API that times out... ;)

File "/usr/lib/python3/dist-packages/swh/scheduler/api/client.py", line 102, in filter_task_to_archive
    'last_id': last_id,

agrees with me!
Ok, now that makes more sense; I'll rename the task then.
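
For the record, a quick way to confirm it is the scheduler RPC endpoint timing out (rather than elasticsearch) is to call filter_task_to_archive directly with a small limit and time it against nginx's proxy timeout. Hedged sketch, assuming swh.scheduler.get_scheduler and the (after, before, last_id, limit) parameters visible in the traceback; the date and last_id values are hypothetical:

# Hedged sketch: call the remote scheduler's filter_task_to_archive
# directly and time it, to see whether the scheduler API itself is the
# component that blows past nginx's proxy timeout. Parameter names come
# from the traceback; the concrete values are hypothetical.
import time

from swh.scheduler import get_scheduler

scheduler = get_scheduler(
    cls='remote',
    args={'url': 'http://saatchi.internal.softwareheritage.org:5008/'},
)

start = time.monotonic()
scheduler.filter_task_to_archive(
    '2019-01-01', '2019-07-01', last_id=-1, limit=100)
print('elapsed: %.1fs' % (time.monotonic() - start))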

I don't know what made me think the connection error was with elasticsearch... ¯\_(ツ)_/¯

ardumont renamed this task from "scheduler's cron cleanup no longer able to connect to elastic node" to "scheduler's cron cleanup error when filtering tasks to archive". Fri, Jul 26, 9:53 AM