
Date overflow error in scheduler journal client for some visit type
Closed, Migrated

Description

Another instance of [1] with a more recent version of the scheduler.

After deploying swh.scheduler v0.20, the monitoring and Sentry [2] reported the
following error:

swhscheduler@saatchi:~$ /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client
Traceback (most recent call last):
  File "/usr/bin/swh", line 11, in <module>
    load_entry_point('swh.core==0.15.0', 'console_scripts', 'swh')()
  File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main
    return swh(auto_envvar_prefix="SWH")
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/scheduler/cli/journal.py", line 52, in visit_stats_journal_client
    nb_messages = client.process(worker_fn)
  File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 265, in process
    batch_processed, at_eof = self.handle_messages(messages, worker_fn)
  File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 292, in handle_messages
    worker_fn(dict(objects))
  File "/usr/lib/python3/dist-packages/swh/scheduler/journal_client.py", line 274, in process_journal_objects
    queue_position_per_visit_type, visit_stats_d
  File "/usr/lib/python3/dist-packages/swh/scheduler/journal_client.py", line 101, in next_visit_queue_position
    return current_position + visit_interval
OverflowError: date value out of range

[1] T3502

[2] https://sentry.softwareheritage.org/share/issue/fa61b8333b0c4c429f8cf1356e3fc08d/

Event Timeline

ardumont created this task.
ardumont renamed this task from Date overflow error in scheduler journal client ("with a revenge") to Date overflow error in scheduler journal client for some visit type. Nov 22 2021, 5:40 PM

The stack trace is the same, but in the end the cause is the queue's current position
(or default_queue_position), tracked per visit type, which is too far ahead in the future [1].
Hence adding any offset to it overflows.

17:34:34 softwareheritage-scheduler@belvedere:5432=> select visit_type, max(next_visit_queue_position) from origin_visit_stats group by visit_type;
+------------+-------------------------------+
| visit_type |              max              |
+------------+-------------------------------+
| cran       | 2022-01-22 12:53:06.616997+00 |
| deb        | 2738-11-16 23:45:23.144693+00 |
| deposit    | 2021-11-24 16:02:21.029009+00 |
| ftp        | (null)                        |
| git        | 3100-05-20 19:18:10.094927+00 |
| hg         | 9999-12-31 23:59:59.999999+00 |
| nixguix    | 2105-05-15 15:44:27.905956+00 |
| npm        | 2126-08-21 23:41:49.745801+00 |
| opam       | 2061-12-02 08:35:47.391929+00 |
| pypi       | 8010-03-16 14:33:59.963834+00 |
| svn        | 9999-12-31 23:59:59.999999+00 |
| tar        | 2021-11-22 16:01:07.43672+00  |
+------------+-------------------------------+
(12 rows)

Time: 250579.053 ms (04:10.579)
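
For illustration, the overflow can be reproduced outside the scheduler with plain datetime arithmetic. This is only a minimal sketch: the starting value is the maximum reported for hg and svn in the query above, and the one-day visit_interval is an assumed offset, not the value the journal client actually computes.

```python
from datetime import datetime, timedelta, timezone

# Maximum queue position reported for the hg and svn visit types above.
current_position = datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)

# Hypothetical offset; any positive interval overflows from this position.
visit_interval = timedelta(days=1)

try:
    current_position + visit_interval
    overflowed = False
except OverflowError:
    # Python datetimes cannot represent anything past year 9999.
    overflowed = True

print("overflowed:", overflowed)
```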
olasd added a subscriber: olasd.

I had hot-patched this to clamp dates to datetime.datetime.max if an overflow was detected. Of course, I never actually committed the change. D'oh.

diff --git a/swh/scheduler/journal_client.py b/swh/scheduler/journal_client.py
index c001a48..e2a18ad 100644
--- a/swh/scheduler/journal_client.py
+++ b/swh/scheduler/journal_client.py
@@ -4,7 +4,7 @@
 # See top-level LICENSE file for more information
 
 import copy
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 import random
 from typing import Dict, List, Optional, Tuple
 
@@ -98,7 +98,10 @@ def next_visit_queue_position(
         if visit_stats.get("next_visit_queue_position")
         else default_queue_position
     )
-    return current_position + visit_interval
+    try:
+        return current_position + visit_interval
+    except OverflowError:
+        return datetime.max.replace(tzinfo=timezone.utc)
 
 
 def get_last_status(
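
The clamping behaviour of the patch above can be checked in isolation with a small standalone sketch. The helper name add_clamped and the constant MAX_POSITION are hypothetical and only mirror the try/except added to next_visit_queue_position; they are not part of swh.scheduler.

```python
from datetime import datetime, timedelta, timezone

# Largest timezone-aware datetime Python can represent (hypothetical constant).
MAX_POSITION = datetime.max.replace(tzinfo=timezone.utc)


def add_clamped(position: datetime, interval: timedelta) -> datetime:
    """Return position + interval, clamped to MAX_POSITION on overflow."""
    try:
        return position + interval
    except OverflowError:
        return MAX_POSITION


# A position already at the maximum stays there instead of raising.
saturated = add_clamped(MAX_POSITION, timedelta(days=1))

# A position in the normal range is advanced as usual.
regular = add_clamped(
    datetime(2021, 11, 22, tzinfo=timezone.utc), timedelta(days=1)
)
```

Note that clamping only stops the crash: a queue position already at the maximum stays there, so the far-future values shown in the query above are not brought back into a sensible range by this change.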
olasd claimed this task.
olasd added a subscriber: vsellier.

I believe this has now been deployed in staging and production by @vsellier (thanks!)