Page MenuHomeSoftware Heritage

journal_client: Ensure queue position does not overflow
ClosedPublic

Authored by ardumont on Aug 25 2021, 6:19 PM.

Details

Summary

Queue positions are date and the current next_position_offset used to compute the new
queue position was not bounded. This has the side-effect of making overflow error.

This commit adapts the journal client computations to limit such next_position_offset to
10 [1]. This value was chosen because above that exponent the dates overflow (and we are way
in the future already).

Currently in production, we already exceed such value (staging, prod: 14 [2]) which
makes the scheduler journal client fail.

[1]

In [10]: utcnow() + timedelta(days=4 ** 10)
Out[10]: datetime.datetime(4892, 7, 20, 16, 13, 7, 345452, tzinfo=datetime.timezone.utc)

In [11]: utcnow() + timedelta(days=4 ** 11)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-11-42dc34eb0713> in <module>
----> 1 utcnow() + timedelta(days=4 ** 11)

OverflowError: date value out of range

[2]

18:19:22 swh-scheduler@db1:5432=> select max(next_position_offset) from origin_visit_stats where next_position_offset > 10;
+-----+
| max |
+-----+
|  14 |
+-----+
(1 row)

Time: 145.075 ms

18:19:40 softwareheritage-scheduler@belvedere:5432=> select max(next_position_offset) from origin_visit_stats where next_position_offset > 10;
+-----+
| max |
+-----+
|  14 |
+-----+
(1 row)

Time: 83509.421 ms (01:23.509)

Related to T3502

Test Plan

tox

staging scheduler got patched with the following and it restarted with the following (db migrated as well)

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23167
Build 36143: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 36142: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6136 (id=22206)

Rebasing onto 28ae1d86aa...

Current branch diff-target is up to date.
Changes applied before test
commit 1d1306195aed1ca308b370364f34b3dd94a8047d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Aug 25 18:15:06 2021 +0200

    journal_client: Bound the max position offset to 10
    
    Above that value, we hit date out of range issues.
    
    Related to T3502

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/440/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/440/console

Harbormaster returned this revision to the author for changes because remote builds failed.Aug 25 2021, 6:22 PM
Harbormaster failed remote builds in B23164: Diff 22206!
ardumont edited the summary of this revision. (Show Details)
swh/scheduler/journal_client.py
54–55

more like...

swh/scheduler/journal_client.py
54–55

oh come on, i meant min...

Build is green

Patch application report for D6136 (id=22209)

Rebasing onto 28ae1d86aa...

Current branch diff-target is up to date.
Changes applied before test
commit fc3f00c589d3263e9d20f3bf81645e875e87908e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Aug 25 18:15:06 2021 +0200

    journal_client: Bound the max position offset to 10
    
    Above that value, we hit date out of range issues.
    
    Related to T3502

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/441/ for more details.

swh/scheduler/journal_client.py
240–242

or ^ which might be more readable...

  • Clarify code change
  • Update docstring to mention the possible side when calculating offset.
  • Add a migration script to adapt existing values in db

Build is green

Patch application report for D6136 (id=22212)

Rebasing onto 28ae1d86aa...

Current branch diff-target is up to date.
Changes applied before test
commit dd8e3820938523e873fed2d3b058f2a09d73f5b5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Aug 25 18:15:06 2021 +0200

    journal_client: Ensure queue position does not overflow
    
    Queue positions are date and the current next_position_offset used to compute the new
    queue position was not bounded. This has the side-effect of making overflow error.
    
    This commit adapts the journal client computations to limit such next_position_offset to
    10. This value was chosen because above that exponent the dates overflow (and we are way
    in the future already).
    
    Related to T3502

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/442/ for more details.

ardumont retitled this revision from journal_client: Bound the max position offset to 10 to journal_client: Ensure queue position does not overflow.Aug 26 2021, 9:57 AM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the summary of this revision. (Show Details)

Fix docstring typo

This revision is now accepted and ready to land.Aug 26 2021, 10:23 AM

Fix another typo in the sql this time

Build is green

Patch application report for D6136 (id=22215)

Rebasing onto 28ae1d86aa...

Current branch diff-target is up to date.
Changes applied before test
commit d74dd5a6a992d71da2a1f00aad4a0382df704f6d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Aug 25 18:15:06 2021 +0200

    journal_client: Ensure queue position does not overflow
    
    Queue positions are date and the current next_position_offset used to compute the new
    queue position was not bounded. This has the side-effect of making overflow error.
    
    This commit adapts the journal client computations to limit such next_position_offset to
    10. This value was chosen because above that exponent the dates overflow (and we are way
    in the future already).
    
    Related to T3502

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/443/ for more details.

Build is green

Patch application report for D6136 (id=22216)

Rebasing onto 28ae1d86aa...

Current branch diff-target is up to date.
Changes applied before test
commit 506f78c827e37ff841140d6f44ea66f96d6a8cb0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Aug 25 18:15:06 2021 +0200

    journal_client: Ensure queue position does not overflow
    
    Queue positions are date and the current next_position_offset used to compute the new
    queue position was not bounded. This has the side-effect of making overflow error.
    
    This commit adapts the journal client computations to limit such next_position_offset to
    10. This value was chosen because above that exponent the dates overflow (and we are way
    in the future already).
    
    Related to T3502

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/444/ for more details.