
journal_client: Compute next position for origin visit
Closed · Public

Authored by ardumont on Jun 30 2021, 10:03 AM.

Details

Summary

For origins without any last_update information [1], the journal client is now also in
charge of moving their next position in the queue for rescheduling. Depending on the
visit status, the next position offset and next_visit_queue_position are updated after
each visit completes:

  • if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues)
  • if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often).
  • if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often).

We then set the next visit target to its current value + the new visit interval
multiplied by a random fudge factor (picked in the -/+ 10% range).

The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins,
e.g. when a number of origins from a single hoster are processed at once.
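
The rules above can be sketched roughly as follows. This is only an illustration, not the code in this diff: the interval table, the function name `next_visit`, the clamping of the index to the table bounds, and the choice to apply the fudge factor only to completed (non-failed) visits are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical interval table, in days; the real values are defined in the diff.
VISIT_INTERVALS_DAYS = [1, 2, 4, 7, 14, 30, 60]
MIN_INTERVAL = timedelta(days=VISIT_INTERVALS_DAYS[0])


def next_visit(current_target, interval_index, status, eventful):
    """Return (new_target, new_interval_index) after a visit completes."""
    if status == "failed":
        # Failed visit: retry after the minimal interval, index unchanged
        # (assumption: no fudge factor is applied in this case).
        return current_target + MIN_INTERVAL, interval_index

    # Successful visit: changes seen -> visit way more often (index - 2),
    # no changes -> visit less often (index + 1).
    interval_index += -2 if eventful else 1
    # Assumption: keep the index within the bounds of the table.
    interval_index = max(0, min(interval_index, len(VISIT_INTERVALS_DAYS) - 1))

    # Fudge factor picked in the -/+ 10% range to spread visits out.
    fudge = 1 + random.uniform(-0.1, 0.1)
    interval = timedelta(days=VISIT_INTERVALS_DAYS[interval_index] * fudge)
    return current_target + interval, interval_index


if __name__ == "__main__":
    now = datetime(2021, 6, 30, tzinfo=timezone.utc)
    print(next_visit(now, 3, "successful", eventful=True))
    print(next_visit(now, 3, "successful", eventful=False))
    print(next_visit(now, 3, "failed", eventful=False))
```

The printed targets differ from run to run because of the random draw; only the interval index and the failed case are deterministic.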

Note that the computations happen for all origins, for simplicity and code maintenance,
but the results will only be used by an upcoming scheduling policy.

[1] The lister cannot provide it for some reason.

Depends on D5919
Related to T2345

Test Plan

tox (failing as tests need update)

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 22340
Build 34777: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 34776: arc lint + arc unit

Unit Tests: Failed

Time    Test
43 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_journal_client::test_journal_client_origin_visit_status_duplicated_messages
        swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7f14fc084fd0>
        def test_journal_client_origin_visit_status_duplicated_messages(swh_scheduler):
35 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_journal_client::test_journal_client_origin_visit_status_from_journal_last_eventful
        swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7f14fddaff98>
        def test_journal_client_origin_visit_status_from_journal_last_eventful(swh_scheduler):
41 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_journal_client::test_journal_client_origin_visit_status_from_journal_last_failed
        swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7f14fde8c940>
        def test_journal_client_origin_visit_status_from_journal_last_failed(swh_scheduler):
35 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_journal_client::test_journal_client_origin_visit_status_from_journal_last_failed2
        swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7f14fdebf860>
        def test_journal_client_origin_visit_status_from_journal_last_failed2(swh_scheduler):
37 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_journal_client::test_journal_client_origin_visit_status_from_journal_last_notfound
        swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7f14fdea2320>
        def test_journal_client_origin_visit_status_from_journal_last_notfound(swh_scheduler):

Full test results: 37 Failed · 272 Passed · 1 Skipped

Event Timeline

Build has FAILED

Patch application report for D5950 (id=21361)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..73c8c51
Fast-forward
 swh/scheduler/backend.py                   |  18 +++++
 swh/scheduler/interface.py                 |  19 +++++
 swh/scheduler/journal_client.py            |  94 ++++++++++++++++++++---
 swh/scheduler/model.py                     |   8 ++
 swh/scheduler/sql/30-schema.sql            |  19 +++++
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 115 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 +++++
 8 files changed, 248 insertions(+), 47 deletions(-)
Changes applied before test
commit 73c8c511470bd8fa600b2e3c61e73268eee4eb73
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit

commit 4fa0b3a165658f767b2163d0ac8e6cca3eb6b184
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit b8148ed33d596dc3cc6bac5f8d266a16e4776d82
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/394/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/394/console

  • Rebase
  • Adapt implementation to use the global queue position per visit type (for default fallback value)
  • Amend commit message to give more details

Actually amend the commit message, which I had forgotten to "commit".

Build was aborted

Patch application report for D5950 (id=21380)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..82aaef3
Fast-forward
 sql/updates/29.sql                         |  27 +++++++
 swh/scheduler/backend.py                   |  18 +++++
 swh/scheduler/interface.py                 |  19 +++++
 swh/scheduler/journal_client.py            |  98 +++++++++++++++++++++---
 swh/scheduler/model.py                     |   8 ++
 swh/scheduler/sql/30-schema.sql            |  21 +++++-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 115 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 +++++
 9 files changed, 280 insertions(+), 48 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit 82aaef31136452a697b89eaf08ec1bb48d6dcaa9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    The journal client is now in charge of moving the next position in the queue for
    rescheduling. It's computed for all visits to simplify the code but it's really used for
    origins without any last_update information.
    
    It's up to a new soon-to-be scheduling policy to schedule those origins and exploit
    those information.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/398/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/398/console

Build has FAILED

Patch application report for D5950 (id=21381)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..f3d182b
Fast-forward
 sql/updates/29.sql                         |  27 +++++++
 swh/scheduler/backend.py                   |  18 +++++
 swh/scheduler/interface.py                 |  19 +++++
 swh/scheduler/journal_client.py            |  98 +++++++++++++++++++++---
 swh/scheduler/model.py                     |   8 ++
 swh/scheduler/sql/30-schema.sql            |  21 +++++-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 115 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 +++++
 9 files changed, 280 insertions(+), 48 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit f3d182b0d38ca6c617805bf2011d2001a0c8bb6c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/399/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/399/console

Make tests happier.

Work is still needed to actually check the logic that updates the next position.

Build is green

Patch application report for D5950 (id=21402)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..faac6f8
Fast-forward
 sql/updates/29.sql                         |  27 +++
 swh/scheduler/backend.py                   |  18 ++
 swh/scheduler/interface.py                 |  19 ++
 swh/scheduler/journal_client.py            |  98 ++++++++-
 swh/scheduler/model.py                     |   8 +
 swh/scheduler/sql/30-schema.sql            |  21 +-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 336 ++++++++++++++++++-----------
 swh/scheduler/tests/test_scheduler.py      |  20 ++
 9 files changed, 410 insertions(+), 139 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit faac6f895e0bd3565441c48c5ad3207ce141e8cb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/401/ for more details.

swh/scheduler/journal_client.py
202

use random.uniform!

203

ensure days is ok with float.

swh/scheduler/journal_client.py
203

It was meant in regards to timedelta.

It is ok:

In [1]: from datetime import datetime, timedelta

In [2]: timedelta(days=0.1)
Out[2]: datetime.timedelta(seconds=8640)

In [3]: timedelta(days=1)
Out[3]: datetime.timedelta(days=1)

In [5]: timedelta(seconds=86400)
Out[5]: datetime.timedelta(days=1)
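
The same check can be written as a small script. It only uses `datetime.timedelta` from the standard library; the 7-day interval and the +10% factor are made-up values for illustration.

```python
from datetime import timedelta

# Fractional days are accepted and normalized to days/seconds/microseconds.
assert timedelta(days=0.1) == timedelta(seconds=8640)
assert timedelta(days=1) == timedelta(seconds=86400)

# A visit interval scaled by a float factor is still a valid timedelta.
interval_days = 7  # hypothetical interval
scaled = timedelta(days=interval_days * 1.1)  # hypothetical +10% fudge
print(scaled)
```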

swh/scheduler/journal_client.py
202

great stuff [1]:

In [16]: while True: print(random.uniform(-0.1, 0.1)) ; sleep(0.1)
-0.07802568520265081
-0.012047862685772293
-0.09515824425588425
-0.09383355154670565
-0.07567022279491248
-0.028994578286562933
0.032746260794542265
0.09695343564361572
0.03365570420242092
-0.03822556466045879
-0.08169951622895931
-0.09698416654743738
-0.0011827698844187884
-0.036335877650609
0.05447047809839983
-0.05956846622376569
0.04799241406841087

[1] https://docs.python.org/3/library/random.html#random.uniform
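
A non-interactive variant of the same experiment, as a sketch: it draws a fixed number of samples and checks that they stay within the -/+ 10% range. The seed and the sample count are arbitrary and only there to make the run reproducible.

```python
import random

rng = random.Random(42)  # arbitrary seed, for reproducibility only
samples = [rng.uniform(-0.1, 0.1) for _ in range(1000)]

# Every draw stays within the requested range.
assert all(-0.1 <= s <= 0.1 for s in samples)

# Turned into multiplicative fudge factors, they stay within [0.9, 1.1].
factors = [1 + s for s in samples]
print(min(factors), max(factors))
```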

Adapt according to the pair review:

  • extract and cover the new function which computes the next visit queue position
  • add more docstring in utility function
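
One way to cover a function that relies on a random draw is to pin the draw in the test. The sketch below is not the test added in this diff: `shift_position` is a hypothetical stand-in for the extracted function, and the test patches `random.uniform` with `unittest.mock.patch` so that each extreme of the range can be asserted.

```python
import random
from datetime import datetime, timedelta, timezone
from unittest.mock import patch


def shift_position(position: datetime, interval_days: int) -> datetime:
    """Hypothetical helper: move `position` forward by a fudged interval."""
    fudge = 1 + random.uniform(-0.1, 0.1)
    return position + timedelta(days=interval_days * fudge)


def test_shift_position_bounds():
    start = datetime(2021, 7, 1, tzinfo=timezone.utc)
    tolerance = timedelta(seconds=1)  # absorbs float rounding
    # Pin the random draw to the lower bound, the middle, and the upper bound.
    for draw, expected_days in [(-0.1, 9), (0.0, 10), (0.1, 11)]:
        with patch("random.uniform", return_value=draw):
            result = shift_position(start, 10)
        assert abs(result - (start + timedelta(days=expected_days))) < tolerance


if __name__ == "__main__":
    test_shift_position_bounds()
```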

Build is green

Patch application report for D5950 (id=21508)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..41861bb
Fast-forward
 sql/updates/29.sql                         |  27 ++
 swh/scheduler/backend.py                   |  18 ++
 swh/scheduler/interface.py                 |  19 ++
 swh/scheduler/journal_client.py            | 125 ++++++++-
 swh/scheduler/model.py                     |   8 +
 swh/scheduler/sql/30-schema.sql            |  21 +-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 412 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 ++
 9 files changed, 513 insertions(+), 139 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit 41861bb15b87df40fd5a71606a51b5e56d0755e5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/404/ for more details.

Build is green

Patch application report for D5950 (id=21509)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..a5e31ac
Fast-forward
 sql/updates/29.sql                         |  27 ++
 swh/scheduler/backend.py                   |  18 ++
 swh/scheduler/interface.py                 |  19 ++
 swh/scheduler/journal_client.py            | 136 +++++++++-
 swh/scheduler/model.py                     |   8 +
 swh/scheduler/sql/30-schema.sql            |  21 +-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 412 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 ++
 9 files changed, 524 insertions(+), 139 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit a5e31acf2a8e7a4152e34b4a2ea92c5f54498113
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/405/ for more details.

Use global import when not in cli

Build is green

Patch application report for D5950 (id=21510)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..8c4ae9f
Fast-forward
 sql/updates/29.sql                         |  27 ++
 swh/scheduler/backend.py                   |  18 ++
 swh/scheduler/interface.py                 |  19 ++
 swh/scheduler/journal_client.py            | 136 +++++++++-
 swh/scheduler/model.py                     |   8 +
 swh/scheduler/sql/30-schema.sql            |  21 +-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 411 ++++++++++++++++++++---------
 swh/scheduler/tests/test_scheduler.py      |  20 ++
 9 files changed, 523 insertions(+), 139 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/406/ for more details.

ardumont published this revision for review. Jul 6 2021, 3:58 PM
olasd added inline comments.
swh/scheduler/tests/test_journal_client.py
129

(We'll add typing to this in a further diff)

This revision is now accepted and ready to land. Jul 22 2021, 11:45 AM