journal_client: Disable origins when too many visited attempts failed
ClosedPublic

Authored by ardumont on Jul 8 2021, 11:26 AM.

Details

Summary

This disables origins after too many failed or not found visit attempts. It is the
lister's responsibility to re-enable origins if they get listed again.

This completes the new scheduling policy started in D5950.

Depends on D5978
Related to T2345
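
As a rough illustration of the rule in the summary, the sketch below disables an origin once its visits have ended in `failed` or `not_found` three times in a row (the count given in the final commit message of this revision). The names `VisitStats`, `DISABLE_THRESHOLD`, `BAD_STATUSES`, `should_disable` and `apply_policy` are hypothetical and are not taken from `swh.scheduler`. The sketch also assumes a counter that counts the first visit as 1.

```python
from dataclasses import dataclass

# Hypothetical threshold: consecutive bad visits before an origin is disabled.
DISABLE_THRESHOLD = 3

# Visit statuses that count towards disabling an origin.
BAD_STATUSES = {"failed", "not_found"}


@dataclass
class VisitStats:
    """Hypothetical per-origin record; not the swh.scheduler model."""

    url: str
    last_status: str
    successive_visits: int  # consecutive visits that ended in last_status
    enabled: bool = True


def should_disable(stats: VisitStats) -> bool:
    """Return True when the origin has had too many bad visits in a row."""
    return (
        stats.last_status in BAD_STATUSES
        and stats.successive_visits >= DISABLE_THRESHOLD
    )


def apply_policy(stats: VisitStats) -> VisitStats:
    """Disable the origin if needed; re-enabling is left to the lister."""
    if should_disable(stats):
        stats.enabled = False
    return stats
```

Note that `apply_policy` only ever turns the flag off, which mirrors the division of responsibility described above: the journal client disables, the lister re-enables.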

Test Plan

tox

Event Timeline

ardumont edited the summary of this revision.

Build is green

Patch application report for D5980 (id=21553)

Could not rebase; Attempt merge onto 1006f0aee4...

Updating 1006f0a..b16fd82
Fast-forward
 sql/updates/29.sql                         |  31 ++
 swh/scheduler/backend.py                   |  41 ++-
 swh/scheduler/interface.py                 |  19 +
 swh/scheduler/journal_client.py            | 198 ++++++++++-
 swh/scheduler/model.py                     |  10 +
 swh/scheduler/sql/30-schema.sql            |  23 +-
 swh/scheduler/tests/test_api_client.py     |   2 +
 swh/scheduler/tests/test_journal_client.py | 548 ++++++++++++++++++++++-------
 swh/scheduler/tests/test_scheduler.py      | 173 ++++++++-
 9 files changed, 894 insertions(+), 151 deletions(-)
 create mode 100644 sql/updates/29.sql
Changes applied before test
commit b16fd8252e00ac81214f4ac218d55e7eaf9beb69
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Related to T2345

commit 64aa4458ddd2bb5bd9a913c17950732d962129e6
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help when implementing the disabling of too many failed visit attempts for a given
    origin.
    
    Related to T2345
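
The counter described in this commit message could be maintained along the following lines. This is a minimal sketch, not the code from the commit: the function name `update_successive_visits` and its signature are hypothetical, and counting the first visit with a given status as 1 is an assumption.

```python
from typing import Optional, Tuple


def update_successive_visits(
    previous_status: Optional[str],
    previous_count: int,
    new_status: str,
) -> Tuple[str, int]:
    """Return the (status, count) pair after recording one more visit.

    The count grows while consecutive visits end with the same status and
    restarts when the status changes. Counting the first visit as 1 is an
    assumption of this sketch.
    """
    if previous_status == new_status:
        return new_status, previous_count + 1
    return new_status, 1
```

Feeding the statuses of an origin's visits through this function one by one yields the length of the current run of identical statuses, which is the value a disabling rule can then compare against a threshold.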

commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 1 12:18:49 2021 +0200

    Introduce new scheduling policy to grab origins without last update
    
    This is in charge of scheduling origins without last update. This also updates the
    global queue position so the journal client can initialize correctly the next position
    per origin and visit type.
    
    Related to T2345

commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 16:00:01 2021 +0200

    journal_client: Compute next position for origin visit
    
    For origin without any last_update information [1], the journal client is now also in
    charge of moving their next position in the queue for rescheduling. Depending on their
    status, the next position offset and next_visit_queue_position are updated after each
    visit completes:
    
    - if the visit has failed, increase the next visit target by the minimal visit
      interval (to take into account transient loading issues)
    - if the visit is successful, and records some changes, decrease the visit interval
      index by 2 (visit the origin *way* more often).
    - if the visit is successful, and records no changes, increase the visit interval index
      by 1 (visit the origin less often).
    
    We then set the next visit target to its current value + the new visit interval
    multiplied by a random fudge factor (picked in the -/+ 10% range).
    
    The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
    e.g. when a number of origins from a single hoster are processed at once.
    
    Note that the computations happen for all origins for simplicity and code maintenance
    but it will only be used by a new soon-to-be scheduling policy.
    
    [1] Lister cannot provide it for some reason.
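
The update rules listed in this commit message can be sketched as follows. The interval table `VISIT_INTERVALS`, the function name `next_position`, its parameters, and the clamping of the index to the table bounds are all assumptions made for illustration. The message does not say whether the fudge factor also applies after a failed visit; this sketch applies it in every case.

```python
import random
from datetime import datetime, timedelta
from typing import Callable, Tuple

# Hypothetical table of visit intervals, from most to least frequent.
VISIT_INTERVALS = [timedelta(days=d) for d in (1, 2, 4, 7, 14, 30, 60)]


def next_position(
    status: str,
    has_changes: bool,
    interval_index: int,
    current_target: datetime,
    fudge: Callable[[float, float], float] = random.uniform,
) -> Tuple[int, datetime]:
    """Return the new (interval index, next visit target) after one visit."""
    if status == "successful":
        # Visit more often if something changed, less often otherwise.
        delta = -2 if has_changes else 1
        # Clamping to the table bounds is an assumption of this sketch.
        new_index = min(max(interval_index + delta, 0), len(VISIT_INTERVALS) - 1)
        interval = VISIT_INTERVALS[new_index]
    else:
        # Any non-successful status is treated as a failed visit here:
        # keep the index and retry after the minimal interval.
        new_index = interval_index
        interval = VISIT_INTERVALS[0]

    # Spread visits out with a random factor in the -/+ 10% range.
    factor = fudge(0.9, 1.1)
    return new_index, current_target + interval * factor
```

Passing the random source as the `fudge` parameter is a design choice of the sketch: it keeps the default behaviour random while letting a caller substitute a fixed factor when a reproducible result is needed.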

commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 18:07:59 2021 +0200

    Introduce storage for the recurrent visit scheduler queue position

commit ec6e69f6415a007611c46f25e7c48e909a793d53
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:42:26 2021 +0200

    Start handling of recurrent loading tasks in scheduler
    
    This deals first and foremost with the next_position_offset update done by the scheduler
    journal client.

commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jun 29 14:41:07 2021 +0200

    journal_client: Explicit docstring

commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 23 16:39:40 2021 +0200

    journal_client: Only check last_* fields for some permutation tests
    
    In a future commit, we will add new fields whose values will be permutation dependent.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/415/ for more details.

Build is green

Patch application report for D5980 (id=21882)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..88d3036
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  32 +++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 105 ++++++++++++++++++++++++++++-
 5 files changed, 142 insertions(+), 3 deletions(-)
Changes applied before test
commit 88d3036f407698c2615f18e9b470cf08ecb1716c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Either for failed or not found attempts.
    
    Related to T2345

commit cdc2af4733752ddccea01bdbd70b9805fbdaf6f1
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/421/ for more details.

ardumont edited the summary of this revision.

Disable origins once the counter of successive failing visits reaches 2 (since the counter starts at 0, this means 3 failures in a row)

Build is green

Patch application report for D5980 (id=21886)

Rebasing onto 4fa29fe128...

Current branch diff-target is up to date.
Changes applied before test
commit d616934db615e6f53ea89c629a6b660bd24176e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/423/ for more details.

Build is green

Patch application report for D5980 (id=21887)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..dbb1e40
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  32 +++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 105 ++++++++++++++++++++++++++++-
 5 files changed, 142 insertions(+), 3 deletions(-)
Changes applied before test
commit dbb1e40ac43783cf8ded478f1925206e44b53fef
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Either for failed or not found attempts.
    
    Related to T2345

commit d616934db615e6f53ea89c629a6b660bd24176e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/424/ for more details.

Build is green

Patch application report for D5980 (id=21890)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..196ba39
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  32 +++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 107 ++++++++++++++++++++++++++++-
 5 files changed, 144 insertions(+), 3 deletions(-)
Changes applied before test
commit 196ba394712751b52c604b2f2444fe5a5d214e44
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Either for failed or not found attempts.
    
    Related to T2345

commit 015d16158df9a87cdea29d76a55381d6798ee4e3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/427/ for more details.

I'm not a huge fan of disabling origins forever; are you planning to relax this somehow? (eg. visit again a couple of years later)

What happens if a lister lists a disabled origin again?

swh/scheduler/journal_client.py
258
262

hardcoded constant!

> I'm not a huge fan of disabling origins forever; are you planning to relax this somehow? (eg. visit again a couple of years later)

It's not forever.

> What happens if a lister lists a disabled origin again?

The lister will activate the origin again.
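
To make the answer above concrete, here is a minimal sketch of the re-enabling side, assuming the scheduler keeps an enabled flag per listed origin. The in-memory dictionary and the names `OriginKey` and `record_listing` are hypothetical stand-ins, not the `swh.scheduler` API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key for a listed origin: (url, visit_type).
OriginKey = Tuple[str, str]


def record_listing(
    enabled_by_origin: Dict[OriginKey, bool],
    listed: Iterable[OriginKey],
) -> None:
    """Mark every origin seen by the lister as enabled, adding new ones."""
    for key in listed:
        enabled_by_origin[key] = True
```

Under this assumption, an origin disabled after repeated failures stays disabled only until a lister reports it again.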

This revision is now accepted and ready to land. (Aug 3 2021, 12:37 PM)
  • Rebase
  • Adapt according to review (avoid constant)
  • Update docstring to make the disabling of origins explicit
ardumont retitled this revision from "journal_client: Deactivate origins when too many visited attempts failed" to "journal_client: Disable origins when too many visited attempts failed". (Aug 3 2021, 12:54 PM)
ardumont edited the summary of this revision.

Rework commit message

Build is green

Patch application report for D5980 (id=21895)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..8d1b51f
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  35 ++++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 107 ++++++++++++++++++++++++++++-
 5 files changed, 147 insertions(+), 3 deletions(-)
Changes applied before test
commit 8d1b51f0a60cf1f8b94942a490c00f7b0b4097c7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Either for failed or not found attempts. It's up to the lister to activate back the
    origins if they are getting alive at some point.
    
    Related to T2345

commit 1bcf84d5e66d02c006698a89d2571911d3fd0764
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/429/ for more details.

Actually rework commit message (without forgetting to commit the message)

Build is green

Patch application report for D5980 (id=21896)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..8d1b51f
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  35 ++++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 107 ++++++++++++++++++++++++++++-
 5 files changed, 147 insertions(+), 3 deletions(-)
Changes applied before test
commit 8d1b51f0a60cf1f8b94942a490c00f7b0b4097c7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Deactivate origins when too many visited attempts failed
    
    Either for failed or not found attempts. It's up to the lister to activate back the
    origins if they are getting alive at some point.
    
    Related to T2345

commit 1bcf84d5e66d02c006698a89d2571911d3fd0764
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/430/ for more details.

Build is green

Patch application report for D5980 (id=21897)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..d92e052
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  35 ++++++++++
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 107 ++++++++++++++++++++++++++++-
 5 files changed, 147 insertions(+), 3 deletions(-)
Changes applied before test
commit d92e05218f9458f11b99fcbc82ae518185c125c1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Disable origins when too many visited attempts failed
    
    This disable origins for either failed or not found attempts 3 times in a row. It's not
    definitive though as it's the lister's responsibility to activate back origins if they
    get listed again.
    
    Related to T2345

commit 1bcf84d5e66d02c006698a89d2571911d3fd0764
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/431/ for more details.

swh/scheduler/journal_client.py
17

docstring please

133–159

Build is green

Patch application report for D5980 (id=21898)

Could not rebase; Attempt merge onto 4fa29fe128...

Updating 4fa29fe..8281e35
Fast-forward
 sql/updates/29.sql                         |   4 ++
 swh/scheduler/journal_client.py            |  38 +++++++++-
 swh/scheduler/model.py                     |   2 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_journal_client.py | 107 ++++++++++++++++++++++++++++-
 5 files changed, 149 insertions(+), 4 deletions(-)
Changes applied before test
commit 8281e351d6a13a55711fca5b89c7f24c71174dab
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jul 8 11:24:42 2021 +0200

    journal_client: Disable origins when too many visited attempts failed
    
    This disable origins for either failed or not found attempts 3 times in a row. It's not
    definitive though as it's the lister's responsibility to activate back origins if they
    get listed again.
    
    Related to T2345

commit 1bcf84d5e66d02c006698a89d2571911d3fd0764
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 7 16:55:57 2021 +0200

    Add a successive_visits counter to origin visit stats
    
    This maintains the number of successive visits resulting in the same status. This will
    help implementing disabling of too many successive failed or not_found visits for a
    given origin.
    
    Related to T2345

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/432/ for more details.