
Add a successive_visits counter to OriginVisitStats
Abandoned · Public

Authored by ardumont on Jan 20 2021, 12:43 PM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Summary

This counter (tries to) maintain the number of successive visits resulting in
the same status.

For example, if the last 3 visits are successful eventful ones, this field's
value will be 3. Then, an uneventful visit coming right after the latest
eventful one will reset this counter to one (the last event is "uneventful" and
only one such event happened successively).
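
The counting rule described above can be sketched as follows. This is only an illustration under stated assumptions: `VisitStats` and `record_visit` are hypothetical names, not the actual `OriginVisitStats` model or journal client code, and the sketch assumes visit statuses arrive in chronological order (which the journal does not guarantee, hence the "tries to").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisitStats:
    """Hypothetical stand-in for OriginVisitStats; not the real model."""

    last_status: Optional[str] = None  # e.g. "eventful", "uneventful", "failed"
    successive_visits: int = 0


def record_visit(stats: VisitStats, status: str) -> VisitStats:
    """Increment the counter when the status repeats, else restart it at 1."""
    if status == stats.last_status:
        stats.successive_visits += 1
    else:
        stats.last_status = status
        stats.successive_visits = 1
    return stats


# Three eventful visits in a row: the counter reaches 3.
stats = VisitStats()
for status in ["eventful", "eventful", "eventful"]:
    record_visit(stats, status)
after_three_eventful = stats.successive_visits

# One uneventful visit right after: the counter restarts at 1.
record_visit(stats, "uneventful")
after_uneventful = stats.successive_visits
```

Here `after_three_eventful` is 3 and `after_uneventful` is 1, matching the example in the summary.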

Depends on D4894

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
successive-events
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 18522
Build 28651: Phabricator diff pipeline on jenkins
Build 28650: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4895 (id=17386)

Could not rebase; Attempt merge onto b03d978241...

Updating b03d978..39a7488
Fast-forward
 swh/scheduler/journal_client.py            |  55 ++++++--
 swh/scheduler/model.py                     |   1 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_cli_journal.py    |   4 +-
 swh/scheduler/tests/test_journal_client.py | 197 ++++++++++++-----------------
 5 files changed, 130 insertions(+), 129 deletions(-)
Changes applied before test
commit 39a7488d400ac09a0ed34ce33b007056c1838ba2
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jan 20 12:37:13 2021 +0100

    Add a successive_visits counter to OriginVisitStats
    
    which (tries to) maintain the number of successive visits resulting in
    the same status.
    
    For example, if the last 3 visits are successful eventful ones, this field
    value will be 3. Then, an uneventful visit coming right after the latest
    eventful one will reset this counter to one (last event is "uneventful" and
    only one such event happened successively).

commit b95b3412f6565d2d38ba958ae3b72bec2bf6af28
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jan 20 12:32:40 2021 +0100

    Simplify journal client tests
    
    - sort visits by default (there is a test dedicated to dealing with unsorted
      messages from the journal),
    - remove "intermediate checks" in several tests: these do not help much
      but make the code more difficult to read and maintain,
    - rename VISIT_STATUSES1 as VISIT_STATUSES_1 to make it less prone to
      being confused with VISIT_STATUSES (which also exists).

commit c386fdf3b9fc30946177e1ec0c05e0e814243f42
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jan 20 12:24:10 2021 +0100

    Make the max_date() helper function accept *dates as argument
    
    so it can be called with more than 2 dates.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/184/ for more details.

Build is green

Patch application report for D4895 (id=17438)

Could not rebase; Attempt merge onto c7b740cafa...

Updating c7b740c..36b9055
Fast-forward
 swh/scheduler/journal_client.py            |  32 ++++++
 swh/scheduler/model.py                     |   1 +
 swh/scheduler/sql/30-schema.sql            |   2 +
 swh/scheduler/tests/test_cli_journal.py    |   4 +-
 swh/scheduler/tests/test_journal_client.py | 174 +++++++++++------------------
 5 files changed, 100 insertions(+), 113 deletions(-)
Changes applied before test
commit 36b9055eeeee77118bafed65d95768bf02eb9c2a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jan 20 12:37:13 2021 +0100

    Add a successive_visits counter to OriginVisitStats
    
    which (tries to) maintain the number of successive visits resulting in
    the same status.
    
    For example, if the last 3 visits are successful eventful ones, this field
    value will be 3. Then, an uneventful visit coming right after the latest
    eventful one will reset this counter to one (last event is "uneventful" and
    only one such event happened successively).

commit ffe2aed2fa329356e97bd9f8e160663c18bd0f06
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jan 20 12:32:40 2021 +0100

    Simplify journal client tests
    
    - sort visits by default (there is a test dedicated to dealing with unsorted
      messages from the journal),
    - remove "intermediate checks" in several tests: these do not help much
      but make the code more difficult to read and maintain,
    - rename VISIT_STATUSES1 as VISIT_STATUSES_1 to make it less prone to
      being confused with VISIT_STATUSES (which also exists).

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/202/ for more details.

ardumont added a subscriber: ardumont.

lgtm

swh/scheduler/tests/test_journal_client.py
434

lol, what are the odds that for all the cases we designed in those tests and given their permutations, every output would give successive_visits to 1 each time... ¯\_(ツ)_/¯

This revision is now accepted and ready to land. Jan 21 2021, 8:50 AM
swh/scheduler/tests/test_journal_client.py
434

Actually, for this test (and the one before) it is not a surprise. We only have one event of each type, since 2 visits with the same snapshot yield 1 eventful then 1 uneventful visit, whichever order the messages arrive in. It would not have been the case with 3 visits (in a row) with the same snapshot, but this is not a tested case (with permutations).
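
To make that reasoning concrete, here is a rough sketch that enumerates arrival orders. It rests on a simplifying assumption taken from the comment above: a visit counts as "eventful" only when its snapshot differs from the one processed just before it. The helpers `classify`, `final_counter` and `counters_over_permutations` are hypothetical and do not reproduce the journal client logic.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def classify(snapshots: Sequence[str]) -> List[str]:
    """Label each visit, in arrival order, as eventful or uneventful.

    Simplifying assumption: a visit is "eventful" only when its snapshot
    differs from the one seen just before it.
    """
    statuses = []
    previous = None
    for snapshot in snapshots:
        statuses.append("uneventful" if snapshot == previous else "eventful")
        previous = snapshot
    return statuses


def final_counter(statuses: Sequence[str]) -> int:
    """Length of the trailing run of identical statuses."""
    count, last = 0, None
    for status in statuses:
        count = count + 1 if status == last else 1
        last = status
    return count


def counters_over_permutations(visits: Sequence[Tuple[int, str]]) -> set:
    """Final counter values observed across every arrival order."""
    return {
        final_counter(classify([snap for _, snap in order]))
        for order in permutations(visits)
    }


# (visit id, snapshot) pairs; all visits share the same snapshot.
two_visits = [(1, "snap-a"), (2, "snap-a")]
three_visits = [(1, "snap-a"), (2, "snap-a"), (3, "snap-a")]

two_result = counters_over_permutations(two_visits)
three_result = counters_over_permutations(three_visits)
```

Under these assumptions `two_result` is `{1}` for every permutation, while `three_result` is `{2}`, which is the case the tests do not cover.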

heads up, I need this for T2345, for the branching part of the algo which needs to disable too many failed visits ;)

Superseded by D5978 (updated and rebased on D5956 for T2345)

ardumont edited reviewers, added: douardda; removed: ardumont.
This revision now requires review to proceed. Aug 13 2021, 9:20 AM

A variant of this got deployed along with v0.17 of the scheduler.