Details
- Reviewers
olasd douardda vlorentz - Group Reviewers
Reviewers - Maniphest Tasks
- T2345: Improve handling of recurrent loading tasks in scheduler
- Commits
- rDSCH1bcf84d5e66d: Add a successive_visits counter to origin visit stats
tox
Diff Detail
- Repository
- rDSCH Scheduling utilities
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 22859 Build 35651: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 35650: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5978 (id=21541)
Could not rebase; Attempt merge onto 1006f0aee4...
Updating 1006f0a..7d71637 Fast-forward sql/updates/29.sql | 31 +++ swh/scheduler/backend.py | 41 ++- swh/scheduler/interface.py | 19 ++ swh/scheduler/journal_client.py | 168 ++++++++++- swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 23 +- swh/scheduler/tests/test_api_client.py | 2 + swh/scheduler/tests/test_journal_client.py | 429 ++++++++++++++++++++--------- swh/scheduler/tests/test_scheduler.py | 173 +++++++++++- 9 files changed, 744 insertions(+), 152 deletions(-) create mode 100644 sql/updates/29.sql
Changes applied before test
commit 7d71637db7c6521b4cacbafd4e63e2bbd6432bd3 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to OriginVisitStats which (tries to) maintain the number of successive visits resulting in the same status. For example, if the last 3 visits are successful evenful ones, this field value will be 3. Then, an uneventful visit coming right after the latest eventful one will reset this counter to one (last event is "uneventful" and only one such event happened successively). This will help when we will implement the disabling of too many failed visit attempt on a given origin Related to T2345 commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jul 1 12:18:49 2021 +0200 Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345 commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 16:00:01 2021 +0200 journal_client: Compute next position for origin visit For origin without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance but it will only be used by a new soon-to-be scheduling policy. [1] Lister cannot provide it for some reason. commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 18:07:59 2021 +0200 Introduce storage for the recurrent visit scheduler queue position commit ec6e69f6415a007611c46f25e7c48e909a793d53 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:42:26 2021 +0200 Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client. commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 14:41:07 2021 +0200 journal_client: Explicit docstring commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:39:40 2021 +0200 journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/410/ for more details.
I got the idea but I am wondering if it would not be simpler to store the list of visit statuses directly in the model ?
This would allow finer processing for the scheduling policies, but this will be more costly to store.
swh/scheduler/journal_client.py | ||
---|---|---|
211 | recent |
swh/scheduler/journal_client.py | ||
---|---|---|
211 | lol i rebased a diff from david but i kept the typo, nice or what! |
Build is green
Patch application report for D5978 (id=21544)
Could not rebase; Attempt merge onto 1006f0aee4...
Updating 1006f0a..5c7bd83 Fast-forward sql/updates/29.sql | 31 +++ swh/scheduler/backend.py | 41 ++- swh/scheduler/interface.py | 19 ++ swh/scheduler/journal_client.py | 168 ++++++++++- swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 23 +- swh/scheduler/tests/test_api_client.py | 2 + swh/scheduler/tests/test_journal_client.py | 429 ++++++++++++++++++++--------- swh/scheduler/tests/test_scheduler.py | 173 +++++++++++- 9 files changed, 744 insertions(+), 152 deletions(-) create mode 100644 sql/updates/29.sql
Changes applied before test
commit 5c7bd832476083020f2ac71c4d3aaf415a579b50 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to OriginVisitStats which (tries to) maintain the number of successive visits resulting in the same status. For example, if the last 3 visits are successful evenful ones, this field value will be 3. Then, an uneventful visit coming right after the latest eventful one will reset this counter to one (last event is "uneventful" and only one such event happened successively). This will help when we will implement the disabling of too many failed visit attempt on a given origin Related to T2345 commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jul 1 12:18:49 2021 +0200 Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345 commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 16:00:01 2021 +0200 journal_client: Compute next position for origin visit For origin without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance but it will only be used by a new soon-to-be scheduling policy. [1] Lister cannot provide it for some reason. commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 18:07:59 2021 +0200 Introduce storage for the recurrent visit scheduler queue position commit ec6e69f6415a007611c46f25e7c48e909a793d53 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:42:26 2021 +0200 Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client. commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 14:41:07 2021 +0200 journal_client: Explicit docstring commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:39:40 2021 +0200 journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/411/ for more details.
Build is green
Patch application report for D5978 (id=21545)
Could not rebase; Attempt merge onto 1006f0aee4...
Updating 1006f0a..4723b0c Fast-forward sql/updates/29.sql | 31 +++ swh/scheduler/backend.py | 41 ++- swh/scheduler/interface.py | 19 ++ swh/scheduler/journal_client.py | 168 ++++++++++- swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 23 +- swh/scheduler/tests/test_api_client.py | 2 + swh/scheduler/tests/test_journal_client.py | 429 ++++++++++++++++++++--------- swh/scheduler/tests/test_scheduler.py | 173 +++++++++++- 9 files changed, 744 insertions(+), 152 deletions(-) create mode 100644 sql/updates/29.sql
Changes applied before test
commit 4723b0c0e08e7c2dd942881c3f3dad78ec143e5f Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to OriginVisitStats This maintains the number of successive visits resulting in the same status. For example, when the last 3 visits are successful evenful ones, this field value will be 3. Then, an uneventful visit coming right after the latest eventful one will reset this counter to 1 (last event is "uneventful" and only one such event happened successively)... This will help when implementing disabling of too many failed visit attempts on a given origin. Related to T2345 commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jul 1 12:18:49 2021 +0200 Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345 commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 16:00:01 2021 +0200 journal_client: Compute next position for origin visit For origin without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance but it will only be used by a new soon-to-be scheduling policy. [1] Lister cannot provide it for some reason. commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 18:07:59 2021 +0200 Introduce storage for the recurrent visit scheduler queue position commit ec6e69f6415a007611c46f25e7c48e909a793d53 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:42:26 2021 +0200 Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client. commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 14:41:07 2021 +0200 journal_client: Explicit docstring commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:39:40 2021 +0200 journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/412/ for more details.
I got the idea but I am wondering if it would not be simpler to store the list of visit statuses directly in the model ?
It's in the model ;) (but not in ListedOrigin if that's what you meant ;)
This would allow finer processing for the scheduling policies, but this will be more costly to store.
Hum, yes, i think that will be too much to store and i'm not sure what the other benefits would be
aside from the current need [1]. Also, that duplicates the information we already have in the archive?
[1] Right now, the need is a simple one, a counter to ease determine if too many failed
attempts occurred or not. If so, disable the origin in the model here (it should come soon, before the end of the week, i'd say ;)
Ack, I guess you will find the visit status associated to the counter based on the dates saved in the stats model ?
Build has FAILED
Patch application report for D5978 (id=21551)
Could not rebase; Attempt merge onto 1006f0aee4...
Updating 1006f0a..196254c Fast-forward sql/updates/29.sql | 31 ++ swh/scheduler/backend.py | 41 ++- swh/scheduler/interface.py | 19 ++ swh/scheduler/journal_client.py | 183 ++++++++++- swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 23 +- swh/scheduler/tests/test_api_client.py | 2 + swh/scheduler/tests/test_journal_client.py | 473 +++++++++++++++++++++-------- swh/scheduler/tests/test_scheduler.py | 173 ++++++++++- 9 files changed, 802 insertions(+), 153 deletions(-) create mode 100644 sql/updates/29.sql
Changes applied before test
commit 196254ce6b8e974d73ee7d3285dcd5f1ed32c017 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help when implementing the disabling of too many failed visit attempts for a given origin. Related to T2345 commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jul 1 12:18:49 2021 +0200 Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345 commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 16:00:01 2021 +0200 journal_client: Compute next position for origin visit For origin without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance but it will only be used by a new soon-to-be scheduling policy. [1] Lister cannot provide it for some reason. commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 18:07:59 2021 +0200 Introduce storage for the recurrent visit scheduler queue position commit ec6e69f6415a007611c46f25e7c48e909a793d53 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:42:26 2021 +0200 Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client. commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 14:41:07 2021 +0200 journal_client: Explicit docstring commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:39:40 2021 +0200 journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/413/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/413/console
Build is green
Patch application report for D5978 (id=21552)
Could not rebase; Attempt merge onto 1006f0aee4...
Updating 1006f0a..64aa445 Fast-forward sql/updates/29.sql | 31 ++ swh/scheduler/backend.py | 41 ++- swh/scheduler/interface.py | 19 ++ swh/scheduler/journal_client.py | 181 ++++++++++- swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 23 +- swh/scheduler/tests/test_api_client.py | 2 + swh/scheduler/tests/test_journal_client.py | 473 +++++++++++++++++++++-------- swh/scheduler/tests/test_scheduler.py | 173 ++++++++++- 9 files changed, 801 insertions(+), 152 deletions(-) create mode 100644 sql/updates/29.sql
Changes applied before test
commit 64aa4458ddd2bb5bd9a913c17950732d962129e6 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help when implementing the disabling of too many failed visit attempts for a given origin. Related to T2345 commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jul 1 12:18:49 2021 +0200 Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345 commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 16:00:01 2021 +0200 journal_client: Compute next position for origin visit For origin without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance but it will only be used by a new soon-to-be scheduling policy. [1] Lister cannot provide it for some reason. commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 18:07:59 2021 +0200 Introduce storage for the recurrent visit scheduler queue position commit ec6e69f6415a007611c46f25e7c48e909a793d53 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:42:26 2021 +0200 Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client. commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Jun 29 14:41:07 2021 +0200 journal_client: Explicit docstring commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jun 23 16:39:40 2021 +0200 journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/414/ for more details.
... it will come before the end of the week
Ack, I guess you will find the visit status associated to the counter based on the dates saved in the stats model ?
yes, something like that ;)
done in D5980 ;)
Build is green
Patch application report for D5978 (id=21881)
Rebasing onto 4fa29fe128...
Current branch diff-target is up to date.
Changes applied before test
commit cdc2af4733752ddccea01bdbd70b9805fbdaf6f1 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help implementing disabling of too many successive failed or not_found visits for a given origin. Related to T2345
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/420/ for more details.
Build is green
Patch application report for D5978 (id=21885)
Rebasing onto 4fa29fe128...
First, rewinding head to replay your work on top of it... Fast-forwarded diff-target to base-revision-422-D5978.
Changes applied before test
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/422/ for more details.
Build is green
Patch application report for D5978 (id=21888)
Rebasing onto 4fa29fe128...
Current branch diff-target is up to date.
Changes applied before test
commit d616934db615e6f53ea89c629a6b660bd24176e4 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help implementing disabling of too many successive failed or not_found visits for a given origin. Related to T2345
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/425/ for more details.
Build is green
Patch application report for D5978 (id=21889)
Rebasing onto 4fa29fe128...
Current branch diff-target is up to date.
Changes applied before test
commit 015d16158df9a87cdea29d76a55381d6798ee4e3 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help implementing disabling of too many successive failed or not_found visits for a given origin. Related to T2345
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/426/ for more details.
Informally, if there is a single visit with a given status, then the number of successive visits with that status is 1; but with the semantics defined here, it's 0. This seems confusing to me; could you make counters start at 1 (unless there are no visits at all, in which case 0 is correct)
swh/scheduler/journal_client.py | ||
---|---|---|
222–237 | I find this more readable | |
259 | if increment_successive_visits == False resets the counter, it's probably not the best variable name. What about same_visit_status? |
Informally, if there is a single visit with a given status, then the number of
successive visits with that status is 1; but with the semantics defined here, it's 0.
This seems confusing to me; could you make counters start at 1 (unless there are no
visits at all, in which case 0 is correct)
yes, I sense it is confusing. Although starting the counter at 1 then resets it at 0
after when we switch to a new status is what's confusing to me...
Shouldn't we make that counter always start at 1 and then be reset to 1 instead?
My trail of thought is that if we have an entry in origin_visit_stats it means there is
at least one visit so 1 sound fine to me (it's also in accord with the start of your
sentence).
thoughts?
That's what I said. It's 0 before we have any visit, but as soon as we have a visit, it will never go back to 0 again.
My trail of thought is that if we have an entry in origin_visit_stats it means there is
at least one visit so 1 sound fine to me (it's also in accord with the start of your
sentence).
Then 0 is never recorded in the DB, fine with me.
Adapt according to review:
- start increment to 1 then adapt tests
- rename variable to a more meaningful name
Build is green
Patch application report for D5978 (id=21894)
Rebasing onto 4fa29fe128...
Current branch diff-target is up to date.
Changes applied before test
commit 1bcf84d5e66d02c006698a89d2571911d3fd0764 Author: David Douard <david.douard@sdfa3.org> Date: Wed Jul 7 16:55:57 2021 +0200 Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help implementing disabling of too many successive failed or not_found visits for a given origin. Related to T2345
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/428/ for more details.