Page MenuHomeSoftware Heritage

utils.split_range: Make computed ranges not overlap
ClosedPublic

Authored by ardumont on Sep 9 2020, 7:04 PM.

Details

Summary

Existing listers use the is_within_bound [1] method from the base lister.
This method uses inclusive boundaries in all cases.

As some "range" task listers [2] [3] are using split_range function to create
"overlapping" ranges, this can be the concurrent insert issue cause we found [4].

This commit adapts the function split_range to make the generated ranges no
longer overlap.

[1]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199

[2]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41

[3]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41

[4] https://sentry.softwareheritage.org/share/issue/aec9c2af347e47ea84f51ace3bfe2f25/

Related to T2577

Test Plan

tox

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D3899 (id=13743)

Could not rebase; Attempt merge onto 66a61f3dd2...

Updating 66a61f3..e407feb
Fast-forward
 swh/lister/tests/test_utils.py | 37 +++++++++++++++++++++----------------
 swh/lister/utils.py            |  8 +++++---
 2 files changed, 26 insertions(+), 19 deletions(-)
Changes applied before test
commit e407feb8c28b805db8ba220080d20f56d3287c50
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 9 18:50:46 2020 +0200

    utils.split_range: Make computed ranges not overlap
    
    Existing listers use the `is_within_bound` [1] method from the base lister.
    This method uses inclusive boundaries in all cases.
    
    As some "range" task listers [2] [3] are using `split_range` function to create
    "overlapping" ranges. So, when those range overlap, this can create concurrent
    insert issues down the line.
    
    This commit adapts the function `split_range` to make the generated ranges no
    longer overlap.
    
    [1]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199
    
    [2]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41
    
    [3]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41
    
    Related to T2577

commit 725c1fe4ad076c71d51d0c2998dcb4aaedd4b6bb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 9 18:48:07 2020 +0200

    test_utils: Migrate to pytest

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/41/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/41/console

  • Fix remaining task tests (on relister which uses ranges)
  • Add docstring on utils.split_range (with samples)

Build is green

Patch application report for D3899 (id=13745)

Could not rebase; Attempt merge onto 66a61f3dd2...

Updating 66a61f3..6b880bc
Fast-forward
 swh/lister/gitea/tests/test_tasks.py  | 33 +++++++++++++------------------
 swh/lister/gitlab/tests/test_tasks.py | 33 +++++++++++++------------------
 swh/lister/tests/test_utils.py        | 37 ++++++++++++++++++++---------------
 swh/lister/utils.py                   | 21 +++++++++++++++++---
 4 files changed, 67 insertions(+), 57 deletions(-)
Changes applied before test
commit 6b880bc00225ac05a94f001d6591ee394823a039
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 9 18:50:46 2020 +0200

    utils.split_range: Split into not overlapping ranges
    
    Existing listers use the `is_within_bound` [1] method from the base lister.
    This method uses inclusive boundaries in all cases.
    
    As some "range" task listers [2] [3] are using `split_range` function to create
    "overlapping" ranges. So, when those range overlap, this can create concurrent
    insert issues down the line.
    
    This commit adapts the function `split_range` to make the generated ranges no
    longer overlap.
    
    [1]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199
    
    [2]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41
    
    [3]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41
    
    Related to T2577

commit 725c1fe4ad076c71d51d0c2998dcb4aaedd4b6bb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 9 18:48:07 2020 +0200

    test_utils: Migrate to pytest

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/42/ for more details.

ardumont added a project: Lister.

add --doctest-modules to the arguments of pytest in tox.ini

vsellier added a subscriber: vsellier.

Tested in the docker-environment, the problem is not reproduced anymore with 5 concurrent listers.

This revision is now accepted and ready to land.Sep 10 2020, 10:39 AM

add --doctest-modules to the arguments of pytest in tox.ini

ack, i'll do it in another diff as other unrelated part broke when adding the pytest --doctest-module flag.

Tested in the docker-environment, the problem is not reproduced anymore with 5 concurrent listers.

great ;)

ardumont edited the summary of this revision. (Show Details)
ardumont edited the test plan for this revision. (Show Details)

Rework commit message

Build is green

Patch application report for D3899 (id=13753)

Rebasing onto 725c1fe4ad...

Current branch diff-target is up to date.
Changes applied before test
commit e3c856b5eef574427533fdb682163087337b2d8c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 9 18:50:46 2020 +0200

    utils.split_range: Split into not overlapping ranges
    
    Existing listers use the `is_within_bound` [1] method from the base lister.
    This method uses inclusive boundaries in all cases.
    
    As some "range" task listers [2] [3] are using `split_range` function to create
    "overlapping" ranges, this can cause concurrent insert issues down the line [4].
    
    This commit adapts the function `split_range` to make the generated ranges no
    longer overlap.
    
    [1]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199
    
    [2]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41
    
    [3]
    https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41
    
    Related to T2577

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/43/ for more details.