Page MenuHomeSoftware Heritage

Use billiard instead of stdlib multiprocessing
ClosedPublic

Authored by Alphare on Apr 27 2021, 5:29 PM.

Details

Summary

This circumvents a few celery-related issues, and is consistent with
what the rest of the codebase does.

Diff Detail

Repository
rDLDHG Mercurial loader
Branch
corruption
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21124
Build 32782: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 32781: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D5623 (id=20063)

Rebasing onto f03f274065...

Current branch diff-target is up to date.
Changes applied before test
commit 1d8b26c042011f3271e451b11b670f37b44e8685
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/204/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/204/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 27 2021, 5:31 PM
Harbormaster failed remote builds in B21056: Diff 20063!

Fix billiard-related issues and API differences

Build is green

Patch application report for D5623 (id=20077)

Rebasing onto f03f274065...

Current branch diff-target is up to date.
Changes applied before test
commit a89783c52f2e3c44e08018e0c5fa99f54471f994
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/209/ for more details.

vlorentz added inline comments.
swh/loader/mercurial/hgutil.py
90–96 ↗(On Diff #20077)

What about this, to save some time sleeping?

Integrate suggestion to spend less time sleeping

Build is green

Patch application report for D5623 (id=20111)

Rebasing onto 504ee123ce...

First, rewinding head to replay your work on top of it...
Applying: Use billiard instead of stdlib multiprocessing
Changes applied before test
commit b59f0fdf4c99884d8d76dd99f3426706c02744b0
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/214/ for more details.

This circumvents a few celery-related issues, and is consistent with
what the rest of the codebase does.

A small synthesis of those few related issues [1] you mentioned in the commit (and thus the diff description) would be better.
That way, we, reviewers, don't have to look it up ;)

I agree it's not completely DRY but still, it makes the review faster thus integrating your code faster ;)

Integrate suggestion to spend less time sleeping

I call your bluff ;)
Jokes aside, I don't see what changed in between [2]

[1] I see only the one process.join issue mentioned in the code comment.

[2] https://forge.softwareheritage.org/D5623?vs=20077&id=20111#toc

Cheers,

Actually incorporate the proposed changes + improve commit message

Looks like *I* needed *more* sleep, apparently failed a git rebase somehow.

Build is green

Patch application report for D5623 (id=20133)

Could not rebase; Attempt merge onto 504ee123ce...

Merge made by the 'recursive' strategy.
 swh/loader/mercurial/from_disk.py         | 27 +++++++++++++++++++++++----
 swh/loader/mercurial/hgutil.py            | 24 ++++++++++++++++++------
 swh/loader/mercurial/tasks.py             | 10 +++++-----
 swh/loader/mercurial/tests/test_hgutil.py | 11 +++++++----
 swh/loader/mercurial/tests/test_tasks.py  |  4 ++--
 swh/loader/mercurial/utils.py             |  3 ++-
 6 files changed, 57 insertions(+), 22 deletions(-)
Changes applied before test
commit cbf9b70ec55e2b3dd6f8ba31b3460f63d80c8de1
Merge: 504ee12 4302b1f
Author: Jenkins user <jenkins@localhost>
Date:   Thu Apr 29 09:23:27 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 4302b1f9d53597544149beaff1e9400a4f22b2a2
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:55:16 2021 +0200

    Replace old loader with the new one
    
    This is the minimal amount of code needed to switch from the old one to
    the new one. If the new loader proves to be good enough, we may remove
    the old one entirely.

commit 0f8f2fa223229a4139bce0ebf4b6b8d33ad5980e
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:33:20 2021 +0200

    Handle more cases of corruption
    
    Some corrupted repos have missing files or broken logical links in the
    underlying Mercurial datastructure, which means that say sometimes fail
    for a given revision. This does not mean we should throw away the rest
    of the repository. (Tested on repos of various levels and flavors of
    corruption in the Boatbucket archive)

commit 1e5da8f01deea8e5c639691dd80a2c4ce32e67da
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:28:50 2021 +0200

    Ignore the repository's config
    
    `HGRCPATH` only tells Mercurial to ignore the user's config files, but
    some repositories have a `.hg/hgrc` file (only in the case that you copy
    the files instead of cloning, if present) that is usually used for server-side
    configuration. We want to ignore this, since it might affect loading
    and ask for hooks that are not there or are otherwise annoying/dangerous,
    for example.

commit 1e72195955b9cf0983a687f497364077085dee78
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Mon Apr 26 23:26:09 2021 +0200

    Also use minimal env in the new Mercurial loader
    
    The old loader (bundle2 loader) already received this treatment which
    ensures Mercurial doesn't pick up on any user customization, but I
    apparently forgot to apply the same changes to the new one.

commit 8bd6a9859f46aa501d7c0d11dbf63a4b96f0aeb9
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.
    
    stdlib multiprocessing is not able to spawn children from daemonic
    processes, and even says so plainly if you try:
    
    `AssertionError: daemonic processes are not allowed to have children`
    
    This is incompatible with the SWH infrastructure which needs to do this
    exactly. Fortunately, we're already using billiard and celery. I'm
    assuming that there could be other blocking or annoying differences
    between stdlib and billiard, but we will save ourselves the trouble of
    finding out.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/215/ for more details.

Actually incorporate the proposed changes + improve commit message

Looks like *I* needed *more* sleep, apparently failed a git rebase somehow.

"oops", it's the wrong commit range you used to update the diff (probably)
I see the content of the other diff which swaps the bundle/from_disk loader.

/me suggests gently to take some nap ;)

;D

Actually send the right changes...

...this would be embarassing if I were a VCS developer!

Build is green

Patch application report for D5623 (id=20135)

Rebasing onto 504ee123ce...

First, rewinding head to replay your work on top of it...
Applying: Use billiard instead of stdlib multiprocessing
Changes applied before test
commit 4bb56942bd2bb0dbe4f60973063bc74dab3ae8e4
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.
    
    stdlib multiprocessing is not able to spawn children from daemonic
    processes, and even says so plainly if you try:
    
    `AssertionError: daemonic processes are not allowed to have children`
    
    This is incompatible with the SWH infrastructure which needs to do this
    exactly. Fortunately, we're already using billiard and celery. I'm
    assuming that there could be other blocking or annoying differences
    between stdlib and billiard, but we will save ourselves the trouble of
    finding out.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/216/ for more details.

This revision is now accepted and ready to land.Apr 29 2021, 11:58 AM

Build is green

Patch application report for D5623 (id=20169)

Rebasing onto 504ee123ce...

Current branch diff-target is up to date.
Changes applied before test
commit 23260277700970ffabcfaa386e3ce5c619c8294f
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Tue Apr 27 10:53:56 2021 +0200

    Use billiard instead of stdlib multiprocessing
    
    This circumvents a few celery-related issues, and is consistent with
    what the rest of the codebase does.
    
    stdlib multiprocessing is not able to spawn children from daemonic
    processes, and even says so plainly if you try:
    
    `AssertionError: daemonic processes are not allowed to have children`
    
    This is incompatible with the SWH infrastructure which needs to do this
    exactly. Fortunately, we're already using billiard and celery. I'm
    assuming that there could be other blocking or annoying differences
    between stdlib and billiard, but we will save ourselves the trouble of
    finding out.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/217/ for more details.