This circumvents a few celery-related issues, and is consistent with
what the rest of the codebase does.
Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Commits
- rDLDHG232602777009: Use billiard instead of stdlib multiprocessing
Diff Detail
- Repository
- rDLDHG Mercurial loader
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 21125 Build 32784: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 32783: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D5623 (id=20063)
Rebasing onto f03f274065...
Current branch diff-target is up to date.
Changes applied before test
commit 1d8b26c042011f3271e451b11b670f37b44e8685 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does.
Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/204/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/204/console
Build is green
Patch application report for D5623 (id=20077)
Rebasing onto f03f274065...
Current branch diff-target is up to date.
Changes applied before test
commit a89783c52f2e3c44e08018e0c5fa99f54471f994 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does.
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/209/ for more details.
swh/loader/mercurial/hgutil.py | ||
---|---|---|
89–102 | What about this, to save some time sleeping? |
Build is green
Patch application report for D5623 (id=20111)
Rebasing onto 504ee123ce...
First, rewinding head to replay your work on top of it... Applying: Use billiard instead of stdlib multiprocessing
Changes applied before test
commit b59f0fdf4c99884d8d76dd99f3426706c02744b0 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does.
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/214/ for more details.
This circumvents a few celery-related issues, and is consistent with
what the rest of the codebase does.
A small synthesis of those few related issues [1] you mentioned in the commit (and thus the diff description) would be better.
That way, we, reviewers, don't have to look it up ;)
I agree it's not completely DRY but still, it makes the review faster thus integrating your code faster ;)
Integrate suggestion to spend less time sleeping
I call your bluff ;)
Jokes aside, I don't see what changed in between [2]
[1] I see only the one process.join issue mentioned in the code comment.
[2] https://forge.softwareheritage.org/D5623?vs=20077&id=20111#toc
Cheers,
Actually incorporate the proposed changes + improve commit message
Looks like *I* needed *more* sleep, apparently failed a git rebase somehow.
Build is green
Patch application report for D5623 (id=20133)
Could not rebase; Attempt merge onto 504ee123ce...
Merge made by the 'recursive' strategy. swh/loader/mercurial/from_disk.py | 27 +++++++++++++++++++++++---- swh/loader/mercurial/hgutil.py | 24 ++++++++++++++++++------ swh/loader/mercurial/tasks.py | 10 +++++----- swh/loader/mercurial/tests/test_hgutil.py | 11 +++++++---- swh/loader/mercurial/tests/test_tasks.py | 4 ++-- swh/loader/mercurial/utils.py | 3 ++- 6 files changed, 57 insertions(+), 22 deletions(-)
Changes applied before test
commit cbf9b70ec55e2b3dd6f8ba31b3460f63d80c8de1 Merge: 504ee12 4302b1f Author: Jenkins user <jenkins@localhost> Date: Thu Apr 29 09:23:27 2021 +0000 Merge branch 'diff-target' into HEAD commit 4302b1f9d53597544149beaff1e9400a4f22b2a2 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:55:16 2021 +0200 Replace old loader with the new one This is the minimal amount of code needed to switch from the old one to the new one. If the new loader proves to be good enough, we may remove the old one entirely. commit 0f8f2fa223229a4139bce0ebf4b6b8d33ad5980e Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Apr 26 23:33:20 2021 +0200 Handle more cases of corruption Some corrupted repos have missing files or broken logical links in the underlying Mercurial datastructure, which means that say sometimes fail for a given revision. This does not mean we should throw away the rest of the repository. (Tested on repos of various levels and flavors of corruption in the Boatbucket archive) commit 1e5da8f01deea8e5c639691dd80a2c4ce32e67da Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Apr 26 23:28:50 2021 +0200 Ignore the repository's config `HGRCPATH` only tells Mercurial to ignore the user's config files, but some repositories have a `.hg/hgrc` file (only in the case that you copy the files instead of cloning, if present) that is usually used for server-side configuration. We want to ignore this, since it might affect loading and ask for hooks that are not there or are otherwise annoying/dangerous, for example. commit 1e72195955b9cf0983a687f497364077085dee78 Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Apr 26 23:26:09 2021 +0200 Also use minimal env in the new Mercurial loader The old loader (bundle2 loader) already received this treatment which ensures Mercurial doesn't pick up on any user customization, but I apparently forgot to apply the same changes to the new one. commit 8bd6a9859f46aa501d7c0d11dbf63a4b96f0aeb9 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children` This is incompatible with the SWH infrastructure which needs to do this exactly. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/215/ for more details.
Actually incorporate the proposed changes + improve commit message
Looks like *I* needed *more* sleep, apparently failed a git rebase somehow.
"oops", it's the wrong commit range you used to update the diff (probably)
I see the content of the other diff which swaps the bundle/from_disk loader.
/me suggests gently to take some nap ;)
;D
Actually send the right changes...
...this would be embarassing if I were a VCS developer!
Build is green
Patch application report for D5623 (id=20135)
Rebasing onto 504ee123ce...
First, rewinding head to replay your work on top of it... Applying: Use billiard instead of stdlib multiprocessing
Changes applied before test
commit 4bb56942bd2bb0dbe4f60973063bc74dab3ae8e4 Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children` This is incompatible with the SWH infrastructure which needs to do this exactly. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/216/ for more details.
Build is green
Patch application report for D5623 (id=20169)
Rebasing onto 504ee123ce...
Current branch diff-target is up to date.
Changes applied before test
commit 23260277700970ffabcfaa386e3ce5c619c8294f Author: Raphaël Gomès <rgomes@octobus.net> Date: Tue Apr 27 10:53:56 2021 +0200 Use billiard instead of stdlib multiprocessing This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children` This is incompatible with the SWH infrastructure which needs to do this exactly. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/217/ for more details.