Details
- Reviewers
anlambert - Group Reviewers
Reviewers - Maniphest Tasks
- T3781: Replace the Nixguix loader with a lister
- Commits
- rDLDBASEa5255f106453: {Content|Directory}Loader: Register tasks
Diff Detail
- Repository
- rDLDBASE Generic VCS/Package Loader
- Branch
- refactor-nixguix-loader-directory
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 32045 Build 50167: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 50166: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8601 (id=31059)
Could not rebase; Attempt merge onto dbf7f3dca0...
Updating dbf7f3d..ccb8f36 Fast-forward docs/README.rst | 7 ++- setup.py | 2 + swh/loader/core/__init__.py | 27 ++++++++++ swh/loader/core/loader.py | 73 +++++++++++++------------ swh/loader/core/tasks.py | 20 +++++++ swh/loader/core/tests/test_loader.py | 100 +++++++++++++++++++++++++---------- swh/loader/core/tests/test_tasks.py | 34 ++++++++++++ 7 files changed, 196 insertions(+), 67 deletions(-) create mode 100644 swh/loader/core/tasks.py create mode 100644 swh/loader/core/tests/test_tasks.py
Changes applied before test
commit ccb8f36834bd9a7463f7182321a4adfc7915c26f Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Sun Oct 2 19:18:23 2022 +0200 {Content|Directory}Loader: Register tasks Related to T3781 commit 39c33a66c27c030c7ae3cfba4a393c2ced468fbc Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Fri Sep 30 15:10:49 2022 +0200 {Content|Directory}Loader: Adapt support for checksums This adapts the content/directory loader implementations to use directly a checksums dict which is now sent by the listers. This improves the loader to check those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, this is now able to deal with sha512 checksums checks as well. This also aligns with the current package loaders which now are also checking the integrity of the tarballs they ingest. Related to T3781
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/944/ for more details.
I recalled it is also important to test the loader parameters when created from ListedOrigin instances so I have added this test below:
diff --git a/swh/loader/core/tests/test_tasks.py b/swh/loader/core/tests/test_tasks.py index bf8d2e1..c0ab361 100644 --- a/swh/loader/core/tests/test_tasks.py +++ b/swh/loader/core/tests/test_tasks.py @@ -3,8 +3,13 @@ # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +import uuid + import pytest +from swh.scheduler.model import ListedOrigin, Lister +from swh.scheduler.utils import create_origin_task_dict + NAMESPACE = "swh.loader.core" @@ -32,3 +37,44 @@ def test_tasks_content_loader( assert res.successful() assert mock_load.called assert res.result == {"status": "eventful"} + + +@pytest.fixture +def nixguix_lister(): + return Lister(name="nixguix", instance_name="example", id=uuid.uuid4()) + + +@pytest.mark.parametrize("loader_name", ["Content", "Directory"]) +def test_tasks_loader_for_listed_origin( + mocker, + swh_scheduler_celery_app, + swh_scheduler_celery_worker, + swh_config, + nixguix_lister, + loader_name, +): + mock_load = mocker.patch(f"{NAMESPACE}.loader.{loader_name}Loader.load") + mock_load.return_value = {"status": "eventful"} + + listed_origin = ListedOrigin( + lister_id=nixguix_lister.id, + url="https://example.org/archives", + visit_type=loader_name.lower(), + extra_loader_arguments={ + "origin_url": "https://example.org/artifact/artifact", + "fallback_urls": ["https://example.org/mirror/artifact-0.0.1.pkg.xz"], + "checksums": {"sha256": "some-valid-checksum"}, + }, + ) + + task_dict = create_origin_task_dict(listed_origin, nixguix_lister) + + res = swh_scheduler_celery_app.send_task( + f"{NAMESPACE}.tasks.Load{loader_name}", + kwargs=task_dict["arguments"]["kwargs"], + ) + assert res + res.wait() + assert res.successful() + assert mock_load.called + assert res.result == {"status": "eventful"}
I got the following error when running tests:
[2022-10-03 15:08:28,803: ERROR/MainProcess] Task swh.loader.core.tasks.LoadContent[5da7f77a-d698-44d4-b6e5-4a3b2d72e852] raised unexpected: TypeError("__init__() got an unexpected keyword argument 'url'") Traceback (most recent call last): File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task R = retval = fun(*args, **kwargs) File "/home/anlambert/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 61, in __call__ result = super().__call__(*args, **kwargs) File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/task.py", line 392, in __call__ return self.run(*args, **kwargs) File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/tasks.py", line 14, in load_content return ContentLoader.from_configfile(**kwargs).load() File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 188, in from_configfile return cls.from_config(**config) File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 174, in from_config return cls(storage=storage_instance, **config) File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 713, in __init__ super().__init__(*args, **kwargs) File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 675, in __init__ super().__init__(*args, **kwargs) TypeError: __init__() got an unexpected keyword argument 'url'
It seems some loader parameters should be renamed or made explicit or the loading task will fail to execute otherwise.
ah! yeah, it's the first time we are scheduling ListedOrigins for non package loaders...
Thanks for raising this up!
Build is green
Patch application report for D8601 (id=31075)
Rebasing onto c631349aea...
Current branch diff-target is up to date.
Changes applied before test
commit a5255f1064533efc5f9032eb66ad7f12c2468f9b Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Sun Oct 2 19:18:23 2022 +0200 {Content|Directory}Loader: Register tasks Related to T3781
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/951/ for more details.
Looks good to me, makes me think that I should deduplicate the tests for the tasks created from ListedOrigins in loaders through a fixture and add missing ones, I will push a diff for that.