Page MenuHomeSoftware Heritage

{Content|Directory}Loader: Register tasks
ClosedPublic

Authored by ardumont on Oct 3 2022, 2:46 PM.

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
refactor-nixguix-loader-directory
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32045
Build 50167: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 50166: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8601 (id=31059)

Could not rebase; Attempt merge onto dbf7f3dca0...

Updating dbf7f3d..ccb8f36
Fast-forward
 docs/README.rst                      |   7 ++-
 setup.py                             |   2 +
 swh/loader/core/__init__.py          |  27 ++++++++++
 swh/loader/core/loader.py            |  73 +++++++++++++------------
 swh/loader/core/tasks.py             |  20 +++++++
 swh/loader/core/tests/test_loader.py | 100 +++++++++++++++++++++++++----------
 swh/loader/core/tests/test_tasks.py  |  34 ++++++++++++
 7 files changed, 196 insertions(+), 67 deletions(-)
 create mode 100644 swh/loader/core/tasks.py
 create mode 100644 swh/loader/core/tests/test_tasks.py
Changes applied before test
commit ccb8f36834bd9a7463f7182321a4adfc7915c26f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 2 19:18:23 2022 +0200

    {Content|Directory}Loader: Register tasks
    
    Related to T3781

commit 39c33a66c27c030c7ae3cfba4a393c2ced468fbc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 15:10:49 2022 +0200

    {Content|Directory}Loader: Adapt support for checksums
    
    This adapts the content/directory loader implementations to use directly a checksums
    dict which is now sent by the listers.
    
    This improves the loader to check those checksums when retrieving the artifact (content
    or tarball). Thanks to a bump in the swh.model version, this is now able to deal with
    sha512 checksums checks as well.
    
    This also aligns with the current package loaders which now are also checking the
    integrity of the tarballs they ingest.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/944/ for more details.

anlambert added a subscriber: anlambert.

I recalled it is also important to test the loader parameters when created from ListedOrigin instances so I have added this test below:

diff --git a/swh/loader/core/tests/test_tasks.py b/swh/loader/core/tests/test_tasks.py
index bf8d2e1..c0ab361 100644
--- a/swh/loader/core/tests/test_tasks.py
+++ b/swh/loader/core/tests/test_tasks.py
@@ -3,8 +3,13 @@
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
+import uuid
+
 import pytest
 
+from swh.scheduler.model import ListedOrigin, Lister
+from swh.scheduler.utils import create_origin_task_dict
+
 NAMESPACE = "swh.loader.core"
 
 
@@ -32,3 +37,44 @@ def test_tasks_content_loader(
     assert res.successful()
     assert mock_load.called
     assert res.result == {"status": "eventful"}
+
+
+@pytest.fixture
+def nixguix_lister():
+    return Lister(name="nixguix", instance_name="example", id=uuid.uuid4())
+
+
+@pytest.mark.parametrize("loader_name", ["Content", "Directory"])
+def test_tasks_loader_for_listed_origin(
+    mocker,
+    swh_scheduler_celery_app,
+    swh_scheduler_celery_worker,
+    swh_config,
+    nixguix_lister,
+    loader_name,
+):
+    mock_load = mocker.patch(f"{NAMESPACE}.loader.{loader_name}Loader.load")
+    mock_load.return_value = {"status": "eventful"}
+
+    listed_origin = ListedOrigin(
+        lister_id=nixguix_lister.id,
+        url="https://example.org/archives",
+        visit_type=loader_name.lower(),
+        extra_loader_arguments={
+            "origin_url": "https://example.org/artifact/artifact",
+            "fallback_urls": ["https://example.org/mirror/artifact-0.0.1.pkg.xz"],
+            "checksums": {"sha256": "some-valid-checksum"},
+        },
+    )
+
+    task_dict = create_origin_task_dict(listed_origin, nixguix_lister)
+
+    res = swh_scheduler_celery_app.send_task(
+        f"{NAMESPACE}.tasks.Load{loader_name}",
+        kwargs=task_dict["arguments"]["kwargs"],
+    )
+    assert res
+    res.wait()
+    assert res.successful()
+    assert mock_load.called
+    assert res.result == {"status": "eventful"}

I got the following error when running tests:

[2022-10-03 15:08:28,803: ERROR/MainProcess] Task swh.loader.core.tasks.LoadContent[5da7f77a-d698-44d4-b6e5-4a3b2d72e852] raised unexpected: TypeError("__init__() got an unexpected keyword argument 'url'")
Traceback (most recent call last):
  File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/anlambert/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 61, in __call__
    result = super().__call__(*args, **kwargs)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/task.py", line 392, in __call__
    return self.run(*args, **kwargs)
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/tasks.py", line 14, in load_content
    return ContentLoader.from_configfile(**kwargs).load()
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 188, in from_configfile
    return cls.from_config(**config)
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 174, in from_config
    return cls(storage=storage_instance, **config)
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 713, in __init__
    super().__init__(*args, **kwargs)
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 675, in __init__
    super().__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'url'

It seems some loader parameters should be renamed or made explicit or the loading task will fail to execute otherwise.

This revision now requires changes to proceed.Oct 3 2022, 3:12 PM

ah! yeah, it's the first time we are scheduling ListedOrigins for non package loaders...
Thanks for raising this up!

Build is green

Patch application report for D8601 (id=31075)

Rebasing onto c631349aea...

Current branch diff-target is up to date.
Changes applied before test
commit a5255f1064533efc5f9032eb66ad7f12c2468f9b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 2 19:18:23 2022 +0200

    {Content|Directory}Loader: Register tasks
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/951/ for more details.

fwiw, this is working in docker ^.

Looks good to me, makes me think that I should deduplicate the tests for the tasks created from ListedOrigins in loaders through a fixture and add missing ones, I will push a diff for that.

This revision is now accepted and ready to land.Oct 4 2022, 10:53 AM