Page MenuHomeSoftware Heritage

package/maven: Fix jar archive download after changes in lister
ClosedPublic

Authored by anlambert on Apr 28 2022, 6:23 PM.

Details

Summary

Maven lister has been updated to create one origin per package and
group all its versions in it as releases.

Previously a maven origin URL corresponded to a jar archive URL
while now it corresponds to the page where all versions of a
package can be found.

So update loader implementation to reflect this change and fix
jar archives download.

Also update tests after this change and refactor them a bit,
notably by using fixtures instead of hardcoded tests data.

Related to D7710

Related to T3874

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
maven-loader-url-fix
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28965
Build 45280: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 45279: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7712 (id=27893)

Rebasing onto 2e27f7c7a6...

Current branch diff-target is up to date.
Changes applied before test
commit 51a724748a16df6b0971256b4c36e889a4f2684a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 28 18:17:05 2022 +0200

    package/maven: Fix jar archive download after changes in lister
    
    Maven lister has been updated to create one origin per package and
    group all its versions in it as releases.
    
    Previously a maven origin URL corresponded to a jar archive URL
    while now it corresponds to the page where all versions of a
    package can be found.
    
    So update loader implementation to reflect this change and fix
    jar archives download.
    
    Also update tests after this change and refactor them a bit,
    notably by using fixtures instead of hardcoded tests data.
    
    Related to T3874

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/754/ for more details.

ardumont added a subscriber: ardumont.

Thanks.

lgtm, some suggestions inline.

swh/loader/package/maven/tests/test_maven.py
133–155

maybe?

175

heh, what i suggest you do for release (with directory) you did here that for snapshot (with release) ;)

289–290

with something like this declared earlier:

_, expected_directories = expected_contents_and_directories
300
This revision is now accepted and ready to land.Apr 29 2022, 10:37 AM
swh/loader/package/maven/tests/test_maven.py
133–155

More de-harcoding, good idea ! I managed to do it with a new level of fixture and I also forgot to use the expected_releases fixture in another test, see below.

diff --git a/swh/loader/package/maven/tests/test_maven.py b/swh/loader/package/maven/tests/test_maven.py
index 7abd37a..36de2a7 100644
--- a/swh/loader/package/maven/tests/test_maven.py
+++ b/swh/loader/package/maven/tests/test_maven.py
@@ -108,7 +108,7 @@ def data_pom_2(datadir):
 
 
 @pytest.fixture
-def expected_contents_and_directories(datadir, tmp_path):
+def jar_dirs(datadir, tmp_path):
     jar_1_path = os.path.join(datadir, "https_maven.org", "sprova4j-0.1.0-sources.jar")
     jar_2_path = os.path.join(datadir, "https_maven.org", "sprova4j-0.1.1-sources.jar")
 
@@ -121,8 +121,13 @@ def expected_contents_and_directories(datadir, tmp_path):
     jar_1_dir = Directory.from_disk(path=jar_1_extract_path.encode())
     jar_2_dir = Directory.from_disk(path=jar_2_extract_path.encode())
 
-    jar_1_cnts, _, jar_1_dirs = iter_directory(jar_1_dir)
-    jar_2_cnts, _, jar_2_dirs = iter_directory(jar_2_dir)
+    return [jar_1_dir, jar_2_dir]
+
+
+@pytest.fixture
+def expected_contents_and_directories(jar_dirs):
+    jar_1_cnts, _, jar_1_dirs = iter_directory(jar_dirs[0])
+    jar_2_cnts, _, jar_2_dirs = iter_directory(jar_dirs[1])
 
     contents = {cnt.sha1 for cnt in chain(jar_1_cnts, jar_2_cnts)}
     directories = {dir.id for dir in chain(jar_1_dirs, jar_2_dirs)}
@@ -131,7 +136,7 @@ def expected_contents_and_directories(datadir, tmp_path):
 
 
 @pytest.fixture
-def expected_releases():
+def expected_releases(jar_dirs):
     return [
         Release(
             name=b"0.1.0",
@@ -139,7 +144,7 @@ def expected_releases():
             author=EMPTY_AUTHOR,
             date=REL_DATES[0],
             target_type=ModelObjectType.DIRECTORY,
-            target=hash_to_bytes("6c9de41e4cebb91a8368da1d89ae9873bd540ec3"),
+            target=jar_dirs[0].hash,
             synthetic=True,
             metadata=None,
         ),
@@ -149,7 +154,7 @@ def expected_releases():
             author=EMPTY_AUTHOR,
             date=REL_DATES[1],
             target_type=ModelObjectType.DIRECTORY,
-            target=hash_to_bytes("7951ea286383baebda4d86fa60c5d7129db8f1ad"),
+            target=jar_dirs[1].hash,
             synthetic=True,
             metadata=None,
         ),
@@ -258,7 +263,7 @@ def test_maven_loader_jar_visit_inconsistent_base_url(
 
 
 def test_maven_loader_first_visit(
-    swh_storage, expected_contents_and_directories, expected_snapshot
+    swh_storage, expected_contents_and_directories, expected_snapshot, expected_releases
 ):
     """With no prior visit, loading a jar ends up with 1 snapshot"""
 
@@ -283,29 +288,9 @@ def test_maven_loader_first_visit(
 
     rel_id = actual_snapshot.branches[b"releases/0.1.0"].target
     rel2_id = actual_snapshot.branches[b"releases/0.1.1"].target
-    (rel, rel2) = swh_storage.release_get([rel_id, rel2_id])
-
-    assert rel == Release(
-        name=b"0.1.0",
-        message=REL_MSGS[0],
-        author=EMPTY_AUTHOR,
-        date=REL_DATES[0],
-        target_type=ModelObjectType.DIRECTORY,
-        target=hash_to_bytes("6c9de41e4cebb91a8368da1d89ae9873bd540ec3"),
-        synthetic=True,
-        metadata=None,
-    )
+    releases = swh_storage.release_get([rel_id, rel2_id])
 
-    assert rel2 == Release(
-        name=b"0.1.1",
-        message=REL_MSGS[1],
-        author=EMPTY_AUTHOR,
-        date=REL_DATES[1],
-        target_type=ModelObjectType.DIRECTORY,
-        target=hash_to_bytes("7951ea286383baebda4d86fa60c5d7129db8f1ad"),
-        synthetic=True,
-        metadata=None,
-    )
+    assert releases == expected_releases

Build is green

Patch application report for D7712 (id=27900)

Rebasing onto 2e27f7c7a6...

Current branch diff-target is up to date.
Changes applied before test
commit e449786aa0983b125bee2ba57e7f79de9a81cf0a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 28 18:17:05 2022 +0200

    package/maven: Fix jar archive download after changes in lister
    
    Maven lister has been updated to create one origin per package and
    group all its versions in it as releases.
    
    Previously a maven origin URL corresponded to a jar archive URL
    while now it corresponds to the page where all versions of a
    package can be found.
    
    So update loader implementation to reflect this change and fix
    jar archives download.
    
    Also update tests after this change and refactor them a bit,
    notably by using fixtures instead of hardcoded tests data.
    
    Related to T3874

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/759/ for more details.

*thumbs up*, even better ;)