maven: Use the instance base_url as metadata authority URL
Closed, Public

Authored by vlorentz on Dec 7 2021, 2:01 PM.

Details

Summary

instead of just its netloc, as it is possible to have multiple Maven instances
hosted under the same domain but at different paths.

The code is also simpler this way.
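
To illustrate why the netloc alone can be ambiguous, here is a minimal sketch using Python's standard `urllib.parse`. The two instance URLs are hypothetical, chosen only to show two repositories sharing one domain; they are not taken from the diff.

```python
from urllib.parse import urlparse

# Hypothetical base URLs of two Maven instances hosted on the same domain.
base_urls = [
    "https://repo.example.org/maven-a/",
    "https://repo.example.org/maven-b/",
]

# Keying on the netloc collapses both instances into one authority.
netlocs = {urlparse(url).netloc for url in base_urls}

# Keying on the full base_url keeps the two instances distinct.
authorities = set(base_urls)

print(netlocs)
print(len(authorities))
```

With the netloc as the authority URL, metadata from both instances would be attributed to the same authority; with the full base_url, each instance gets its own.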

@borisbaldassari Sounds good?

Event Timeline

Build has FAILED

Patch application report for D6771 (id=24566)

Could not rebase; Attempt merge onto 79b1075e1d...

Updating 79b1075..62f66eb
Fast-forward
 requirements.txt                             |   1 +
 swh/loader/package/maven/loader.py           | 135 ++++++++++++---------------
 swh/loader/package/maven/tests/test_maven.py |   8 +-
 3 files changed, 65 insertions(+), 79 deletions(-)
Changes applied before test
commit 62f66eb83fbeb4e6aea2db6fc178dbf616b2f1e3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

commit a96389f5b916b307141aaceac6f8a49e43ca389b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 13:43:08 2021 +0100

    maven: Don't carry deleted versions over to the next snapshot
    
    Snapshots should only record versions that currently exist;
    even if they used to exist in a previous visits.
    
    If readers of the archive want to access deleted versions,
    than can look up older snapshots.

commit e8b6ed5ab223de3839a3c02d771364207d47160a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 13:37:17 2021 +0100

    maven: Make MavenPackageInfo.from_metadata more concise

commit 5da115b6e5bf48e3829c830c9f164bd07ed14509
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 13:34:29 2021 +0100

    maven: Simplify definition of the 'version_artifact' dict
    
    We don't need it to be ordered; and '.keys()' is redundant.

commit ccf71383c61d642f717256a5ed55539073fd0477
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 13:32:26 2021 +0100

    maven: Simplify build_extrinsic_directory_metadata.

commit a76ab28824a2c203b8cc5f9ff70cecf922770662
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 11:54:33 2021 +0100

    maven: Add typing to the artifacts dict

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/660/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/660/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Dec 7 2021, 2:03 PM)
Harbormaster failed remote builds in B25438: Diff 24566!

fix crash on empty list of artifacts

Build has FAILED

Patch application report for D6771 (id=24567)

Could not rebase; Attempt merge onto 79b1075e1d...

Updating 79b1075..f755b26
Fast-forward
 requirements.txt                             |   1 +
 swh/loader/package/maven/loader.py           | 141 ++++++++++++---------------
 swh/loader/package/maven/tests/test_maven.py |   8 +-
 3 files changed, 71 insertions(+), 79 deletions(-)
Changes applied before test
commit f755b2681a6c5b0cd76f8f46c3ca553279e940f3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/661/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/661/console

Build has FAILED

Patch application report for D6771 (id=24568)

Could not rebase; Attempt merge onto 79b1075e1d...

Updating 79b1075..ed1cf94
Fast-forward
 requirements.txt                             |   1 +
 swh/loader/package/maven/loader.py           | 141 ++++++++++++---------------
 swh/loader/package/maven/tests/test_maven.py |   8 +-
 3 files changed, 71 insertions(+), 79 deletions(-)
Changes applied before test
commit ed1cf9471bf600bfdfcd19192e9b7f93453a1f75
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/662/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/662/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Dec 7 2021, 2:18 PM)
Harbormaster failed remote builds in B25440: Diff 24568!

Build is green

Patch application report for D6771 (id=24569)

Could not rebase; Attempt merge onto 79b1075e1d...

Updating 79b1075..972cdc6
Fast-forward
 requirements.txt                             |   1 +
 swh/loader/package/maven/loader.py           | 141 ++++++++++++---------------
 swh/loader/package/maven/tests/test_maven.py |   8 +-
 swh/loader/package/maven/tests/test_tasks.py |   5 +-
 4 files changed, 74 insertions(+), 81 deletions(-)
Changes applied before test
commit 972cdc61fcd8127d827b41fb692b05f6788aac00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/663/ for more details.

As said in the lister's corresponding diff, makes sense to me. Thanks for the improvement.

swh/loader/package/maven/loader.py
140–141

ValueError would be thrown only in case of an empty list, right? So why the "more than one maven instance"?

swh/loader/package/maven/tests/test_tasks.py
15

Not sure if the base_url should look like:

According to the lister, it should be the latter, I think.

swh/loader/package/maven/loader.py
140–141

It is raised if base_urls does not have exactly one item. But here, the code would not run at all if it is empty, because of the if; so we can be sure it is only raised if there is more than one.
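
The reasoning above can be sketched as follows. This is only an illustration under stated assumptions: `single_base_url` is a hypothetical helper, not the loader's actual code, and it assumes the check is done by unpacking a set of base URLs behind an emptiness guard.

```python
from typing import Iterable, Optional


def single_base_url(base_urls: Iterable[str]) -> Optional[str]:
    """Return the only base URL, or None when there are none (hypothetical helper)."""
    urls = set(base_urls)
    if not urls:
        # Plays the role of the guarding `if`: the unpacking below is never
        # reached with an empty collection.
        return None
    try:
        # Tuple unpacking raises ValueError unless there is exactly one item.
        (base_url,) = urls
    except ValueError:
        # Emptiness was ruled out above, so this can only mean "more than one".
        raise ValueError("more than one Maven instance base URL") from None
    return base_url
```

Called with a single URL such as `https://repo1.maven.org/maven2/`, the helper returns it unchanged; called with two distinct URLs, it raises the `ValueError`.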

swh/loader/package/maven/tests/test_tasks.py
15

oh you're right, it should be https://repo1.maven.org/maven2/

Build has FAILED

Patch application report for D6771 (id=24596)

Could not rebase; Attempt merge onto e8b6ed5ab2...

Updating e8b6ed5..394fd99
Fast-forward
 swh/loader/package/maven/loader.py           | 35 +++++++++++++++++-----------
 swh/loader/package/maven/tests/test_maven.py |  8 +++++--
 swh/loader/package/maven/tests/test_tasks.py |  5 ++--
 3 files changed, 31 insertions(+), 17 deletions(-)
Changes applied before test
commit 394fd998357bb6c1fbc3abb4ab348ebc811149e4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

commit a96389f5b916b307141aaceac6f8a49e43ca389b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 13:43:08 2021 +0100

    maven: Don't carry deleted versions over to the next snapshot
    
    Snapshots should only record versions that currently exist;
    even if they used to exist in a previous visits.
    
    If readers of the archive want to access deleted versions,
    than can look up older snapshots.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/664/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/664/console

ardumont added a subscriber: ardumont.

lgtm

Remains to fix the build ;)

This revision is now accepted and ready to land. (Dec 8 2021, 11:11 AM)

Build is green

Patch application report for D6771 (id=24596)

Rebasing onto a96389f5b9...

Current branch diff-target is up to date.
Changes applied before test
commit 394fd998357bb6c1fbc3abb4ab348ebc811149e4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/665/ for more details.

swh/loader/package/maven/loader.py
139

I'm gonna pull a "val": you are missing branch coverage here.

Build is green

Patch application report for D6771 (id=24614)

Rebasing onto a96389f5b9...

Current branch diff-target is up to date.
Changes applied before test
commit 98506af074d00cf735c9563fbfb0f5b20cef768a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Dec 7 14:01:06 2021 +0100

    maven: Use the instance base_url as metadata authority URL
    
    instead of just its netloc, as it is possibly to have multiple maven instances
    hosted under the same domain but at different paths.
    
    The code is also simpler this way.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/666/ for more details.