Page MenuHomeSoftware Heritage

Implement maven jar source files loader
ClosedPublic

Authored by borisbaldassari on Oct 2 2021, 9:56 PM.

Details

Summary

The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.

Note:
Supersedes D6158 and addresses the review done in that diff.

Related to T1724

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
T1724_maven_central_jar_loader (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25246
Build 39458: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 39457: arc lint + arc unit

Unit TestsFailed

TimeTest
187 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.package.maven.tests.test_maven::test_metadatata
swh_storage = <swh.storage.proxies.retry.RetryingProxyStorage object at 0x7ff8eed54ef0> requests_mock = <requests_mock.mocker.Mocker object at 0x7ff8ee3485f8> data_jar_1 = b'PK\x03\x04\n\x00\x00\x08\x08\x00\xba\\\xb7L\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00META-INF/\x...*\x00\x00al/aldi/sprova4j/models/Execution.javaPK\x05\x06\x00\x00\x00\x00#\x00#\x00\xe2\n\x00\x00\xf4,\x00\x00\x00\x00'
180 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.package.maven.tests.test_maven::test_metadatata_no_pom
swh_storage = <swh.storage.proxies.retry.RetryingProxyStorage object at 0x7ff8ee2f4fd0> requests_mock = <requests_mock.mocker.Mocker object at 0x7ff8ee434048> data_jar_1 = b'PK\x03\x04\n\x00\x00\x08\x08\x00\xba\\\xb7L\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00META-INF/\x...*\x00\x00al/aldi/sprova4j/models/Execution.javaPK\x05\x06\x00\x00\x00\x00#\x00#\x00\xe2\n\x00\x00\xf4,\x00\x00\x00\x00'
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_data
4 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_path
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_too_long
View Full Test Results (2 Failed · 173 Passed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
This revision now requires changes to proceed.Nov 26 2021, 7:38 PM
borisbaldassari marked 4 inline comments as done.
  • maven-loader: set the pom as metadata
  • maven-loader: fix after final review (ardumont, vlorentz)

Build is green

Patch application report for D6396 (id=24339)

Rebasing onto 021880d030...

First, rewinding head to replay your work on top of it...
Applying: loader: add new maven-jar loader
Using index info to reconstruct a base tree...
M	docs/package-loader-specifications.rst
Falling back to patching base and 3-way merge...
Auto-merging docs/package-loader-specifications.rst
Applying: maven-loader: set the pom as metadata
Applying: maven-loader: fix after final review (ardumont, vlorentz)
Changes applied before test
commit 4d5e059e98901c3e24b4c36446bdf969f371ecbd
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 08050bafb353f2567381a98e14cf82acf03db6f9
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit f98171102d2bf541d9d3963d0ac7b23017bb77bb
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/626/ for more details.

Thanks for the last adaptations.

I have just one last question to discuss with valentin but i don't think that's blocking in the end.

swh/loader/package/maven/loader.py
80

@vlorentz Do we allow such side-effects [1] to occur this early in the package loading process?

I guess it's fine since it's called at the get_package_info time...
Just double checking with you.

[1] any form of interactions with the outside world (db, remote service, file, whatever can raise any form of issues...)

vlorentz added inline comments.
swh/loader/package/maven/loader.py
80

It's not too early; but this is outside any exception handling, which means a crash here (and they *will* happen since it's a network request) won't update the visit status, so it must be done somewhere else, eg. by specializing build_extrinsic_directory_metadata.

162–164

ping?

swh/loader/package/maven/tests/test_maven.py
318–342

thanks!

This revision now requires changes to proceed.Nov 29 2021, 10:36 AM
swh/loader/package/maven/loader.py
78–79

Can you pass this URL from the lister instead of re-computing it here? Or are we sure this URL will always be right?

84

are there any packages without a POM?

borisbaldassari marked 2 inline comments as done.
  • maven-loader: fix metadata for completeness

Build has FAILED

Patch application report for D6396 (id=24363)

Rebasing onto 021880d030...

First, rewinding head to replay your work on top of it...
Applying: loader: add new maven-jar loader
Using index info to reconstruct a base tree...
M	docs/package-loader-specifications.rst
Falling back to patching base and 3-way merge...
Auto-merging docs/package-loader-specifications.rst
Applying: maven-loader: set the pom as metadata
Applying: maven-loader: fix after final review (ardumont, vlorentz)
Applying: maven-loader: fix metadata for completeness
Changes applied before test
commit 9c6b24c34f8b295d4bb827fe8e708ec0aae43662
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 0c2836237f89ebfda915b7c53388deac59a6e1e3
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 9803dbb0ff470447ea52f8ff622215445d15c910
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 7de48dffc1f7c04e6c18838278d4d471d86c9ca5
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/627/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/627/console

Updated encode to fix double metadata objects.

Build has FAILED

Patch application report for D6396 (id=24365)

Rebasing onto 021880d030...

First, rewinding head to replay your work on top of it...
Applying: loader: add new maven-jar loader
Using index info to reconstruct a base tree...
M	docs/package-loader-specifications.rst
Falling back to patching base and 3-way merge...
Auto-merging docs/package-loader-specifications.rst
Applying: maven-loader: set the pom as metadata
Applying: maven-loader: fix after final review (ardumont, vlorentz)
Applying: maven-loader: fix metadata for completeness
Changes applied before test
commit 07583d1ba11f4ffbbe885841e2663b78efc6b694
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 7931894264347cee1f4c7c7143dd6efc90f7fb6a
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 32ce31242b76ef65635b6b9af47cdbe389a382c1
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 66ac1d3f9f8a906d48920a827d702b9c6c45bf21
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/628/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/628/console

  • maven-loader: fix metadata for pom and json

Build has FAILED

Patch application report for D6396 (id=24371)

Rebasing onto 021880d030...

First, rewinding head to replay your work on top of it...
Applying: loader: add new maven-jar loader
Using index info to reconstruct a base tree...
M	docs/package-loader-specifications.rst
Falling back to patching base and 3-way merge...
Auto-merging docs/package-loader-specifications.rst
Applying: maven-loader: set the pom as metadata
Applying: maven-loader: fix after final review (ardumont, vlorentz)
Applying: maven-loader: fix metadata for completeness
Applying: maven-loader: fix metadata for pom and json
Changes applied before test
commit a115b6abb4b73a212efb23484843e1955dedf5c5
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 16:04:00 2021 +0100

    maven-loader: fix metadata for pom and json

commit f52ce3c27f444bc96814d51d1ab49a86abc21d49
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 4ec8cf818af490b422bb4c2fa2654d1ff78ebd15
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit d9d7b66afdfd571ba88bc20749d23f1158c742fe
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit bb87b06a85be0a1dddd68907a027311aba3935f9
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/629/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/629/console

borisbaldassari marked 3 inline comments as done.
  • loader: add new maven-jar loader
  • maven-loader: set the pom as metadata
  • maven-loader: fix after final review (ardumont, vlorentz)
  • maven-loader: fix metadata for completeness
  • maven-loader: fix metadata for pom and json

Build has FAILED

Patch application report for D6396 (id=24372)

Rebasing onto 021880d030...

Current branch diff-target is up to date.
Changes applied before test
commit a9e0076483e6d44f3ab5600fc8c6f9caecf1d3c8
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 16:04:00 2021 +0100

    maven-loader: fix metadata for pom and json

commit 4f65c899bb66dc3457c603bfd8e32937fb93b7ba
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 52fcafa36a3cf3834d3e8f3f278e8f40d128000e
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 8ab8d11531953ca00cb6f58c7fd117ef5cdcd482
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 6f6dc40d66c21f183522861aaae792de7332d2a5
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/630/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/630/console

Many thanks ;)

I guess you "only" need to fix the build now ;)

swh/loader/package/maven/loader.py
97

*thumbs up*

  • maven-loader: move request.get to own function

Build has FAILED

Patch application report for D6396 (id=24379)

Rebasing onto 021880d030...

Current branch diff-target is up to date.
Changes applied before test
commit 0021e0cf7f8239a600e2cb0601d37c70bcb3af2e
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 12:56:31 2021 +0100

    maven-loader: move request.get to own function

commit a9e0076483e6d44f3ab5600fc8c6f9caecf1d3c8
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 16:04:00 2021 +0100

    maven-loader: fix metadata for pom and json

commit 4f65c899bb66dc3457c603bfd8e32937fb93b7ba
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 52fcafa36a3cf3834d3e8f3f278e8f40d128000e
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 8ab8d11531953ca00cb6f58c7fd117ef5cdcd482
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 6f6dc40d66c21f183522861aaae792de7332d2a5
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/631/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/631/console

borisbaldassari marked 3 inline comments as done.
  • loader: add new maven-jar loader
  • maven-loader: set the pom as metadata
  • maven-loader: fix after final review (ardumont, vlorentz)
  • maven-loader: fix metadata for completeness
  • maven-loader: fix metadata for pom and json
  • maven-loader: move request.get to own function
  • maven-loader: fix tests for unordered function call

Build has FAILED

Patch application report for D6396 (id=24380)

Rebasing onto 8a3df1376d...

Current branch diff-target is up to date.
Changes applied before test
commit 72514d26ae4de277015c284c307217e04f3619db
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 13:36:30 2021 +0100

    maven-loader: fix tests for unordered function call

commit 03ba42a29b5e1002311c629988f8bedf1aa66c05
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 12:56:31 2021 +0100

    maven-loader: move request.get to own function

commit 7dbc0c13162d77c1914c8667b4d871baf451586c
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 16:04:00 2021 +0100

    maven-loader: fix metadata for pom and json

commit 5f9980709be4051db9267086a062ee7925e67db8
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 88d27d09654f080db5e316c4504d204931c353a4
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 247959d4d602edb0005b606ba472fb5ce3f94041
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 9a15f5064e29adfd8d3dd40169d9eeceb271d6e6
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/632/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/632/console

  • maven-loader: fix tests for unordered function call

Build has FAILED

Patch application report for D6396 (id=24381)

Rebasing onto 8a3df1376d...

Current branch diff-target is up to date.
Changes applied before test
commit 7d878099945890d2f83d6cffdbc28204078918bb
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 13:41:20 2021 +0100

    maven-loader: fix tests for unordered function call

commit 72514d26ae4de277015c284c307217e04f3619db
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 13:36:30 2021 +0100

    maven-loader: fix tests for unordered function call

commit 03ba42a29b5e1002311c629988f8bedf1aa66c05
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Wed Dec 1 12:56:31 2021 +0100

    maven-loader: move request.get to own function

commit 7dbc0c13162d77c1914c8667b4d871baf451586c
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 16:04:00 2021 +0100

    maven-loader: fix metadata for pom and json

commit 5f9980709be4051db9267086a062ee7925e67db8
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Nov 29 15:17:00 2021 +0100

    maven-loader: fix metadata for completeness

commit 88d27d09654f080db5e316c4504d204931c353a4
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Nov 28 12:27:32 2021 +0100

    maven-loader: fix after final review (ardumont, vlorentz)

commit 247959d4d602edb0005b606ba472fb5ce3f94041
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sat Nov 27 14:15:09 2021 +0100

    maven-loader: set the pom as metadata
    
    maven-loader: fix after final review (ardumont, vlorentz)

commit 9a15f5064e29adfd8d3dd40169d9eeceb271d6e6
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/633/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/633/console

Fix unordered function return + rebase

Build is green

Patch application report for D6396 (id=24382)

Rebasing onto 8a3df1376d...

Current branch diff-target is up to date.
Changes applied before test
commit 813378833670c41fa777b5cf91a97de0fa1dab07
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/634/ for more details.

This revision is now accepted and ready to land.Dec 1 2021, 2:03 PM

@borisbaldassari Can you please push your commit in master?

So that will land the diff ;)

@borisbaldassari Can you please push your commit in master?

I'd love to, but I get rejected by the commit hook as a non-developer.
I can't remember how I did for the lister, I think it went fine.. What am I doing wrong?

(swh) boris@debian:swh-loader-core$ git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
(swh) boris@debian:swh-loader-core$ git push origin master
Enumerating objects: 30, done.
Counting objects: 100% (30/30), done.
Delta compression using up to 3 threads
Compressing objects: 100% (21/21), done.
Writing objects: 100% (22/22), 22.43 KiB | 3.74 MiB/s, done.
Total 22 (delta 10), reused 0 (delta 0)
remote: *** PUSH REJECTED BY COMMIT HOOK ***
remote: 
remote: This push was rejected by Herald push rule H16.
remote:     Change: commit/
remote:       Rule: Non-developers require code review for pushing content
remote:     Reason: needs review
remote: Transcript: https://forge.softwareheritage.org/herald/transcript/348895/
remote: 
To forge.softwareheritage.org:/source/swh-loader-core.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@forge.softwareheritage.org:/source/swh-loader-core.git'
docs/package-loader-specifications.rst
62

Changed both to "Synthetic release for maven-loader at {p_info.url}\n".

62

Ok, no problem. Will do that Thursday (I'm busy until then).
Sorry for misinterpreting your comment.

64

Yes, I understand that specifications are supposed to be in sync with the code. I had changed the specs to use swh_person in order to be aligned with the maven loader and with other loaders (e.g. deposit) to ensure consistency. Don't understand what happened.

Anyway, I've set the empty person in the loader, reverted my last change in the spec, and now everything is empty_author.

swh/loader/package/maven/loader.py
40

Yes, now it's a string. Updated as suggested to use datetime.

45–46

I think I can make it an int.
Will try and update the doc as requested on Thursday.

55–56

As per the Lucene docs, it is the last update time of the file. Fixed.

57

Interesting. It was initially copied from deposit (which i used as a template to start with). Removed it.

78–79

No, we don't have it at the lister level either. And we are not sure the URL exists.

This is what I had written in another comment -- but I have the feeling that some of my answers are lost, which does not help.

We have no reliable way to know the POM; this is a heuristic that should work most of the time.
That was my point about having the reduced set of json info (gid, aid, version), which is always available, compared to the more rich but less reliable pom content.
(see comment on line 84).

83

neat!

84

As stated above, yes. This is what I explained in another, now invisible answer. Let me look for it.

Ah, actually it has simply moved, it's right below:
https://forge.softwareheritage.org/D6396?vs=on&id=24339#anchor-inline-47926

So we can choose between having the XML pom (complete, but not reliable) or having the JSON (incomplete, but always provided).
So it's really up to you.

102

We don't have the original POM file around when loading the jar, and I don't see a reliable way to get it.

We might use the (gId, aId, version) coordinates to try to download a pom and set it as metadata, but I'm not sure how often it would work.
It depends if you'd prefer consistency and reliability (always have the above information) or completeness (not always the same information/format, but more complete when it's there).

What do you prefer?

I've set the a_metadata full info instead of the extract for now.

116

See next comment, yes.

We introduced last_update as a datetime, but the time parameter passed to the loader is an int. And it stil has the same meaning, i.e. last modification time on the server (as defined in the lucene index at least).

Do you think it's not?

121

Yes, it still is a timestamp. We changed the type to int, but the content is basically the same.

149–150

Hum. Interesting, thanks for pointing that out.

I thought it was somehow used by the loading mechanism to register its state, but on a second look it's probably just a carryover from the example I took in the archive loader. Removed.

162–164

No, the lister is expected to always provide the version (alongside the gid and aid). If you're wondering why having this loop, it's because it's more elegant than artefact[0] and it feels more safe. But I can change it.

(Note that there is only one artefact in the artifact list, as identified by the url.)

162–164

I don't know. I'm not sure to understand what you mean or expect.

What I know:

  • This loader is executed for a single jar/zip, or in other words for a single artefact, defined by the URL. There is no case where there would be more than one artefact involved.
  • The gid, aid, version are always provided by the lister. Without this information, the jar loader is nothing more than an archive loader.

To answer your initial comment, when the loader is invoked there is indeed at most one package (whatever the gid, aid, version).

Is there a specific case/situation you have in mind?

169

Moved that next to the parsing step.

I don't think it should be put in the lister, as what is yielded by the lister is already a UTC time. This line simply sets the loader local var as a UTC time to be interpreted correctly.

176

Hum. No, I don't think so. The last_update parameter has been added as a datetime for the scheduler, but the time parameter passed to the loader is still an int.

Am I missing something?

206

I've updated the documentation to use "SWH robot", for consistency with the deposit and archive loaders (which I've used intensively as examples).

206

As mentioned I've updated the package-loader-specifications.rst file to be consistent with other loaders. It was a mistake from me to create the maven entry with an empty person at first.
Is there anything else I should do?

swh/loader/package/maven/tests/test_maven.py
236–244

Good point, thanks.

282–287

Because requests_mock_datadir.request_history is a requests_mock.request._RequestObjectProxy object that cannot be compared to a list.

It's ugly, though, so I've rewritten these lines for readability. Thanks.

282–287

This has been addressed in the lines below:

assert requests_mock_datadir.request_history[0].url == MVN_ARTIFACTS[1]["url"]
assert len(requests_mock_datadir.request_history) == 1

I don't know what I can improve there. As said above the equality just isn't possible as I cannot compare requests_mock.request._RequestObjectProxy to a list. But the second line ensures that equality is satisfied (and that there is no other item). Any idea?

318–342

@vlorentz you're not annoying.
What is annoying though is the fact that some of my answers are lost (when rebasing I guess). :-)

Anyway, I've done as suggested.

You need to update your diff after rebasing

This revision was landed with ongoing or failed builds.Dec 6 2021, 2:46 PM
This revision was automatically updated to reflect the committed changes.

You need to update your diff after rebasing

I thought it had been done already, but seems not. Thanks @vlorentz !!

Build is green

Patch application report for D6396 (id=24512)

Rebasing onto 5d22455c94...

Current branch diff-target is up to date.
Changes applied before test
commit 89f5ccc7f5fc030e6db514f79726788770c6fa50
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Sun Aug 29 22:14:59 2021 +0200

    loader: add new maven-jar loader
    
    The maven loader loads jar and zip files as Maven artefacts into the software heritage archive.
    
    Note:
    Supersedes D6158 and addresses the review done in that diff.
    
    Related to T1724

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/646/ for more details.