Archive the pom file additionally to the source folder
Closed, Migrated. Edits Locked.

Description

Some archived maven origins do not contain the pom.xml file.

In these cases, the pom.xml should be archived as extrinsic metadata.

Example: https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://repo1.maven.org/maven2/io/github/aliyun-beta/aliyun-slb

Event Timeline

bchauvet triaged this task as Normal priority. Jun 13 2022, 12:10 PM
bchauvet created this task.
bchauvet updated the task description.
This comment was removed by bchauvet.

I think the simplest way to get the pom file associated with a specific release of a maven package is to download it from the folder where we can find the source jar.

For instance, with that maven origin, the source jar of the 0.1.2 release is downloaded from that folder.

The associated pom file can also be found in that folder.

All maven packages seem to follow that convention about the files that can be downloaded.
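
As a rough sketch of that convention (not taken from the loader code): assuming the usual Maven layout, where a release folder `.../<artifactId>/<version>/` holds both `<artifactId>-<version>-sources.jar` and `<artifactId>-<version>.pom`, the pom URL can be derived from the jar URL. The function name and the example URL below are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def pom_url_from_jar_url(jar_url: str) -> str:
    """Derive the pom URL sitting next to a jar in a Maven repository.

    Assumes the standard layout: .../<artifactId>/<version>/<file>.
    """
    parts = urlsplit(jar_url)
    # Split the URL path into the release folder and the jar file name.
    folder, _jar_name = posixpath.split(parts.path)
    # The release folder is named after the version, its parent after the artifact.
    version = posixpath.basename(folder)
    artifact_id = posixpath.basename(posixpath.dirname(folder))
    pom_path = posixpath.join(folder, f"{artifact_id}-{version}.pom")
    return urlunsplit(parts._replace(path=pom_path))


# Hypothetical jar URL, for illustration only.
print(pom_url_from_jar_url(
    "https://repo.example.org/maven2/org/example/demo/1.0/demo-1.0-sources.jar"
))
```

The artifact and version are read from the folder names rather than parsed out of the jar file name, since classifiers such as `-sources` make the file name ambiguous.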

Hi @anlambert, @bchauvet ,

Congrats on the work done! I think that downloading the pom file from the same folder is indeed the way to go.

AFAICR we had a discussion about this mapping (poms <-> jars) and I vaguely remember that it works in one way (i.e. if there is a jar, then there is a pom) but not the other way around (i.e. there are many poms without any jars). As a matter of fact, I've spent the last 30 minutes randomly crawling through various maven repositories and I could not find a single example of a missing pom when there is a jar (or zip..).

So I assume it would work.

I'm not sure I understand the intent, as we already keep the pom in the extrinsic metadata (don't we?).
Double-checking in the SWH codebase, I believe you could build upon this: see [1] lines 166-180.

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/maven/loader.py;d925d06e6f1a51a4a7e8f0d1250a1c3bc45db891$166?as=source&blame=off

HTH, cheers!

You're right, Boris: it's indeed already stored as extrinsic metadata, we hadn't checked properly :)
Thank you for your answer!

@douardda @bchauvet @anlambert ^

To summarize, the initial intent was to adapt the loaded jar (as an extracted directory) by appending the pom.xml, so that we do not lose that reference.

But then there exist jar artifacts that do include the pom.xml.
It all depends on how the developers declare, in their pom, how to build the "*sources*.jar".
So then the plan became "reference the pom as extrinsic metadata".
But now, as @borisbaldassari mentions, it's already the case.
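
For anyone wanting to check a given artifact, here is a minimal sketch (not part of the loader; the function name is hypothetical) that tells whether a jar already ships a pom.xml. It assumes that any archive entry named `pom.xml`, at the root or nested (e.g. under `META-INF/maven/`), counts.

```python
import posixpath
import zipfile


def jar_contains_pom(jar_file) -> bool:
    """Return True if the jar (a zip archive) has an entry named pom.xml.

    `jar_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_file) as archive:
        # Zip entry names always use "/" as separator, hence posixpath.
        return any(
            posixpath.basename(name) == "pom.xml"
            for name in archive.namelist()
        )
```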

So in the end, the conclusion is that the loader already does the right thing so it's a noop, right?

Good news *can* happen, ahah! Thanks for notifying me.

Have a great day, cheers!

ardumont renamed this task from archive the pom file additionnaly to the source folder to archive the pom file additionally to the source folder. Jun 15 2022, 4:09 PM
ardumont renamed this task from archive the pom file additionally to the source folder to Archive the pom file additionally to the source folder.

So in the end, the conclusion is that the loader already does the right thing so it's a noop, right?

Yes, we can close this task as invalid and deploy the maven loader to production \o/

bchauvet claimed this task.