Stuff related to extending the coverage of the Software Heritage archive
This has now been discussed on the sourcehut mailing list and I took part in the conversation.
Fri, Aug 5
We discussed internally what to do with inactive repositories.
We reached a decision to move unused repos to object storage.
Once implemented, they will still be accessible but take a bit longer to access after a long period of inactivity.
Thu, Aug 4
Looks like there are many more repos that should be visitable but aren't:
worth opening a dedicated forge issue
open an upstream issue
updated query running:
As usual, I'm uneasy with the (general) idea of manually handling some repositories to reduce one bit of lag. This will only increase lag in another area that we will want to cover next. Rinse, repeat.
I am currently running a query to find how many origins are over one year overdue for a visit:
Thu, Jul 28
Tue, Jul 19
Wed, Jul 13
Jun 29 2022
I will work on the incremental lister, and then document it (not done yet).
What are the next steps here?
The crates lister (stateless) and loader have landed.
I just fixed some issues discovered while running the lister and loader in the Docker env (D8049).
Jun 21 2022
I've scheduled the archival of the 7377 repos in one of the leftover one-shot queues.
Jun 19 2022
Jun 17 2022
Jun 16 2022
Jun 15 2022
So in the end, the conclusion is that the loader already does the right thing so it's a noop, right?
Good news *can* happen, ahah! Thanks for notifying me.
To summarize, the initial intent was to adapt the loaded jar (as an extracted directory) by appending the pom.xml so that we do not lose that reference.
You're right boris, indeed it's already stored as extrinsic metadata, we hadn't checked properly :)
Thank you for your answer!
I'm not sure I understand the intent, as we already keep the pom in the extrinsic metadata (don't we?).
Double-checking in the SWH codebase, I believe you could build upon this: see  lines 166-180.
Congrats on the work done! I think that downloading the pom file from the same folder is indeed the way to go.
I think the simplest way to get the pom file associated with a specific release of a Maven package is to download it from the folder where we can find the source jar.
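For illustration, here is a minimal sketch of how the pom URL could be derived from a source jar URL, assuming the standard Maven repository layout (`.../<artifactId>/<version>/<artifactId>-<version>.pom` next to the jar). The function name `pom_url_for_jar` and the example coordinates (`com/example/demo`, version `1.2.3`) are hypothetical; this is not the loader's actual code.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def pom_url_for_jar(jar_url: str) -> str:
    """Return the URL of the .pom file sitting in the same folder as a jar.

    Assumes the standard Maven layout, where the parent folder is named
    after the version and the grandparent folder after the artifactId.
    """
    parts = urlsplit(jar_url)
    # Split ".../<artifactId>/<version>/<file>.jar" into folder and file name.
    folder, _jar_name = posixpath.split(parts.path)
    artifact_dir, version = posixpath.split(folder)
    artifact_id = posixpath.basename(artifact_dir)
    # The pom has no classifier (e.g. "-sources"), only artifactId and version.
    pom_name = f"{artifact_id}-{version}.pom"
    return urlunsplit(parts._replace(path=posixpath.join(folder, pom_name)))


# Hypothetical coordinates, used only to show the shape of the URLs.
example = pom_url_for_jar(
    "https://repo1.maven.org/maven2/com/example/demo/1.2.3/demo-1.2.3-sources.jar"
)
print(example)
```

Deriving the name from the folder structure rather than from the jar file name avoids having to strip classifiers such as `-sources` by hand.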
Jun 13 2022
- some Maven origins contain a zip instead of a jar, and in that case it looks like the pom.xml is included (e.g. https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://repo1.maven.org/maven2/org/jboss/snowdrop/snowdrop)