Hello,
The crates lister (stateless) and loader have landed.
I just fixed some issues discovered while running the lister and loader in the Docker environment (D8049).
Jun 29 2022
Jun 21 2022
I've scheduled the archival of the 7377 repos in one of the leftover one-shot queues.
Jun 19 2022
Jun 17 2022
Jun 16 2022
Jun 15 2022
Yesss! \o/
So in the end, the conclusion is that the loader already does the right thing, so it's a no-op, right?
Good news *can* happen, ahah! Thanks for notifying me.
To summarize, the initial intent was to adapt the loaded jar (stored as an extracted directory) by adding the pom.xml to it, so that we do not lose that reference.
You're right boris, indeed it's already stored as extrinsic metadata, we hadn't checked properly :)
Thank you for your answer!
I'm not sure I understand the intent, as we already keep the pom in the extrinsic metadata (don't we?).
Double-checking in the SWH codebase, I believe you could build upon this: see [1] lines 166-180.
Congrats on the work done! I think that downloading the pom file from the same folder is indeed the way to go.
I think the simplest way to get the pom file associated with a specific release of a maven package is to download it from the folder where the source jar is found.
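The idea of fetching the pom from the folder next to the source jar can be sketched as follows. This is only an illustration, not the loader's actual code: the function name `pom_url_from_jar_url` and the example URL are hypothetical, and the sketch assumes the standard Maven repository layout, where artifacts live under `<artifactId>/<version>/` and the pom is named `<artifactId>-<version>.pom`.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def pom_url_from_jar_url(jar_url: str) -> str:
    """Guess the URL of the pom sitting in the same folder as a jar.

    Assumes the standard Maven layout (hypothetical helper, not SWH code):
    .../<artifactId>/<version>/<artifactId>-<version>[-classifier].jar
    """
    parts = urlsplit(jar_url)
    # Folder holding the jar: .../<artifactId>/<version>
    directory = posixpath.dirname(parts.path)
    version = posixpath.basename(directory)
    artifact_id = posixpath.basename(posixpath.dirname(directory))
    if not artifact_id or not version:
        raise ValueError(f"not a Maven-layout artifact URL: {jar_url}")
    pom_path = posixpath.join(directory, f"{artifact_id}-{version}.pom")
    # Drop any query string or fragment; keep scheme and host untouched.
    return urlunsplit((parts.scheme, parts.netloc, pom_path, "", ""))


# Hypothetical artifact URL, for illustration only.
print(pom_url_from_jar_url(
    "https://repo1.maven.org/maven2/org/example/demo/1.0/demo-1.0-sources.jar"
))
```

Deriving the name from the two parent folders rather than from the jar file name avoids having to strip classifiers such as `-sources`.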
Jun 13 2022
- some maven origins contain a zip instead of a jar, and in that case it looks like the pom.xml is included (e.g. https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://repo1.maven.org/maven2/org/jboss/snowdrop/snowdrop)
- in a source.jar, the pom is not included by default, but it can be if specified:
Jun 3 2022
There remain git and other DVCS-typed origins [1] listed by maven, but no github ones [2].
status: triggered 2 full-maven lister runs on maven central and jboss [1]
And no more exotic github urls are popping up [2].
Yesterday, I fixed, diffed, released and pushed the diff [1] fixing the
canonicalization of the remaining exotic urls, cleaned up the 'git' origins (out of a
maven listing) and triggered a new listing. Today, checking those origins again (staging
scheduler), there was still noise which should no longer have been there...
Jun 2 2022
The full listing is not finished yet, but there remain origins with exotic starting urls which are not canonicalized.
I'd say the issue lies with the canonicalize implementation in swh.core, which only deals with https:// and git:// urls.
So some improvements are needed there.
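To make the kind of improvement concrete, here is a minimal sketch of canonicalizing GitHub urls that start with more than just https:// or git://. It is not the swh.core implementation: the function `canonicalize_github_url`, the regular expression, and the set of accepted prefixes (Maven `scm:git:` prefixes, `ssh://`, scp-style `git@github.com:`) are assumptions chosen for illustration, and the output scheme is a parameter because the target form is a policy choice.

```python
import re
from typing import Optional

# Hypothetical pattern covering a few "exotic" ways of writing a GitHub url.
_GITHUB_RE = re.compile(
    r"""^(?:scm:(?:git:)?)?      # optional Maven SCM prefix
        (?:[a-z+]+://)?          # optional scheme: git://, ssh://, https://, ...
        (?:[^@/]+@)?             # optional user part, e.g. git@
        github\.com[:/]          # host, then "/" or scp-style ":"
        (?P<user>[^/\s]+)/
        (?P<repo>[^/\s]+?)
        (?:\.git)?/?$            # optional .git suffix and trailing slash
    """,
    re.VERBOSE | re.IGNORECASE,
)


def canonicalize_github_url(url: str, scheme: str = "https") -> Optional[str]:
    """Return <scheme>://github.com/<user>/<repo>, or None if url is not
    recognized as a GitHub repository url."""
    match = _GITHUB_RE.match(url.strip())
    if match is None:
        return None
    return f"{scheme}://github.com/{match['user']}/{match['repo']}"


for raw in (
    "git://github.com/user/repo.git",
    "scm:git:git@github.com:user/repo",
    "ssh://git@github.com/user/repo.git",
    "https://gitlab.com/user/repo",
):
    print(raw, "->", canonicalize_github_url(raw))
```

Urls that do not match are returned as `None` rather than guessed at, so a caller can keep them aside for manual review.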
Jun 1 2022
Plan:
- P1369: Listing status after first round listing
- Clean up maven github origins listing [1]
- Trigger maven full run [2]
- Wait for listing to finish
- Listing status after new maven lister round of listing
- Ping in mailing list discussion with data!
Old maven behavior results in origins like git://github.com, ... [1]
The new maven lister behavior should now result in canonical github urls http://github.com/user/repo.
The analysis is ongoing, and the report will follow this comment.
May 25 2022
May 13 2022
May 12 2022
May 11 2022
May 4 2022
Sounds good.
In T2833#83806, @ghenry wrote:
Hi @joenio
I've just joined SWH as an ambassador and wondered how you are getting on with the cpan.loader? Or maybe metacpan.loader now?
Thanks,
Gavin.
May 3 2022
May 2 2022
Apr 29 2022
We can now see new maven origins [1] with multiple lists of artifacts [2]
And ingested:
Apr 29 13:38:51 worker3 python3[215979]: [2022-04-29 13:38:51,458: INFO/ForkPoolWorker-1] Task swh.loader.package.maven.tasks.LoadMaven[4ac12ab2-7749-4152-baa1-6bf06a587cad] succeeded in 74.83338255400304s: {'status': 'eventful', 'snapshot_id': '571e40dbc59f5f5238d981986211bcabde2d73d8'}
New schemed maven origins are getting listed now:
Apr 28 2022
Apr 27 2022
Apr 22 2022
CPANTS: these indicators might be helpful in categorizing incoming data, so that the mining process is significantly easier.
I've just joined SWH as an ambassador and wondered how you are getting on with the cpan.loader? Or maybe metacpan.loader now?
Apr 15 2022
Apr 14 2022
And now we have some origins referenced in the (staging) archive [1]
Another round of deployment occurred with swh.lister v2.8.1.
The clojars repository got listed again (ongoing) and the lister is no longer crashing for that one.