Jan 8 2023
Nov 3 2022
Oct 19 2022
Sep 8 2022
\o/ great
Checks:
- task has been scheduled by the scheduler runner process [1]
- listing is being consumed by one worker [2]
- the number of 'maven' listed origins is steadily growing [3]
- New 'maven' listed origins are getting scheduled for ingestion [4]
- maven loaders are ingesting those [5]
Schedule maven-central listing:
swhscheduler@saatchi:~$ curl -s https://repo1.maven.org/maven2/ | head -2
<!DOCTYPE html>
<html>
swhscheduler@saatchi:~$ curl -s https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld | head -2
doc 0
field 0
swhscheduler@saatchi:~$ curl -s http://saatchi.internal.softwareheritage.org:5008/
<html>
<head><title>Software Heritage scheduler RPC server</title></head>
<body>
<p>You have reached the
<a href="https://www.softwareheritage.org/">Software Heritage</a>
scheduler RPC server.<br />
See its
<a href="https://docs.softwareheritage.org/devel/swh-scheduler/">documentation and API</a>
for more information</p>
</body>
</html>swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   task add list-maven-full \
>   url=https://repo1.maven.org/maven2/ \
>   index_url=https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld
Created 1 tasks
Finally, export is done on maven central [1], the fld is computed [2]...
And it's also exposed, hence reachable from lister worker nodes.
Sep 7 2022
Sep 6 2022
Jun 16 2022
Jun 15 2022
Yesss! \o/
So in the end, the conclusion is that the loader already does the right thing so it's a noop, right?
Good news *can* happen, ahah! Thanks for notifying me.
To summarize, the initial intent was to adapt the jar loaded (as extracted directory) to append the pom.xml so we do not lose that reference.
You're right boris, indeed it's already stored as extrinsic metadata, we hadn't checked properly :)
Thank you for your answer !
I'm not sure I understand the intent, as we already keep the pom in the extrinsic metadata (don't we?).
Double-checking in the SWH codebase, I believe you could build upon this: see [1] lines 166-180.
Congrats on the work done! I think that downloading the pom file from the same folder is indeed the way to go.
I think the simplest way to get the pom file associated with a specific release of a maven package is to download it from the folder where we can find the source jar.
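To illustrate the "same folder" idea, here is a minimal sketch that derives a pom URL from a sources jar URL. It assumes the usual Maven repository layout, where `artifact-version-sources.jar` and `artifact-version.pom` sit side by side in the version folder. The function name `pom_url_for_sources_jar` is hypothetical and is not part of the SWH codebase; origins that ship a zip or use another classifier are not handled.

```python
def pom_url_for_sources_jar(jar_url: str) -> str:
    """Return the URL of the pom expected next to a '-sources.jar' artifact.

    Assumption: the repository follows the standard Maven layout, so the pom
    is named '<artifact>-<version>.pom' in the same folder as the jar.
    """
    suffix = "-sources.jar"
    if not jar_url.endswith(suffix):
        # Other archive kinds (e.g. zip) are out of scope for this sketch.
        raise ValueError(f"not a sources jar URL: {jar_url}")
    # Drop the '-sources.jar' suffix and append the pom extension.
    return jar_url[: -len(suffix)] + ".pom"


if __name__ == "__main__":
    print(
        pom_url_for_sources_jar(
            "https://repo1.maven.org/maven2/org/pustefixframework/"
            "pustefix-archetype-basic/0.15.20/"
            "pustefix-archetype-basic-0.15.20-sources.jar"
        )
    )
```

Whether the resulting URL actually exists still has to be checked with a request, since this only rewrites the file name.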
Jun 13 2022
- some maven origins contain a zip instead of a jar, and in that case it looks like the pom.xml is included (ex : https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://repo1.maven.org/maven2/org/jboss/snowdrop/snowdrop)
- in a source.jar, the pom is not included by default but can be if specified:
Jun 9 2022
This is better as we will not have to install any new runtime dependencies in workers.
Alternatively, we could use the zipfile standard Python module, which seems to work in a similar way to the jar command, see below:
(swh) anlambert@carnavalet:/tmp/jar_test$ wget https://repo1.maven.org/maven2/org/pustefixframework/pustefix-archetype-basic/0.15.20/pustefix-archetype-basic-0.15.20-sources.jar
--2022-06-09 14:56:17--  https://repo1.maven.org/maven2/org/pustefixframework/pustefix-archetype-basic/0.15.20/pustefix-archetype-basic-0.15.20-sources.jar
Resolving repo1.maven.org (repo1.maven.org)... 151.101.120.209
Connecting to repo1.maven.org (repo1.maven.org)|151.101.120.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45637 (45K) [application/java-archive]
Saving to: ‘pustefix-archetype-basic-0.15.20-sources.jar’
fwiw, this makes sense ;)
Sentry issue: SWH-LOADER-CORE-150
Sentry issue: SWH-LOADER-CORE-ZW
Jun 3 2022
There remain git and other dvcs-typed origins [1] listed by maven but not github ones [2].
status: triggered 2 full-maven lister runs on maven central and jboss [1]
And no more exotic github urls are popping up [2].
Yesterday, I fixed, diffed, released and pushed the diff [1] fixing the
canonicalization of the remaining exotic urls, cleaned up the 'git' origins (out of a
maven listing) and triggered a new listing. Today, checking those origins again (staging
scheduler), there was still noise which should no longer have been there...
Jun 2 2022
The full listing is not finished yet, but there remain origins with exotic starting urls which are not canonicalized.
I'd say the issue lies with the canonicalize implementation in swh.core, which only deals with https:// and git:// urls.
So some improvements are needed there.
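To make the gap concrete, here is a minimal sketch of a canonicalization that also accepts a few "exotic" forms (scp-like `git@github.com:`, `ssh://`, an `scm:git:` prefix, a trailing `.git`). This is not the swh.core implementation: the function name `canonical_github_url`, the regular expression and the choice of `https://github.com/user/repo` as the target form are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern, not taken from swh.core:
# - optional "scm:" or "scm:git:" prefix as found in some pom files
# - a URL scheme with optional userinfo, or the scp-like "git@" form
# - github.com followed by "/" or ":" and then user/repo, optional ".git"
_GITHUB_RE = re.compile(
    r"^(?:scm:(?:git:)?)?"
    r"(?:(?:https?|git|ssh|git\+ssh)://(?:[^@/]+@)?|git@)"
    r"(?:www\.)?github\.com[:/]"
    r"(?P<user>[^/\s]+)/(?P<repo>[^/\s#?]+?)(?:\.git)?/?$",
    re.IGNORECASE,
)


def canonical_github_url(url: str) -> Optional[str]:
    """Return an https github url for known variants, or None otherwise.

    URLs pointing below the repository root (e.g. .../tree/master) and
    non-github hosts are deliberately left alone and yield None.
    """
    match = _GITHUB_RE.match(url.strip())
    if match is None:
        return None
    return f"https://github.com/{match.group('user')}/{match.group('repo')}"


if __name__ == "__main__":
    for candidate in (
        "git://github.com/user/repo.git",
        "git@github.com:user/repo.git",
        "scm:git:git@github.com:user/repo",
        "https://gitlab.com/user/repo",
    ):
        print(candidate, "->", canonical_github_url(candidate))
```

Returning None for anything unrecognized keeps non-github origins untouched, so the caller can decide what to do with them.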
Jun 1 2022
Plan:
- P1369: Listing status after first round listing
- Clean up maven github origins listing [1]
- Trigger maven full run [2]
- Wait for listing to finish
- Listing status after new maven lister round of listing
- Ping in mailing list discussion with data!
Old maven behavior results in origins like git://github.com, ... [1]
The new maven lister behavior should now result in canonical github urls http://github.com/user/repo.
Analysis is ongoing and the report will follow this comment.