Paste P1070

Submitted ticket to Sonatype Jira for Maven Central crawling

Authored by borisbaldassari on Jun 15 2021, 2:16 PM.
https://issues.sonatype.org/browse/MVNCENTRAL-6804 (registration required)
I'm currently developing a Maven Central connector for the Software Heritage Foundation [1]. In a nutshell, Software Heritage (SWH) aims to archive all existing source code in the world and, beyond archiving, to provide useful tools (unique IDs, search, graph-related tools, etc.). It's all open source, and many large forges and software systems have already been archived (GitHub, GitLab, npm, PyPI, Debian packages, CRAN, etc.).
[1] https://www.softwareheritage.org/
I'm reaching out to start a discussion about how this could be achieved without impacting anybody, i.e. neither you (as Maven Central maintainers) nor the users. Our plan for now is to use the Maven Indexer indexes for the listing, and then download the POMs and source JARs. We will of course respect all rate-limiting policies (and HTTP error codes), and we are polite and patient (although tenacious).
No action is expected from you; we simply want to let you know what we are doing, and we are open to your wise advice (if any).
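
The "download POMs and source JARs politely" step of the plan above could be sketched roughly as follows. This is only an illustration, not the actual Software Heritage connector: the helper names (`artifact_urls`, `retry_delay`, `polite_fetch`), the retry limits, and the choice of `https://repo1.maven.org/maven2` as repository root are assumptions, and the coordinates are assumed to have already been extracted from the Maven Indexer indexes.

```python
import time
import urllib.error
import urllib.request

# Assumed repository root; the ticket does not name a specific URL.
BASE_URL = "https://repo1.maven.org/maven2"

# Status codes treated as "please slow down or try again later" (an assumption).
RETRYABLE = (429, 500, 502, 503, 504)


def artifact_urls(group_id, artifact_id, version, base_url=BASE_URL):
    """Return (pom_url, sources_jar_url) following the standard Maven repository layout."""
    directory = f"{base_url}/{group_id.replace('.', '/')}/{artifact_id}/{version}"
    stem = f"{artifact_id}-{version}"
    return f"{directory}/{stem}.pom", f"{directory}/{stem}-sources.jar"


def retry_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Seconds to wait before the next try (attempt is 0-based).

    A numeric Retry-After header from the server always wins; otherwise
    fall back to capped exponential backoff. Retry-After given as an HTTP
    date is not parsed in this sketch and also falls back to backoff.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return min(base * 2 ** attempt, cap)


def polite_fetch(url, max_attempts=5, fetch=urllib.request.urlopen, sleep=time.sleep):
    """Download url, backing off on retryable errors; return None on 404."""
    for attempt in range(max_attempts):
        try:
            with fetch(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # e.g. no sources jar was published for this artifact
            if err.code not in RETRYABLE:
                raise
            headers = err.headers or {}
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```

For example, `artifact_urls("org.apache.commons", "commons-lang3", "3.12.0")` yields the POM and sources-JAR URLs for that artifact, each of which can then be passed to `polite_fetch`. The `fetch` and `sleep` parameters exist only so the retry logic can be exercised without network access.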
