Jan 8 2023
Nov 3 2022
Oct 19 2022
Sep 8 2022
\o/ great
Checks:
- task has been scheduled by the scheduler runner process [1]
- listing is being consumed by one worker [2]
- the number of 'maven' listed origins is steadily growing [3]
- New 'maven' listed origins are getting scheduled for ingestion [4]
- maven loaders are ingesting those [5]
Schedule maven-central listing:
swhscheduler@saatchi:~$ curl -s https://repo1.maven.org/maven2/ | head -2
<!DOCTYPE html>
<html>
swhscheduler@saatchi:~$ curl -s https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld | head -2
doc 0
field 0
swhscheduler@saatchi:~$ curl -s http://saatchi.internal.softwareheritage.org:5008/
<html>
<head><title>Software Heritage scheduler RPC server</title></head>
<body>
<p>You have reached the <a href="https://www.softwareheritage.org/">Software Heritage</a> scheduler RPC server.<br />
See its <a href="https://docs.softwareheritage.org/devel/swh-scheduler/">documentation and API</a> for more information</p>
</body>
</html>swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
> task add list-maven-full \
> url=https://repo1.maven.org/maven2/ \
> index_url=https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld
Created 1 tasks
Finally, export is done on maven central [1], the fld is computed [2]...
And it's also exposed, hence reachable from lister worker nodes.
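For reference, the curl-based reachability checks in the session above could be sketched in Python as follows. This is only a rough sketch: the helper names `head` and `looks_like_fld_export` are hypothetical, and the header test simply assumes that a valid export starts with a `doc` line followed by a `field` line, as the `head -2` output suggests.

```python
from itertools import islice
from urllib.request import urlopen


def head(url, n=2, timeout=10):
    """Return the first n lines served at url (like `curl -s url | head -n`)."""
    with urlopen(url, timeout=timeout) as resp:
        # An HTTP response object can be iterated line by line.
        return [
            raw.decode("utf-8", errors="replace").rstrip("\r\n")
            for raw in islice(resp, n)
        ]


def looks_like_fld_export(lines):
    """Heuristic (assumption): an export starts with 'doc N' then 'field N'."""
    return (
        len(lines) >= 2
        and lines[0].strip().startswith("doc ")
        and lines[1].strip().startswith("field ")
    )
```

From a lister worker node, `looks_like_fld_export(head(index_url))` would then tell whether the index URL is both reachable and plausibly an export file rather than an HTML error page.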
Sep 7 2022
Sep 6 2022
Jun 16 2022
May 13 2022
Apr 29 2022
Apr 14 2022
Feb 13 2022
Jan 24 2022
Jan 21 2022
Jan 17 2022
Jan 15 2022
@ardumont I've added an nginx container to the main docker-compose file and made it serve one of the example fld files (in the conf/maven-index directory).
The served file can be accessed from the lister container, but for now the task doesn't pick anything up: I don't see it in the lister container logs at all, and (thus) the psql command returns 0 rows. I'll investigate why (I made it work a month ago, so..), but a quick discussion about the scheduler on IRC might help. I'll be connected on IRC this Monday; if we could take the opportunity to discuss the issue (and check that the compose setup is ok), that would be helpful.
Jan 10 2022
Thanks! You did well, I had not been notified about your post and didn't know about it. Sorry for overlooking that. I'll have a look this week.
Happy new year btw, talk to you soon!
Jan 7 2022
@borisbaldassari Hello, gentle ping about ^
Dec 17 2021
On second thoughts: in order to run the docker-dev setup, I also had to run a virtual
machine alongside the swh setup to host the text index file, and make sure the swh vm
could access it. I suppose that any vm/docker/baremetal machine with an apache/nginx
server could do for that, as long as the lister can http-fetch the .fld file.
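As an illustration of "anything that serves the .fld file over HTTP will do", here is a minimal sketch using only the Python standard library instead of apache/nginx. The function name `serve_directory` is made up for this example, and this is meant for local testing only, not as what the docker-dev setup actually uses.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread.

    port=0 lets the OS pick a free port; read it back from
    server.server_address[1]. Call server.shutdown() to stop.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

The host would need to be an address the lister container can reach (not the loopback default used here), and the resulting URL is what would be passed as the index URL.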
Dec 8 2021
In T1724#75125, @ardumont wrote:
I'm asking you for a diff with the exact changes you had to make in the
swh-environment/docker/docker-compose.yml (and other folders) to actually make it run.
That will definitely help for the deployment on staging.
I'm not sure what you mean by the docker diff.
Dec 6 2021
I'm not sure what you mean by the docker diff. Is that the update of the maven-index-exporter repository at D6740?
The above-mentioned repository has documentation to build, test and run the text index generation. As mentioned there, I've also created a bunch of compressed text index exports that can be used to test the lister/loader without running the docker image immediately. They are all real-world extracts obtained by running the docker image on the list of Maven repositories I could get as of last week. Together they represent a few million artefacts.
Dec 3 2021
Dec 1 2021
Nov 23 2021
Hi there,
Nov 22 2021
Oct 2 2021
Aug 30 2021
Aug 29 2021
Aug 27 2021
Jun 21 2021
Updates:
- A ticket has been submitted in the Sonatype JIRA to let them know we will start fetching maven poms and src jars soon.
- An email has been sent to the maven-dev mailing list and received a few kind answers, mainly advising us to let Sonatype know through a JIRA issue.
- Hervé Boutemy provided some precious insights about the best way to use the poms; it seems we can get a near-complete list of maven repositories worldwide by parsing some pom elements and following dependencies up. It should probably not be used directly by the lister (which should provide only the list of src jars and scm attributes to the loaders), but we can output it somewhere to feed the lister manually.
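To make the last point concrete, here is a rough sketch of pulling the scm attributes and declared repositories out of a single pom. It is not the lister's actual code: the function name `pom_hints` is hypothetical, and it assumes the pom declares the standard POM 4.0.0 namespace (poms without a namespace would need the prefix dropped).

```python
import xml.etree.ElementTree as ET

# Standard namespace declared by Maven POM 4.0.0 files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_hints(pom_xml):
    """Extract scm info and declared repository URLs from a pom document."""
    root = ET.fromstring(pom_xml)

    def text(path):
        node = root.find(path, POM_NS)
        return node.text.strip() if node is not None and node.text else None

    repositories = [
        node.text.strip()
        for node in root.findall("m:repositories/m:repository/m:url", POM_NS)
        if node.text
    ]
    return {
        "scm_url": text("m:scm/m:url"),
        "scm_connection": text("m:scm/m:connection"),
        "repositories": repositories,
    }
```

The `repositories` list is the part that could be written out somewhere to feed the lister manually, while the scm fields are what would be handed over to the loaders.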
Jun 9 2021
Update for the Maven Indexer prototype: it works! (finally)
Jun 8 2021
Some more information about the maven indexer. Beware, people: it's a bit dirty, and you're not going to like it infra-wise.
So, to sum up the options we have: basically we "just" need the coordinates of all artifacts. From there, for each artifact we can:
Jun 5 2021
A few more cents in the bucket:
- scraping is explicitly forbidden, see https://repo1.maven.org/terms.html -- however, making contact first will help us get through most of the abuse-limiting rules, I guess.
- regarding fasten, there are indeed some bits that could be useful. However most of our difficulties are in getting a list of projects, whereas this information is already provided by the user in the case of fasten. So, interesting and useful, but not a game changer regarding the difficult part of our job.
Mar 17 2021
After recent exchanges with @hboutemy and Charles Sabourdin, here is a clarification of the scope of this task.
We need a Maven repository lister that addresses the following issues:
Mar 11 2021
@hboutemy: I wonder if you are aware that we now have a grant program in place that can fund the development of listers like this one.
All the information is available at https://www.softwareheritage.org/grants and you can mail me for more info if needed.
Nov 17 2020
Implemented during GSOC 2019, closing this.
Let's close that task when we reach 80% of code coverage
Sep 12 2019
Here is an interesting update on the issue of listing Maven Central. Great people at the FASTEN EU project are analyzing software dependencies and for that they are working on a tool to download projects from various sources, including Maven.
The tool is here: https://github.com/fasten-project/source-populate
It appears to be more about downloading a known project source than listing the content of a repository, but we could try and share efforts in this space.
Aug 7 2019
Aug 6 2019
Jul 11 2019
Resolved in a4816b7bb313
Jul 10 2019
Jul 9 2019
Jun 12 2019
I plan to add the following e2e tests:
- Test basic webapp functionalities like 'sidebar', 'back-to-top'
- Test home page displays positive stats for directories, authors,...
- Test the origin-search with different combinations of checkboxes
- Test basic functionality of directory view.
- Test file being displayed (for some known format, maybe .txt)
- Test that an error is displayed for an invalid sha1 or an unknown origin url
Jun 11 2019
Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/486/ for more details.