Stuff related to Software Heritage's participation in the Google Summer of Code program, 2019 edition.
Thu, Nov 3
Oct 19 2022
Sep 8 2022
- The task has been scheduled by the scheduler runner process
- The listing is being consumed by one worker
- The number of 'maven' listed origins is steadily growing
- New 'maven' listed origins are getting scheduled for ingestion
- The maven loaders are ingesting those origins
Schedule maven-central listing:
swhscheduler@saatchi:~$ curl -s https://repo1.maven.org/maven2/ | head -2
<!DOCTYPE html>
<html>
swhscheduler@saatchi:~$ curl -s https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld | head -2
doc 0
field 0
swhscheduler@saatchi:~$ curl -s http://saatchi.internal.softwareheritage.org:5008/
<html>
<head><title>Software Heritage scheduler RPC server</title></head>
<body>
<p>You have reached the <a href="https://www.softwareheritage.org/">Software Heritage</a> scheduler RPC server.<br />
See its <a href="https://docs.softwareheritage.org/devel/swh-scheduler/">documentation and API</a> for more information</p>
</body>
</html>swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   task add list-maven-full \
>   url=https://repo1.maven.org/maven2/ \
>   index_url=https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld
Created 1 tasks
Finally, the export of Maven Central is done and the .fld file is computed ...
It is also exposed, hence reachable from the lister worker nodes.
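The reachability checks in the session above can also be scripted from a worker node. The following is a minimal sketch using only Python's standard library. `head_lines` and `looks_like_text_index` are hypothetical helpers (not part of swh-scheduler or the lister). The "doc 0" / "field 0" check is just a heuristic based on the output shown above, not a validation of the index format.

```python
from itertools import islice
from urllib.request import urlopen


def head_lines(url, count=2, timeout=10):
    """Return the first `count` lines served at `url`, decoded as text.

    Hypothetical helper mirroring `curl -s URL | head -2`: the response is
    read line by line, so a large index is not downloaded in full.
    """
    with urlopen(url, timeout=timeout) as response:
        return [
            raw.decode("utf-8", errors="replace").rstrip("\r\n")
            for raw in islice(response, count)
        ]


def looks_like_text_index(url):
    """Heuristic only: the export shown above starts with 'doc 0', 'field 0'.

    Leading/trailing whitespace is ignored, since the exact indentation of
    the export is an assumption here.
    """
    first_two = [line.strip() for line in head_lines(url, 2)]
    return first_two == ["doc 0", "field 0"]
```

For instance, `looks_like_text_index("https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld")` would be expected to return `True` from a node that can reach the exporter, given the `curl` output above.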
Sep 7 2022
Sep 6 2022
Jun 16 2022
May 13 2022
Apr 29 2022
Apr 14 2022
Feb 13 2022
Jan 24 2022
Jan 21 2022
Jan 17 2022
Jan 15 2022
@ardumont I've added an nginx container to the main docker-compose file and made it serve one of the example fld files (in the conf/maven-index directory).
The served file can be accessed from the lister container, but for now the task doesn't pick anything up: I don't see it in the lister container logs at all, and (thus) the psql command returns 0 rows. I'll investigate why (I made it work a month ago, so..), but a quick discussion about the scheduler on IRC might help. I'll be connected on IRC this Monday; if we can take the chance to discuss the issue (and check that the compose setup is OK), that would be helpful.
Jan 10 2022
Thanks! You did well, I had not been notified about your post and didn't know about it. Sorry for overlooking that. I'll have a look this week.
Happy new year btw, talk to you soon!
Jan 7 2022
@borisbaldassari Hello, gentle ping about ^
Dec 17 2021
Dec 8 2021
On second thoughts: in order to run the docker-dev setup, I also had to run a virtual machine alongside the swh setup to host the text index file, and make sure the swh vm could access it.
I suppose that any vm/docker/baremetal machine with an apache/nginx server could do for that, as long as the lister can http-fetch the .fld file.
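For a quick local test, Python's built-in HTTP server could stand in for the apache/nginx host described above. This is a minimal sketch, not the setup actually used here. `serve_directory` is a hypothetical helper, and the directory, host and port are placeholders to adapt so that the lister can reach them.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP from a background thread.

    Returns the running server. Port 0 lets the OS pick a free port, which
    can then be read from `server.server_address`.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    # Daemon thread: the server stops when the main program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to `0.0.0.0` instead of the default loopback address would be needed for another vm or container to fetch the file; the resulting URL of the .fld file is then what gets passed as `index_url` to the listing task. Call `server.shutdown()` to stop serving.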
Dec 6 2021
I'm not sure what you mean by the docker diff. Is that the update of the maven-index-exporter repository at D6740?
The above-mentioned repository has documentation to build, test and run the text index generation. As mentioned there, I've also created a bunch of compressed text index exports that can be used to test the lister/loader without running the docker image immediately. They are all real-world extracts obtained by running the docker image on the list of Maven repositories I could get as of last week. Together they represent a few million artefacts.