[WIP] add first implementation of FusionForge lister
AbandonedPublic

Authored by anlambert on Nov 10 2017, 5:13 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Summary

first version of a FusionForge lister (T778)

This is a first attempt at writing a lister for projects hosted on a FusionForge instance, such as:

  • gforge.inria.fr (T390)
  • adullact.net (T775)
  • sourcesup.renater.fr

Currently, the lister only considers git and svn repositories and works quite well with
the three forges cited above. It should also work with other FusionForge instances, provided
the URL schemes of their hosted repositories can be handled by the current implementation.

Diff Detail

Repository
rDLS Listers
Branch
fusionforge-lister
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1109
Build 1453: arc lint + arc unit

Event Timeline

Hey,

Rather than having the FusionForge base URL in the configuration, I think it should be passed as an argument to the lister. However, I think the credentials store should indeed stay in a configuration file.

That way, we can deploy the FusionForge lister once and create tasks in the scheduler to list the different (public) FusionForge instances that we know of "dynamically", without having to redeploy a config; the configuration is then only needed for the credentials store.

This also makes sure that failing to list one of the forges won't impact listing the others; it will also let us have different recurrence times for different forges.
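To make the suggestion concrete, here is a minimal sketch of that split, not the actual swh lister or scheduler API. The class `FusionForgeLister`, the helper `make_task`, the task type name, the `CREDENTIALS` mapping and the `forge.example.org` entry are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical credentials store, as it might be loaded from a configuration
# file. Keys are forge base URLs, values are (username, password) pairs.
CREDENTIALS: Dict[str, Tuple[str, str]] = {
    "https://forge.example.org": ("swh-user", "secret"),
}


@dataclass
class FusionForgeLister:
    """Illustrative lister: the forge URL is an argument, not configuration."""

    forge_url: str
    credentials: Dict[str, Tuple[str, str]]

    def auth(self) -> Optional[Tuple[str, str]]:
        # Public forges have no entry in the store and are listed anonymously.
        return self.credentials.get(self.forge_url.rstrip("/"))


def make_task(forge_url: str, recurrence_days: int) -> dict:
    """Build a hypothetical scheduler task description for one forge."""
    return {
        "type": "list-fusionforge",  # assumed task type name
        "arguments": {"kwargs": {"forge_url": forge_url}},
        "recurrence_days": recurrence_days,
    }


# One task per forge: each can fail or be rescheduled independently,
# and each can have its own recurrence.
tasks = [
    make_task("https://gforge.inria.fr", 7),
    make_task("https://adullact.net", 30),
]
```

With this shape, adding a new public forge only means creating a new task; the deployed configuration changes only when credentials do.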

The rest of the approach looks perfectly reasonable but I haven't looked in depth yet.

Two other testcases for you: alioth.debian.org; www.fusionforge.org.

Indeed, passing the forge base URL as an argument to the lister seems a better choice. I will update the diff accordingly next week.

Otherwise, I tried the lister several times with the alioth forge, but after a few calls to the SOAP API I got banned and could not issue any more requests (even basic HTTP GET ones)
for a couple of hours. It looks like some request rate limiting is in place. I also hit the same kind of issue when trying to load (not list) repositories from the renater forge (however,
I was a little brutal on the number of concurrent repos being loaded).
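One way to stay under such limits is to pace requests on the client side. The sketch below is only an illustration: the `Throttle` class is a hypothetical helper, not part of the lister, and the right interval for a given forge is unknown and would have to be found empirically.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the pacing logic can be tested
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would invoke `throttle.wait()` before each SOAP or HTTP request, for example `throttle = Throttle(min_interval=5.0)` with an interval chosen conservatively.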

Updating D267: add first implementation of FusionForge lister

Some improvements to the FusionForge lister:

  • pass forge url as argument to the lister instead of storing it in configuration
  • properly decode the strings describing projects, as SOAP replies from the FusionForge API are usually ISO-8859-1 encoded (for instance, on adullact.net descriptions are written in French, whose many accented letters were not decoded correctly)
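The decoding issue in the second item can be illustrated with a short sketch. This is not the code from the diff: `decode_description` is a hypothetical helper, and trying UTF-8 first before falling back to ISO-8859-1 is an assumed heuristic, not something the FusionForge API documents.

```python
def decode_description(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw bytes from a SOAP reply into a Python str.

    UTF-8 decoding is strict, so ISO-8859-1 bytes carrying accented
    letters normally fail it and end up in the fallback branch.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A French description as an ISO-8859-1 forge might send it.
latin1_bytes = "Biblioth\u00e8que partag\u00e9e".encode("iso-8859-1")
description = decode_description(latin1_bytes)

# Once decoded, the text can be stored or sent on as UTF-8.
utf8_bytes = description.encode("utf-8")
```

The fallback never raises, since every byte value is valid in ISO-8859-1, so a reply in some third encoding would be decoded silently but incorrectly.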

For the moment, the lister only creates oneshot tasks. Next steps would be
to list only projects with changes since the last swh visit.

This needs some rework before review.

zack retitled this revision from add first implementation of FusionForge lister to [WIP] add first implementation of FusionForge lister. Oct 15 2018, 10:43 AM
zack removed 1 blocking reviewer(s): Reviewers.

Time flies, and so do the swh APIs and code hosting solutions... I honestly do not have any energy to spend on this, so I am closing it once and for all!

What would be left to do to make this lister work? It already seems to be in a good state, and it would be useful to index gforge.inria.fr since it will be closed soon (https://gforge.inria.fr/forum/forum.php?forum_id=11543). For the gforge.inria.fr case specifically, it is worth noting that project creation is already closed, so a one-shot listing could be an option if it is lighter to set up: I wrote a small script to do that, but after a few requests to https://archive.softwareheritage.org/save/, requests are throttled. I would be happy to send you a listing of the public projects hosted on gforge.inria.fr if it could help.


All the public projects from the Inria gForge have now been properly archived, using the script available at https://forge.softwareheritage.org/source/listandsavegforge/
This script would benefit from some generalization/parameterization/engineering work, in order to be used on other endangered FusionForge instances.