(Note: a branch is to Launchpad what a fork is to GitHub.)
Launchpad hosts code in two version control systems, Git and Bazaar; CVS and Subversion repositories can also be imported. Git repositories can be fed directly to the git loader, but Bazaar repositories need a separate loader.
A Bazaar repository can be downloaded with a bzr command of this form:
bzr branch lp:<projectname>
(reference http://blog.launchpad.net/general/the-great-source-code-supermarket)
For Git, the command has this form:
git clone https://git.launchpad.net/<projectname>
In Launchpad, every project has one main branch called trunk, which is fetched with:
bzr branch lp:<projectname>
All other branches of the project are fetched with:
bzr branch lp:~<author.name>/<project.name>/<name>
To ingest all the code, we need to list all the branches of all the projects.
Launchpad provides an API which can be used to list all the projects and branches.
What should the output of the lister be?
For Git projects, the output of the lister should be in this format: https://git.launchpad.net/<projectname>
For Bazaar repositories, the output should be lp:x, where x is <projectname> for a project's trunk or ~<author.name>/<project.name>/<name> for any other branch.
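The two output formats above can be sketched as a pair of small helpers. This is only an illustration: the function names git_origin and bzr_origin, and the example values "foo", "alice" and "fix-1", are hypothetical and not part of any existing lister code.

```python
from typing import Optional

# Base URL taken from the git clone format described above.
GIT_BASE = "https://git.launchpad.net/"


def git_origin(project: str) -> str:
    """Return the lister output for a Git project."""
    return f"{GIT_BASE}{project}"


def bzr_origin(project: str,
               author: Optional[str] = None,
               branch: Optional[str] = None) -> str:
    """Return the lister output for a Bazaar trunk or branch.

    With only a project name this yields the trunk form lp:<projectname>.
    With an author and a branch name it yields
    lp:~<author.name>/<project.name>/<name>.
    """
    if author is None and branch is None:
        return f"lp:{project}"
    if not author or not branch:
        raise ValueError("author and branch must be given together")
    return f"lp:~{author}/{project}/{branch}"
```

For example, bzr_origin("foo") gives the trunk form lp:foo, while bzr_origin("foo", "alice", "fix-1") gives lp:~alice/foo/fix-1.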
Plan to execute the lister:
To list all the projects, we can either call the API directly or use launchpadlib, a Python library that lets you treat the HTTP resources published by Launchpad's web service as Python objects responding to a standard set of commands. Either approach should work.
To list all the branches of a project, we can use launchpadlib, as done in the first answer here: https://askubuntu.com/questions/262485/is-there-a-bzr-command-to-see-all-branches-of-a-project-on-launchpad
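A minimal sketch of that launchpadlib approach might look like the following. It assumes that a project object exposes getBranches() and that each branch carries a bzr_identity attribute holding its lp: name; both should be checked against the Launchpad API documentation before relying on them. The function name list_bzr_origins and the consumer name "lister-sketch" are hypothetical.

```python
def list_bzr_origins(launchpad, project_name):
    """Return the lp: identifier of every Bazaar branch of one project.

    `launchpad` is expected to be a launchpadlib Launchpad object (or
    anything with the same shape), so the function can be exercised
    without network access.
    """
    project = launchpad.projects[project_name]
    # Assumption: each branch exposes its lp: name as `bzr_identity`.
    return [branch.bzr_identity for branch in project.getBranches()]


# Possible usage against the real service (needs launchpadlib and network):
#
#   from launchpadlib.launchpad import Launchpad
#   launchpad = Launchpad.login_anonymously("lister-sketch", "production")
#   for origin in list_bzr_origins(launchpad, "<projectname>"):
#       print(origin)
```

Passing the Launchpad object in as a parameter keeps the login step separate from the listing logic, which makes the latter easy to test with a stand-in object.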
Alternatively, we could call the bare API directly:
https://api.launchpad.net/1.0/<project_name>?ws.op=getBranches
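As a rough sketch of the bare-API route, the URL above can be built with the standard library and a response page reduced to the branch names. The JSON field names used here (entries, next_collection_link, bzr_identity) are assumptions about how the web service serialises a collection and should be verified against a real response; the helper names branches_url and parse_branches_page are hypothetical.

```python
from urllib.parse import urlencode

# API root taken from the URL format shown above.
API_ROOT = "https://api.launchpad.net/1.0/"


def branches_url(project_name: str) -> str:
    """Build the getBranches URL for one project."""
    query = urlencode({"ws.op": "getBranches"})
    return f"{API_ROOT}{project_name}?{query}"


def parse_branches_page(page: dict):
    """Extract branch identifiers and the next page link from one page.

    Assumption: a page is a JSON object with an "entries" list whose
    items have a "bzr_identity" key, plus an optional
    "next_collection_link" when more results are available.
    """
    identities = [entry["bzr_identity"] for entry in page.get("entries", [])]
    return identities, page.get("next_collection_link")
```

A lister would fetch branches_url(...) with any HTTP client, decode the JSON body, and keep following the next link until it is absent.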