
Create a Lister for launchpad.net
Open, Normal, Public

Description

(Note: a branch is to Launchpad what a fork is to GitHub.)
Launchpad natively hosts two version control systems, Git and Bazaar; CVS and Subversion repositories can be imported. Git repositories can be fed directly to the git loader, but Bazaar repositories need a separate loader.

A Bazaar repository can be downloaded with a bzr command of this form:

bzr branch lp:<projectname>

(reference http://blog.launchpad.net/general/the-great-source-code-supermarket)

and for git it is of the format

git clone https://git.launchpad.net/<projectname>

In Launchpad, every project has one main branch called trunk, which is fetched with:

bzr branch lp:<projectname>

The rest are its branches, which are fetched with:

bzr branch lp:~<author.name>/<project.name>/<name>

To ingest all the code, we need to list all the branches of all the projects.
Launchpad provides an API which can be used to list all the projects and branches.

What should be the output of the lister?
For Git projects, the output of the lister should be in this format: https://git.launchpad.net/<projectname>
For Bazaar repos, the output should be lp:x, where x is <projectname> for a project's trunk or ~<author.name>/<project.name>/<name> for one of its branches.
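As a rough illustration of the two output formats described above, here is a minimal sketch. The function names `git_origin` and `bzr_origin` are hypothetical and are not part of any existing swh-lister code; only the URL and `lp:` patterns come from this task description.

```python
from typing import Optional

GIT_BASE_URL = "https://git.launchpad.net"


def git_origin(project_name: str) -> str:
    """Return the clone URL for a Git-hosted Launchpad project."""
    return f"{GIT_BASE_URL}/{project_name}"


def bzr_origin(
    project_name: str,
    author: Optional[str] = None,
    branch: Optional[str] = None,
) -> str:
    """Return the lp: identifier for a Bazaar trunk or branch.

    With only a project name, this is the trunk (lp:<projectname>).
    With an author and a branch name, it is one of the project's
    branches (lp:~<author.name>/<project.name>/<name>).
    """
    if author is None and branch is None:
        return f"lp:{project_name}"
    if not author or not branch:
        raise ValueError("author and branch must be given together")
    return f"lp:~{author}/{project_name}/{branch}"
```

For example, `bzr_origin("foo")` gives the trunk identifier, while `bzr_origin("foo", "alice", "fix")` gives the identifier of a branch owned by `alice`.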

Plan to execute the lister:
Either we can use the API to list all the projects, or we can use the Python library launchpadlib, which lets you treat the HTTP resources published by Launchpad's web service as Python objects responding to a standard set of commands. Both can do the work well.

Now, to list all the branches of a project, we can use launchpadlib, as done in the first answer here: https://askubuntu.com/questions/262485/is-there-a-bzr-command-to-see-all-branches-of-a-project-on-launchpad
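A minimal sketch of the launchpadlib route might look like the following. It assumes an anonymous login is sufficient for read-only listing, that the project object exposes `getBranches()`, and that each branch carries a `bzr_identity` attribute; the consumer name `swh-lister-sketch` and the helper names are made up for illustration.

```python
from typing import Iterator


def connect():
    """Open an anonymous, read-only session against production Launchpad."""
    # Imported lazily so the rest of the sketch works without launchpadlib.
    from launchpadlib.launchpad import Launchpad

    # "swh-lister-sketch" is an arbitrary consumer name (assumption).
    return Launchpad.login_anonymously(
        "swh-lister-sketch", "production", version="devel"
    )


def iter_bzr_branches(launchpad, project_name: str) -> Iterator[str]:
    """Yield the lp: identity of every Bazaar branch of one project."""
    project = launchpad.projects[project_name]
    for branch in project.getBranches():
        yield branch.bzr_identity
```

Usage would be along the lines of `list(iter_bzr_branches(connect(), "<projectname>"))`, which needs network access to Launchpad.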
Or we could use the bare API:

https://api.launchpad.net/1.0/<project_name>?ws.op=getBranches
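The bare API call above could be wrapped roughly as follows. This is only a sketch: it assumes the response is a JSON collection with an `entries` list and, when more pages exist, a `next_collection_link` URL. The helper names are hypothetical, and the `fetch` parameter exists so the paging logic can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.launchpad.net/1.0"


def branches_url(project_name: str) -> str:
    """Build the getBranches URL for one project."""
    return f"{API_ROOT}/{project_name}?{urlencode({'ws.op': 'getBranches'})}"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_branch_entries(
    project_name: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every branch entry of a project, following page links."""
    url = branches_url(project_name)
    while url:
        page = fetch(url)
        # "entries" and "next_collection_link" are assumed field names.
        yield from page.get("entries", [])
        url = page.get("next_collection_link")
```

The loop stops as soon as a page carries no link to a next page, so a single-page response is handled the same way as a multi-page one.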

Event Timeline

nahimilega created this object in space S1 Public.
nahimilega triaged this task as Normal priority.
nahimilega updated the task description. Wed, May 22, 10:26 PM
nahimilega added a comment. Edited Tue, May 28, 8:35 PM

Launchpadlib
Pros
The library is available in Debian stretch.
It is easier and faster to get all the branches of a project, as it returns them in one go, whereas the bare API returns them in an indexing (page by page) fashion.

Cons
Error handling would be cumbersome.
incremental_lister would be a bit difficult to make.

Bare Launchpad API
Pros
The IndexingHttpLister base class would be perfect for this work, so most of the code is already present and it would be easier to implement.
Error handling is already present in the base class, so there is no need to worry about it.
Test cases can also be easily made.
It would be quite similar to other listers like GitHub, and hence maintain uniformity in the code.
It does not require any auth credentials.

Cons
It returns the branches of a project in an indexing fashion.

As far as speed is concerned, I tried both of them. I didn't time their responses, but both took almost the same time; the bare Launchpad API may be faster, because we can get five repos at a time whereas only one with the library.

nahimilega updated the task description. Tue, May 28, 8:36 PM

In my view, we can use the best of both options to make the lister.
We can use the bare API to list the projects and then use launchpadlib to get all the branches of each project.
In this way, we could use the indexing quality of the bare API and the simplicity of launchpadlib.

anonbnr added a subscriber: anonbnr. Sun, Jun 2, 8:29 PM

Hello, we are a group of M1 computer science students of the University of Montpellier, France.

We designed a proper Launchpad lister, but only for Git-based projects: the majority of projects use Bazaar, and a Bazaar loader isn't yet implemented for SWH. We're currently at the last stage of development (testing).
We followed the SWH documentation concerning the implementation of unit tests, and attempted to configure the testing environment properly by using mkvirtualenv and installing tox and pytest to automate the testing process.
However, while loading the testing version of the SWH packages, we keep getting the same error:

ERROR: swh-archiver[testing] should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+.

We looked at the swh-environment git log, and we saw that swh-archiver has been removed from the environment, as stated in the following commit message:

".mrconfig: Remove swh-archiver from swh-environment".

So now we're unable to execute our unit tests.

On the other hand, I think we might have errors related to the configuration of PostgreSQL for the database insertion of nodes, as we end up with permission-related errors that we're unable to solve...

Finally, other errors related to our design might exist, but we didn't reach this stage yet.

While browsing for a solution to our problem, we stumbled upon this thread. We were very happy to notice a similar approach to the problem. So we contacted our project supervisor who advised us to get in contact with SWH and see if we can collaborate on the issue. We'd be happy to collaborate with you.

Would you like to take a look at our code?

olasd added a comment. Mon, Jun 3, 6:34 PM

> Hello, we are a group of M1 computer science students of the University of Montpellier, France.

Hi and welcome to Software Heritage!

> We designed a proper Launchpad lister, but only for Git-based projects: the majority of projects use Bazaar, and a Bazaar loader isn't yet implemented for SWH. We're currently at the last stage of development (testing).

Awesome!

> We followed the SWH documentation concerning the implementation of unit tests, and attempted to configure the testing environment properly by using mkvirtualenv and installing tox and pytest to automate the testing process.
> However, while loading the testing version of the SWH packages, we keep getting the same error:
>
> ERROR: swh-archiver[testing] should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+.
>
> We looked at the swh-environment git log, and we saw that swh-archiver has been removed from the environment, as stated in the following commit message:
>
> ".mrconfig: Remove swh-archiver from swh-environment".
>
> So now we're unable to execute our unit tests.

Looks like you'll need to remove the swh-archiver directory from swh-environment, so that it doesn't get installed any more. This will probably fix that issue.

> On the other hand, I think we might have errors related to the configuration of PostgreSQL for the database insertion of nodes, as we end up with permission-related errors that we're unable to solve...
> Finally, other errors related to our design might exist, but we didn't reach this stage yet.
> While browsing for a solution to our problem, we stumbled upon this thread. We were very happy to notice a similar approach to the problem. So we contacted our project supervisor who advised us to get in contact with SWH and see if we can collaborate on the issue. We'd be happy to collaborate with you.
> Would you like to take a look at our code?

@nahimilega is one of our Google Summer of Code interns, and one of the things he had planned to work on was the Launchpad lister; it's perfectly fine that you've started work on this, we're of course happy to take all (constructive!) contributions.

When contributing to a software project it's usually a good idea to work out the design with the original authors before jumping right into coding. This gives you a better chance of getting your code accepted, and avoids potentially painful review round-trips due to design disagreements.

I suggest that you now submit the code you have written as a Phabricator diff (https://wiki.softwareheritage.org/wiki/Code_review_in_Phabricator), and follow up on this task with the design that you've chosen to implement the Launchpad lister. We can discuss whether the approach looks good or not, and then work on testing it.

For "developer support" questions like the swh-archiver issue or the PostgreSQL stuff, you can also join our IRC channel to get more interactive help (works better during European office hours). You'll want to submit full log traces of your issues (containing what command you've run and the full output), using a Paste.