
Implement a NuGet(.NET) lister
Closed, Migrated


Here is the implementation plan for the NuGet (.NET) lister.
NuGet package data is spread across several pages. To get the total number of pages, we first have to visit this URL ( ), which reports how many pages there are.
Next, we have to visit each page (the page URLs are present in the response of the previous API call) to get the package names, their versions, and the links to their metadata.
Then we have to visit the metadata URL of each package to get its project URL and package description.
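
The three steps above could be sketched roughly as follows. This is only an illustration, not the lister's actual code: the JSON field names (`items`, `@id`, `nuget:id`, `nuget:version`, `projectUrl`, `description`) are assumptions about the shape of the API responses, and `list_packages` and `fetch_json_http` are hypothetical helper names. The index URL is left as a parameter.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.request import urlopen


def fetch_json_http(url: str) -> Dict[str, Any]:
    """Download a URL and decode its body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


def list_packages(
    index_url: str,
    fetch_json: Callable[[str], Dict[str, Any]] = fetch_json_http,
) -> Iterator[Dict[str, Any]]:
    """Walk index -> pages -> per-package metadata.

    All field names below are assumptions about the response layout.
    """
    # Step 1: the index lists every page (and so tells us how many exist).
    index = fetch_json(index_url)
    for page_ref in index.get("items", []):
        # Step 2: each page lists package names, versions and metadata links.
        page = fetch_json(page_ref["@id"])
        for entry in page.get("items", []):
            # Step 3: the metadata document carries project URL and description.
            metadata = fetch_json(entry["@id"])
            yield {
                "name": entry.get("nuget:id"),
                "version": entry.get("nuget:version"),
                "metadata_url": entry["@id"],
                "project_url": metadata.get("projectUrl"),
                "description": metadata.get("description"),
            }
```

Passing `fetch_json` as an argument keeps the traversal separate from the HTTP layer, so it can be tried against canned responses before hitting the real API.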

Event Timeline

nahimilega triaged this task as Normal priority. May 15 2019, 1:31 PM
nahimilega created this task.
nahimilega created this object in space S1 Public.

As discussed on IRC, the source code link is present for very few of the repositories, and the API response does not mention which version control system a repository uses.
One option: the repository URL and repository type fields are present in the .nuspec file of each project, so we could download that file for each project and read the source URL from it. The problem is that downloading all binary packages for a small chance of finding a link to a source repository sounds like a lot of work, bandwidth and computing power for not much gain. It would also cover only one of the ways package maintainers can set the source code information; the aforementioned blog post listed at least four.
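
For reference, reading the repository URL and type out of a .nuspec file could look like the sketch below. It assumes the information lives in a `<repository type="..." url="..."/>` element and that the .nuspec text has already been extracted from the package; `repository_from_nuspec` is a hypothetical helper name. It does nothing about the download cost discussed above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def repository_from_nuspec(nuspec_xml: str) -> Optional[Tuple[str, str]]:
    """Return (repository type, repository URL), or None if no URL is set.

    Assumes a <repository type="..." url="..."/> element somewhere in the
    document; the XML namespace is ignored.
    """
    root = ET.fromstring(nuspec_xml)
    for elem in root.iter():
        # Strip a namespace prefix: "{ns}repository" -> "repository".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "repository":
            url = elem.get("url", "").strip()
            if url:
                return elem.get("type", "").strip(), url
    return None
```

A package whose .nuspec has no such element, or has one with a blank `url`, yields `None`, which is the common case reported in this task.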

@olasd recommended trying the listing approach we discussed for the NuGet lister (looking for the repository key in the API response). As recommended, I tried the approach on a small dataset of 1412 repositories, all of them quite recent. I found 0 repository URLs among them, and in 900 of them the repository key was empty (i.e. a blank string). I think we need to change our approach.
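
A tally like the one described above could be produced with something along these lines. This is a sketch, not the script that was actually run: it assumes each API entry is available as a dict and that the field is called `repository`; `tally_repository_field` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tally_repository_field(
    entries: Iterable[Dict[str, Any]], key: str = "repository"
) -> Counter:
    """Count entries whose `key` is missing, blank, or actually filled in."""
    counts: Counter = Counter()
    for entry in entries:
        value = entry.get(key)
        if value is None:
            counts["missing"] += 1   # key absent (or explicitly null)
        elif not str(value).strip():
            counts["blank"] += 1     # key present but an empty string
        else:
            counts["present"] += 1   # key holds some non-empty value
    return counts
```

Separating "missing" from "blank" makes it visible whether the key is absent from the response or present but empty, which is the distinction reported above.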

vlorentz removed a project: GSoC 2019.