May 28 2019

nahimilega added a comment to T1734: Create a Lister for

The library is available in Debian stretch.
It is easier and faster to get all the branches of a project with it, because it returns them in one go, whereas the bare API returns them in an indexed (page-by-page) fashion.

May 28 2019, 8:35 PM · Lister, Archive coverage
nahimilega updated the task description for T1734: Create a Lister for
May 28 2019, 8:05 PM · Lister, Archive coverage
nahimilega retitled D1497: Maven Lister from Maven Central Lister to Maven Lister.
May 28 2019, 7:50 PM
nahimilega added inline comments to D1482: GNU Lister.
May 28 2019, 6:28 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Add test cases for find_tarball and remove_unnecessary_directories function
May 28 2019, 6:20 PM
nahimilega edited P411 Error.
May 28 2019, 5:47 PM
nahimilega created P411 Error in the S1 Public space.
May 28 2019, 4:47 PM
nahimilega created P410 Exception with binary in the S1 Public space.
May 28 2019, 4:43 PM
nahimilega updated the summary of D1492: CRAN Lister.
May 28 2019, 1:37 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Change rcran name to cran
  • Convert oneliner R command to a R script.
May 28 2019, 1:33 PM

May 27 2019

nahimilega added a comment to D1482: GNU Lister.

To check that the code works, I wrote a separate script using the same code as this lister, including the functions that are particular to the GNU lister, such as find_all_tarball and list_packages. I ran the script to verify the algorithm I am using, and it worked fine.
Here is the complete output -

May 27 2019, 11:57 PM
nahimilega created P409 A bit pretty printed structure of the output of the gnu lister in the S1 Public space.
May 27 2019, 11:42 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change download method of tree.json file
  • Add functionality to list all the tarball for a package.
May 27 2019, 11:36 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change download method of tree.json file
  • Add functionality to list all the tarball for a package.
May 27 2019, 11:25 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change download method of tree.json file
  • Add functionality to list all the tarball for a package.
May 27 2019, 11:20 PM
nahimilega created P408 Full output of gnu lister in the S1 Public space.
May 27 2019, 11:12 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change download method of tree.json file
  • Add functionality to list all the tarball for a package.
May 27 2019, 10:58 PM
nahimilega updated subscribers of T1718: Implement a NuGet(.NET) lister.
May 27 2019, 2:16 PM · Archive coverage

May 26 2019

nahimilega added a comment to D1441: tutorial: How to run a new lister (within docker-dev).

@vlorentz I only added vital information that is particular to the project and is not present anywhere else.
I have not added more details, such as the api_good_response file or the whole process of writing tests, because that could make the tutorial extremely lengthy, and a new contributor can figure it out by reading the code. Adding the lister in the main, on the other hand, is not mentioned anywhere else and cannot be quickly figured out by a new contributor.

May 26 2019, 8:54 PM
nahimilega updated the diff for D1441: tutorial: How to run a new lister (within docker-dev).

Add testing section in tutorial docs

May 26 2019, 8:40 PM
nahimilega updated the diff for D1441: tutorial: How to run a new lister (within docker-dev).

Added testing section in tutorial doc

May 26 2019, 8:36 PM
nahimilega added a comment to D1492: CRAN Lister.

Let's just agree to disagree here. Some of us know how to do this with packages, and have done for years if not decades.

May 26 2019, 8:03 PM
nahimilega added inline comments to D1492: CRAN Lister.
May 26 2019, 7:55 PM
nahimilega added a comment to D1492: CRAN Lister.

@eddelbuettel As you recommended, I did some preprocessing to get only the useful data.
Also, I changed from a separate file to a single-line R script, because it is difficult to determine the location of the R script in order to execute it once the module is shipped as a package or run in production.

May 26 2019, 7:50 PM
nahimilega updated the summary of D1492: CRAN Lister.
May 26 2019, 7:46 PM
nahimilega created P407 Response from R Script in the S1 Public space.
May 26 2019, 7:44 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Optimize R Script to make it scalable
May 26 2019, 7:40 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Replace R script with a single line command.
May 26 2019, 7:01 PM

May 24 2019

nahimilega added a comment to T1724: Maven Central repository support.

Extending on what I wrote in the previous comment, I did a bit more research about this.

May 24 2019, 4:13 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega updated the task description for T1734: Create a Lister for
May 24 2019, 12:09 PM · Lister, Archive coverage
nahimilega updated the task description for T1734: Create a Lister for
May 24 2019, 11:49 AM · Lister, Archive coverage
nahimilega updated the task description for T1734: Create a Lister for
May 24 2019, 11:48 AM · Lister, Archive coverage
nahimilega added a comment to D1492: CRAN Lister.

Thanks @ardumont @douardda, I will remember your advice.

May 24 2019, 10:18 AM
nahimilega updated the diff for D1492: CRAN Lister.
  • Updated doc string
May 24 2019, 6:37 AM

May 23 2019

nahimilega added a comment to T1724: Maven Central repository support.

As recommended by @olasd, I checked out the Maven Central index ( this is a

May 23 2019, 3:00 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega created P405 Data in the index file in Maven central in the S1 Public space.
May 23 2019, 1:02 PM

May 22 2019

nahimilega updated the task description for T1724: Maven Central repository support.
May 22 2019, 11:02 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega updated subscribers of T1724: Maven Central repository support.

Comment by @olasd

May 22 2019, 11:02 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega renamed T1724: Maven Central repository support from Maven Central (JAVA) lister to Maven Lister.
May 22 2019, 11:01 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega added a comment to D1497: Maven Lister.
  • the repository format is "Maven"; "Maven Central" is only one instance of a Maven repository. There are many more public Maven repositories that would be useful to index, for instance Clojars or the Google Android maven repo : You'll need to rename the lister to "maven", and to modify the code to avoid hard-coding the maven repository root, making it an argument to the task instead (as we will want to list projects for several instances).
  • you went for a scraping approach, which is fine as a last resort. However, a quick search for "maven central index" brought up Looks like these indexes are available to allow importing the full

It looks like these indexes are available at least for the following maven repositories :

The index also provides an incremental version (referenced in a properties file) which would allow for incremental updates without having to re-download the full index.

The Google repo also has an index but it looks very different from the other maven repos I've found. However, it's fairly small compared to the others, so it shouldn't be too hard to sort it out as well.

Please investigate the format of these repository indexes, and the data they provide, and see whether that would be suitable for use as the data source for the lister.

Thanks for the heads-up, I didn't know about this. I will go through the repository indexes and the data they provide, and inform you about it by latest.

May 22 2019, 10:59 PM
nahimilega updated subscribers of T1734: Create a Lister for
May 22 2019, 10:37 PM · Lister, Archive coverage
nahimilega updated the task description for T1734: Create a Lister for
May 22 2019, 10:26 PM · Lister, Archive coverage
nahimilega triaged T1734: Create a Lister for as Normal priority.
May 22 2019, 7:57 PM · Lister, Archive coverage
nahimilega added inline comments to D1482: GNU Lister.
May 22 2019, 1:21 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change variable names according to convention.
  • Add GNU lister in README and
  • Add functions necessary in abstract attribute
May 22 2019, 1:21 PM
nahimilega added a comment to D1492: CRAN Lister.

@douardda Thanks for helping me out to improve commit messages.
I was wondering, though: before landing the diff we usually squash all the commits into a single one, so what is the need to follow strict guidelines for commit messages while improving the diff? I mean, in the end they are all going to be squashed into one single commit. :)

May 22 2019, 12:44 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Improve commit messages by using imperative form
May 22 2019, 12:32 PM

May 21 2019

nahimilega updated the diff for D1492: CRAN Lister.
  • Added functions necessary in to be present because of @abc.abstractmethod
May 21 2019, 7:42 PM
nahimilega updated the diff for D1497: Maven Lister.
  • Added Maven Central lister in and
May 21 2019, 6:07 PM
nahimilega added inline comments to D1492: CRAN Lister.
May 21 2019, 5:58 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Added rcran lister in readme and
May 21 2019, 5:56 PM
nahimilega added a task to D1497: Maven Lister: T1724: Maven Central repository support.
May 21 2019, 3:31 PM
nahimilega added a revision to T1724: Maven Central repository support: D1497: Maven Lister.
May 21 2019, 3:31 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega updated subscribers of D1492: CRAN Lister.
May 21 2019, 3:29 PM
nahimilega updated the summary of D1492: CRAN Lister.
May 21 2019, 1:57 PM
nahimilega added a revision to T1709: implement an R-cran lister: D1492: CRAN Lister.
May 21 2019, 1:55 PM · GSoC 2019, Archive coverage
nahimilega added a task to D1492: CRAN Lister: T1709: implement an R-cran lister.
May 21 2019, 1:55 PM
nahimilega updated the diff for D1492: CRAN Lister.
  • Changed print to stdout in R script and improved commit messages
May 21 2019, 1:51 PM
nahimilega updated the diff for D1497: Maven Lister.

Removed useless commit messages

May 21 2019, 1:29 PM
Herald added a reviewer for D1497: Maven Lister: Reviewers.
May 21 2019, 1:21 PM

May 20 2019

nahimilega updated the summary of D1492: CRAN Lister.
May 20 2019, 5:46 PM
Herald added a reviewer for D1492: CRAN Lister: Reviewers.
May 20 2019, 3:59 PM
nahimilega added a comment to T1725: Software Heritage name not displayed completely in web app.

Browser - Chromium
OS - Ubuntu 18.04
Zoom level in chromium - 100%
Screen Resolution - 1920 * 1080

May 20 2019, 11:57 AM · Web app

May 19 2019

nahimilega triaged T1725: Software Heritage name not displayed completely in web app as Normal priority.
May 19 2019, 6:45 PM · Web app

May 17 2019

nahimilega triaged T1724: Maven Central repository support as Normal priority.
May 17 2019, 11:26 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
nahimilega added a comment to D1482: GNU Lister.

And these will be passed to loaders

Not loaders, only 1 loader, the gnu one.

May 17 2019, 3:44 PM
nahimilega added a comment to D1482: GNU Lister.

I sense a communication gap. I need to state more clearly what I am thinking of doing so that you can help me more effectively. Please correct me if I am wrong somewhere or if there is a better method.

May 17 2019, 2:59 PM
nahimilega added inline comments to D1482: GNU Lister.
May 17 2019, 1:59 PM
nahimilega added inline comments to D1482: GNU Lister.
May 17 2019, 1:52 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Changed header
May 17 2019, 1:48 PM
nahimilega added inline comments to D1482: GNU Lister.
May 17 2019, 1:47 PM
nahimilega added a comment to D1482: GNU Lister.

I think it would be great if loader-tar could be moved to the core.

May 17 2019, 1:38 PM
nahimilega updated the diff for D1482: GNU Lister.

Update to check if tests pass

May 17 2019, 1:32 PM
nahimilega updated the diff for D1482: GNU Lister.


May 17 2019, 1:26 PM
nahimilega updated subscribers of D1482: GNU Lister.

Related T1351

May 17 2019, 12:32 PM
Herald added a reviewer for D1482: GNU Lister: Reviewers.
May 17 2019, 12:30 PM

May 16 2019

nahimilega updated subscribers of T1351: (periodically) ingest GNU package releases.

As suggested by @olasd, what was done in 2015 to ingest packages -

  1. Create origins for all the folders indiscriminately
  2. Only import things that look like tarballs (i.e. that end with .tar.something)
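
The second step above can be sketched as a simple file-name filter. This is a minimal illustration in Python, not the code that was used in 2015; the names `looks_like_tarball` and `select_tarballs` are hypothetical, and "ends with .tar.something" is interpreted here as `.tar.` followed by a single non-empty suffix.

```python
import re

# Matches names ending in ".tar.<suffix>", e.g. ".tar.gz" or ".tar.xz".
# A bare ".tar" is intentionally not matched, following the rule as stated.
TARBALL_RE = re.compile(r"\.tar\.[A-Za-z0-9]+$")


def looks_like_tarball(filename: str) -> bool:
    """Return True if the file name ends with .tar.<something>."""
    return TARBALL_RE.search(filename) is not None


def select_tarballs(filenames):
    """Keep only the entries that look like tarballs, preserving order."""
    return [name for name in filenames if looks_like_tarball(name)]
```

With this interpretation, detached signature files such as a name ending in `.tar.gz.sig` are skipped, because the suffix after `.tar.` must run to the end of the name without another dot.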
May 16 2019, 6:12 PM · Archive coverage
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

@olasd recommended trying the listing approach for the NuGet lister that we discussed (looking for the repository key in the API response). As recommended, I tried the approach on a small dataset of 1412 repositories, all of them quite recent. I found 0 repository URLs among them, and in 900 of them the repository key was empty (i.e. a blank string). I think we need to change our approach.

May 16 2019, 4:32 PM · Archive coverage
nahimilega updated subscribers of T1718: Implement a NuGet(.NET) lister.
May 16 2019, 12:39 PM · Archive coverage
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

As discussed on IRC, very few of the repositories include a source code link, and the API response does not mention the version control system used by a repository.
One option: the repository URL and repository type fields are present in the .nuspec file of each project, so we could download that file for each project and get the source URL. The problem is that downloading all binary packages for a small chance of finding a link to a source repository sounds like a lot of work, bandwidth and computing power for not much gain, and it would only cover one of the ways package maintainers can set the source code information; the aforementioned blog post listed at least four

May 16 2019, 12:38 PM · Archive coverage
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

Thanks, @anlambert, for your help and guidance. As it was my first lister, I would never have been able to complete it without your help. Your review helped me make this lister more robust and understand the basics of listers.
Once again, thanks for your patience and guidance.

May 16 2019, 7:23 AM

May 15 2019

nahimilega committed rDLSfedfd73c8e4b: swh.lister.phabricator (authored by nahimilega).
May 15 2019, 4:42 PM
nahimilega closed T808: phabricator lister, a subtask of T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs), as Resolved.
May 15 2019, 4:42 PM · General
nahimilega closed D1365: Implemented a lister for phabricator instance.
May 15 2019, 4:42 PM
nahimilega closed T808: phabricator lister as Resolved by committing rDLSfedfd73c8e4b: swh.lister.phabricator.
May 15 2019, 4:42 PM · Easy hack, Phabricator forge
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert As you recommended in your previous comment, I have added the function filter_before_inject() to the lister to remove None from the list.
I have also rebased the branch on origin/master.
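
Removing None entries before injection can be illustrated with a small stand-alone sketch. The function below is hypothetical and only mirrors the idea; it is not the filter_before_inject() implementation from the diff, whose signature is not shown here.

```python
def drop_none_entries(models_list):
    """Return a new list without the None placeholders.

    Entries are compared with `is not None`, so falsy but valid values
    (empty dicts, 0, "") are kept.
    """
    return [model for model in models_list if model is not None]
```

Comparing with `is not None` rather than relying on truthiness is a deliberate choice in this sketch, so that an empty but valid record is not dropped by accident.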

May 15 2019, 4:30 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

made all the changes recommended

May 15 2019, 4:27 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

Removed None from the final list

May 15 2019, 2:48 PM
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

API Documentation -

May 15 2019, 1:34 PM · Archive coverage
nahimilega triaged T1718: Implement a NuGet(.NET) lister as Normal priority.
May 15 2019, 1:31 PM · Archive coverage
nahimilega added a comment to T1709: implement an R-cran lister.

@olasd I am not familiar with the R language. Learning the basics and writing this script would take me around a week. I was wondering whether someone at Software Heritage who has some experience with R could write this script, as it would be a matter of minutes for a person who knows R.
Is it possible to do so?

May 15 2019, 11:33 AM · GSoC 2019, Archive coverage

May 14 2019

nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

squash commits

May 14 2019, 8:49 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

squash commits

May 14 2019, 8:40 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated README according to new standard
May 14 2019, 2:16 PM

May 13 2019

nahimilega added a comment to T1709: implement an R-cran lister.

Here is an implementation plan for making R-CRAN lister.
I have taken inspiration from the pypi lister.
To make one for R-CRAN, we need to inherit from the SimpleLister class, override the ingest_data() function, and change its first line (where safely_issue_request() is called) to call a function that runs an R script and returns a JSON response.
After that it is handled like any normal response; we just need to implement the following functions: list_packages, compute url, get_model_from_repo, task_dict and transport_response_simplified.
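
The first step of this plan (replacing the HTTP request with a call to an R script that prints JSON) could look roughly like the sketch below. It deliberately does not use any swh.lister class; `run_json_command` is a hypothetical helper, and the `Rscript` invocation and script name mentioned in its docstring are assumptions for illustration, not taken from the diff.

```python
import json
import subprocess


def run_json_command(command):
    """Run an external command and parse its standard output as JSON.

    `command` is a list of arguments, e.g. ["Rscript", "list_packages.R"]
    (both names are placeholders for illustration).
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout.decode("utf-8"))
```

A lister could then call such a helper at the point where the HTTP request used to be issued, and hand the parsed list to the usual processing functions.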

May 13 2019, 9:36 PM · GSoC 2019, Archive coverage
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated testcase in phabricator lister
May 13 2019, 6:18 PM
nahimilega added inline comments to D1461: Fix outdated README for listers and improve formatting.
May 13 2019, 3:30 PM
nahimilega updated subscribers of T1709: implement an R-cran lister.

@faux on IRC mentioned that there is a public DB dump ( which might be helpful for the purpose.
This DB dump contains files with the .rds extension, which is used by the R language. Here are a couple of rows from that DB dump

May 13 2019, 1:56 PM · GSoC 2019, Archive coverage
nahimilega triaged T1709: implement an R-cran lister as Normal priority.
May 13 2019, 1:48 PM · GSoC 2019, Archive coverage
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Fixed a typo in phabricator lister
May 13 2019, 12:07 PM