Software Heritage

nahimilega (Archit Agrawal)
User

User Details

User Since
Mar 10 2019, 8:07 PM (10 w, 4 d)

Recent Activity

Today

nahimilega updated the diff for D1492: Implemented R-CRAN Lister.
  • Updated doc string
Fri, May 24, 6:37 AM

Yesterday

nahimilega added a comment to T1724: Maven Lister.

As recommended by @olasd, I checked out the Maven Central index (https://repo.maven.apache.org/maven2/.index/); this is a

Thu, May 23, 3:00 PM · GSoC 2019, Archive coverage
nahimilega created P405 Data in the index file in Maven central in the S1 Public space.
Thu, May 23, 1:02 PM

Wed, May 22

nahimilega updated the task description for T1724: Maven Lister.
Wed, May 22, 11:02 PM · GSoC 2019, Archive coverage
nahimilega updated subscribers of T1724: Maven Lister.

Comment by @olasd

Wed, May 22, 11:02 PM · GSoC 2019, Archive coverage
nahimilega renamed T1724: Maven Lister from Maven Central (JAVA) lister to Maven Lister.
Wed, May 22, 11:01 PM · GSoC 2019, Archive coverage
nahimilega added a comment to D1497: Maven Central Lister.
  • the repository format is "Maven"; "Maven Central" is only one instance of a Maven repository. There are many more public Maven repositories that would be useful to index, for instance Clojars or the Google Android Maven repo: https://www.deps.co/guides/public-maven-repositories/. You'll need to rename the lister to "maven", and to modify the code to avoid hard-coding the Maven repository root, making it an argument to the task instead (as we will want to list projects for several instances).
  • you went for a scraping approach, which is fine as a last resort. However, a quick search for "maven central index" brought up https://maven.apache.org/repository/central-index.html. Looks like these indexes are available to allow importing the full

It looks like these indexes are available at least for the following Maven repositories:

The index also provides an incremental version (referenced in a properties file) which would allow for incremental updates without having to re-download the full index.
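A minimal Python sketch of the incremental update, taking the repository root as a parameter instead of hard-coding it. It assumes the Apache Maven Indexer layout: a `.index/nexus-maven-repository-index.properties` file with `nexus.index.last-incremental` and `nexus.index.incremental-N` keys, and chunks named `nexus-maven-repository-index.<N>.gz`. These names are assumptions to verify against the real indexes, and all helpers below are hypothetical, not part of swh.lister.

```python
from urllib.parse import urljoin

# Assumed Maven Indexer layout; verify against the real repositories.
INDEX_DIR = ".index/"
INDEX_PREFIX = "nexus-maven-repository-index"


def parse_properties(text):
    """Parse simple 'key=value' Java properties, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def index_url(repo_root, filename):
    """Resolve a file of the .index/ directory against any repository root."""
    if not repo_root.endswith("/"):
        repo_root += "/"
    return urljoin(repo_root, INDEX_DIR + filename)


def properties_url(repo_root):
    return index_url(repo_root, INDEX_PREFIX + ".properties")


def chunk_url(repo_root, number):
    return index_url(repo_root, "{}.{}.gz".format(INDEX_PREFIX, number))


def incremental_chunks_to_fetch(props, last_seen):
    """Chunk numbers newer than last_seen, or None if a full download is needed."""
    available = {
        int(value)
        for key, value in props.items()
        if key.startswith("nexus.index.incremental-")
    }
    last = int(props["nexus.index.last-incremental"])
    wanted = list(range(last_seen + 1, last + 1))
    if any(number not in available for number in wanted):
        return None  # some needed chunks are no longer advertised
    return wanted
```

If the chunks needed since the last run are no longer listed, the sketch returns None to signal that the full index has to be downloaded again. A real implementation would presumably also check that the index chain identifier recorded in the properties file has not changed between runs.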
The Google repo also has an index https://developer.android.com/studio/build/dependencies.html#google-maven but it looks very different from the other maven repos I've found. However, it's fairly small compared to the others, so it shouldn't be too hard to sort it out as well.
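For comparison, a sketch of reading the Google repository index. It assumes the layout served under https://dl.google.com/dl/android/maven2/: a `master-index.xml` whose child tags are group ids, and one `group-index.xml` per group whose child tags are artifact ids carrying a comma-separated `versions` attribute. This format and the URL scheme are assumptions to confirm, and the functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed root of the Google Maven repository.
GOOGLE_MAVEN_ROOT = "https://dl.google.com/dl/android/maven2/"


def group_index_url(group_id):
    """Assumed location of a group's index: dots in the group id become slashes."""
    return GOOGLE_MAVEN_ROOT + group_id.replace(".", "/") + "/group-index.xml"


def parse_master_index(xml_text):
    """Return the group ids listed in master-index.xml (one child tag per group)."""
    return [child.tag for child in ET.fromstring(xml_text)]


def parse_group_index(xml_text):
    """Yield (group_id, artifact_id, version) triples from a group-index.xml."""
    root = ET.fromstring(xml_text)
    for artifact in root:
        for version in artifact.get("versions", "").split(","):
            version = version.strip()
            if version:
                yield root.tag, artifact.tag, version
```

Unlike the Nexus-style index, this is plain XML keyed by tag names, which is why it needs its own small parser.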
Please investigate the format of these repository indexes, and the data they provide, and see whether that would be suitable for use as the data source for the lister.

Thanks for the heads-up, I didn't know about this. I will go through the repository indexes and the data they provide, and report back as soon as possible.

Wed, May 22, 10:59 PM
nahimilega updated subscribers of T1734: Create a Lister for launchpad.net.

For the listing task, this API can do the work.

Wed, May 22, 10:37 PM · Archive coverage
nahimilega updated the task description for T1734: Create a Lister for launchpad.net.
Wed, May 22, 10:26 PM · Archive coverage
nahimilega triaged T1734: Create a Lister for launchpad.net as Normal priority.
Wed, May 22, 7:57 PM · Archive coverage
nahimilega added inline comments to D1482: GNU Lister.
Wed, May 22, 1:21 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Change variable names according to convention.
  • Add GNU lister in README and cli.py
  • Add the functions required by the abstract methods
Wed, May 22, 1:21 PM
nahimilega added a comment to D1492: Implemented R-CRAN Lister.

@douardda Thanks for helping me improve my commit messages.
Although, I was wondering: before landing a diff we usually squash all the commits into a single one, so why follow strict commit-message guidelines while iterating on the diff? In the end, they will all be squashed into one commit.

Wed, May 22, 12:44 PM
nahimilega updated the diff for D1492: Implemented R-CRAN Lister.
  • Improve commit messages by using imperative form
Wed, May 22, 12:32 PM

Tue, May 21

nahimilega updated the diff for D1492: Implemented R-CRAN Lister.
  • Added the functions that must be present because of @abc.abstractmethod
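Why these methods are required: Python refuses to instantiate a class that leaves any `@abc.abstractmethod` of its base class unimplemented, raising `TypeError`. A minimal illustration with a hypothetical base class (not swh.lister's actual one); the method names are borrowed from the function list in the R-CRAN implementation plan (May 13).

```python
import abc


class ListerBase(abc.ABC):
    """Hypothetical stand-in for a lister base class."""

    @abc.abstractmethod
    def get_model_from_repo(self, repo):
        """Convert one upstream record into a model dict."""

    @abc.abstractmethod
    def transport_response_simplified(self, response):
        """Turn a raw upstream response into a list of model dicts."""


class IncompleteLister(ListerBase):
    # Implements only one of the two abstract methods: cannot be instantiated.
    def get_model_from_repo(self, repo):
        return {"name": repo}


class CompleteLister(IncompleteLister):
    # Fills in the remaining abstract method, so instantiation now succeeds.
    def transport_response_simplified(self, response):
        return [self.get_model_from_repo(repo) for repo in response]
```

`IncompleteLister()` raises `TypeError`, while `CompleteLister()` works, which is the failure the diff update addresses.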
Tue, May 21, 7:42 PM
nahimilega updated the diff for D1497: Maven Central Lister.
  • Added Maven Central lister in README.md and cli.py
Tue, May 21, 6:07 PM
nahimilega added inline comments to D1492: Implemented R-CRAN Lister.
Tue, May 21, 5:58 PM
nahimilega updated the diff for D1492: Implemented R-CRAN Lister.
  • Added rcran lister in readme and cli.py
Tue, May 21, 5:56 PM
nahimilega added a task to D1497: Maven Central Lister: T1724: Maven Lister.
Tue, May 21, 3:31 PM
nahimilega added a revision to T1724: Maven Lister: D1497: Maven Central Lister.
Tue, May 21, 3:31 PM · GSoC 2019, Archive coverage
nahimilega updated subscribers of D1492: Implemented R-CRAN Lister.
Tue, May 21, 3:29 PM
nahimilega updated the summary of D1492: Implemented R-CRAN Lister.
Tue, May 21, 1:57 PM
nahimilega added a revision to T1709: implement an R-cran lister: D1492: Implemented R-CRAN Lister.
Tue, May 21, 1:55 PM · GSoC 2019, Archive coverage
nahimilega added a task to D1492: Implemented R-CRAN Lister: T1709: implement an R-cran lister.
Tue, May 21, 1:55 PM
nahimilega updated the diff for D1492: Implemented R-CRAN Lister.
  • Changed print to stdout in R script and improved commit messages
Tue, May 21, 1:51 PM
nahimilega updated the diff for D1497: Maven Central Lister.

Removed useless commit messages

Tue, May 21, 1:29 PM
Herald added a reviewer for D1497: Maven Central Lister: Reviewers.
Tue, May 21, 1:21 PM

Mon, May 20

nahimilega updated the summary of D1492: Implemented R-CRAN Lister.
Mon, May 20, 5:46 PM
Herald added a reviewer for D1492: Implemented R-CRAN Lister: Reviewers.
Mon, May 20, 3:59 PM
nahimilega added a comment to T1725: Software Heritage name not displayed completely in web app.

Browser - Chromium
OS - Ubuntu 18.04
Zoom level in chromium - 100%
Screen Resolution - 1920 * 1080

Mon, May 20, 11:57 AM · Web app

Sun, May 19

nahimilega triaged T1725: Software Heritage name not displayed completely in web app as Normal priority.
Sun, May 19, 6:45 PM · Web app

Fri, May 17

nahimilega triaged T1724: Maven Lister as Normal priority.
Fri, May 17, 11:26 PM · GSoC 2019, Archive coverage
nahimilega added a comment to D1482: GNU Lister.

And these will be passed to loaders

Not loaders, only 1 loader, the gnu one.

Fri, May 17, 3:44 PM
nahimilega added a comment to D1482: GNU Lister.

I sense a communication gap; I need to state more clearly what I am planning to do so that you can help me more effectively. Please correct me if I am wrong somewhere or if there is a better method.

Fri, May 17, 2:59 PM
nahimilega added inline comments to D1482: GNU Lister.
Fri, May 17, 1:59 PM
nahimilega added inline comments to D1482: GNU Lister.
Fri, May 17, 1:52 PM
nahimilega updated the diff for D1482: GNU Lister.
  • Changed header
Fri, May 17, 1:48 PM
nahimilega added inline comments to D1482: GNU Lister.
Fri, May 17, 1:47 PM
nahimilega added a comment to D1482: GNU Lister.

I think it would be great if loader-tar could be moved to the core.

Fri, May 17, 1:38 PM
nahimilega updated the diff for D1482: GNU Lister.

Update to check if tests pass

Fri, May 17, 1:32 PM
nahimilega updated the diff for D1482: GNU Lister.

Same

Fri, May 17, 1:26 PM
nahimilega updated subscribers of D1482: GNU Lister.

Related T1351

Fri, May 17, 12:32 PM
Herald added a reviewer for D1482: GNU Lister: Reviewers.
Fri, May 17, 12:30 PM

Thu, May 16

nahimilega updated subscribers of T1351: (periodically) ingest GNU package releases.

As suggested by @olasd, here is what was done in 2015 to ingest packages:

  1. Create origins for all the folders indiscriminately
  2. Only import things that look like tarballs (i.e. that end with .tar.something)
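The 2015 rule can be expressed as a small filter. A sketch, assuming a folder listing is available as a dict mapping folder names to file names (a hypothetical shape): every folder becomes an origin, and only files ending in `.tar.<something>` are kept.

```python
import re

# ".tar." followed by a suffix at the very end, e.g. "hello-2.10.tar.gz".
# Signatures such as "hello-2.10.tar.gz.sig" and bare ".tar" files do not match.
TARBALL_RE = re.compile(r"\.tar\.[A-Za-z0-9]+$")


def is_tarball(filename):
    return TARBALL_RE.search(filename) is not None


def tarballs_by_folder(listing):
    """Map every folder (origin) to the tarballs it contains, possibly none."""
    return {
        folder: sorted(name for name in files if is_tarball(name))
        for folder, files in listing.items()
    }
```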
Thu, May 16, 6:12 PM · Archive coverage
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

@olasd recommended trying the listing approach we discussed for the NuGet lister (fetching the repository key in the API response). As recommended, I tried this approach on a small dataset of 1412 repositories, all of them fairly recent. I found 0 repository URLs among them, and in 900 of them the repository key was empty (i.e. a blank string). I think we need to change our approach.
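The check described above amounts to tallying how the `repository` field is filled across the fetched packages. A sketch, assuming each package's API metadata has already been loaded as a dict with an optional `repository` key (a hypothetical shape, not the exact NuGet response).

```python
def repository_field_stats(packages):
    """Count missing, blank, URL-like and other values of 'repository'."""
    stats = {"total": 0, "missing": 0, "blank": 0, "url": 0, "other": 0}
    for package in packages:
        stats["total"] += 1
        value = package.get("repository")
        if value is None:
            stats["missing"] += 1
        elif not str(value).strip():
            stats["blank"] += 1
        elif str(value).startswith(("http://", "https://", "git://")):
            stats["url"] += 1
        else:
            stats["other"] += 1
    return stats
```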

Thu, May 16, 4:32 PM · GSoC 2019, Archive coverage
nahimilega updated subscribers of T1718: Implement a NuGet(.NET) lister.
Thu, May 16, 12:39 PM · GSoC 2019, Archive coverage
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

As discussed on IRC, only very few of the repositories have a link to their source code, and the API response does not mention the version control system they use.
One option is that the repository URL and repository type fields are present in the .nuspec file of each project, so we could download that file for each project and get the source URL from it. The problem is that downloading all the binary packages for a small chance of finding a link to a source repository sounds like a lot of work, bandwidth and computing power for not much gain, and it would only cover one of the ways package maintainers can set the source code information; the aforementioned blog post listed at least four
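For reference, extracting that information from a .nuspec file would look roughly like this, assuming the `<repository type="..." url="..."/>` element inside `<metadata>` that recent nuspec schemas define. The costly part discussed above, fetching one .nuspec per project, is not shown.

```python
import xml.etree.ElementTree as ET


def nuspec_repository(nuspec_xml):
    """Return (type, url) from a .nuspec <repository> element, or None.

    The nuspec XML namespace differs between schema versions, so the
    element is matched on its local name rather than a fixed namespace.
    """
    root = ET.fromstring(nuspec_xml)
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "repository":
            url = elem.get("url")
            if url:
                return elem.get("type"), url
    return None
```

A None result corresponds to the common case reported in the comment above: no source repository declared at all.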

Thu, May 16, 12:38 PM · GSoC 2019, Archive coverage
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

Thanks, @anlambert, for your help and guidance. As it was my first lister, I would never have been able to complete it without your help. Your review helped me make this lister more robust and also understand the basics of listers.
Once again, thanks for your patience and guidance.

Thu, May 16, 7:23 AM

Wed, May 15

nahimilega committed rDLSfedfd73c8e4b: swh.lister.phabricator (authored by nahimilega).
swh.lister.phabricator
Wed, May 15, 4:42 PM
nahimilega closed T808: phabricator lister, a subtask of T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs), as Resolved.
Wed, May 15, 4:42 PM · General
nahimilega closed D1365: Implemented a lister for phabricator instance.
Wed, May 15, 4:42 PM
nahimilega closed T808: phabricator lister as Resolved by committing rDLSfedfd73c8e4b: swh.lister.phabricator.
Wed, May 15, 4:42 PM · Easy hack, Phabricator forge
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert As you recommended in your previous comment, I have added the function filter_before_inject() to the lister to remove None from the list.
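The pattern in a nutshell: the per-repository conversion returns None when no usable URL is found, and those entries are dropped before injection. Both functions below are hypothetical standalone versions (in the lister they would be methods), and the model keys are illustrative.

```python
def get_model_from_repo(repo):
    """Build a model dict for one repository, or None if it has no clone URL."""
    url = repo.get("url")
    if not url:
        return None  # nothing the loader could clone: skip this repository
    return {"uid": str(repo["id"]), "name": repo.get("name"), "origin_url": url}


def filter_before_inject(models_list):
    """Drop the None placeholders so only complete models are injected."""
    return [model for model in models_list if model is not None]
```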

Wed, May 15, 4:30 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

Made all the recommended changes

Wed, May 15, 4:27 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

Removed None from the final list

Wed, May 15, 2:48 PM
nahimilega added a comment to T1718: Implement a NuGet(.NET) lister.

API Documentation -
https://docs.microsoft.com/en-us/nuget/api/catalog-resource#base-url
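A sketch of walking the catalog described in that documentation: the catalog index (https://api.nuget.org/v3/catalog0/index.json) lists pages with `@id` and `commitTimeStamp`, and each page lists leaves with `@type`, `nuget:id` and `nuget:version`. These field names follow the catalog documentation as understood here and should be double-checked; the functions take JSON text so they can be tried offline.

```python
import json


def catalog_page_urls(index_json, since=None):
    """Page URLs in commit order, optionally only those committed after `since`.

    Timestamps are compared as strings, which assumes uniformly formatted
    ISO 8601 UTC values.
    """
    index = json.loads(index_json)
    pages = sorted(index.get("items", []), key=lambda page: page["commitTimeStamp"])
    return [
        page["@id"]
        for page in pages
        if since is None or page["commitTimeStamp"] > since
    ]


def package_ids(page_json):
    """(id, version) pairs of the PackageDetails leaves listed in one page."""
    page = json.loads(page_json)
    return [
        (item["nuget:id"], item["nuget:version"])
        for item in page.get("items", [])
        if item.get("@type") == "nuget:PackageDetails"
    ]
```

Filtering pages on `commitTimeStamp` would let the lister resume from its last run instead of re-reading the whole catalog.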

Wed, May 15, 1:34 PM · GSoC 2019, Archive coverage
nahimilega triaged T1718: Implement a NuGet(.NET) lister as Normal priority.
Wed, May 15, 1:31 PM · GSoC 2019, Archive coverage
nahimilega added a comment to T1709: implement an R-cran lister.

@olasd I am not familiar with the R language. Learning the basics and writing this script would take me around a week. I was wondering whether someone at Software Heritage who has some experience with R could write this script, as it would be a matter of minutes for someone who knows R.
Is that possible?

Wed, May 15, 11:33 AM · GSoC 2019, Archive coverage

Tue, May 14

nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

squash commits

Tue, May 14, 8:49 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.

squash commits

Tue, May 14, 8:40 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated README according to new standard
Tue, May 14, 2:16 PM

Mon, May 13

nahimilega added a comment to T1709: implement an R-cran lister.

Here is an implementation plan for the R-CRAN lister, inspired by the PyPI lister.
To write lister.py for R-CRAN, we need to inherit from the SimpleLister class and override the ingest_data() function, changing its first line (where safely_issue_request() is called) to call a function that runs the R script and returns a JSON response.
After that, the response can be handled like any other; we just need to implement the following functions: list_packages, compute url, get_model_from_repo, task_dict and transport_response_simplified.
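A sketch of the step that replaces safely_issue_request(): run an external command and parse the JSON it prints on stdout. The R one-liner assumes `tools::CRAN_package_db()` and the `jsonlite` R package are available, and the package page URL pattern is an assumption; none of this is the diff's actual code.

```python
import json
import subprocess

# Hypothetical R one-liner: print name and version of every CRAN package as JSON.
R_COMMAND = [
    "Rscript", "-e",
    'cat(jsonlite::toJSON(tools::CRAN_package_db()[, c("Package", "Version")]))',
]


def read_packages(command=R_COMMAND):
    """Run `command` and parse the JSON list it writes to stdout."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, check=True, universal_newlines=True
    )
    return json.loads(result.stdout)


def compute_package_url(package):
    """Assumed CRAN landing page for one package record."""
    return "https://cran.r-project.org/web/packages/{}/index.html".format(
        package["Package"]
    )
```

Passing the command as a parameter keeps the parsing testable without R installed, for example by substituting a Python one-liner that prints the same JSON.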

Mon, May 13, 9:36 PM · GSoC 2019, Archive coverage
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated testcase in phabricator lister
Mon, May 13, 6:18 PM
nahimilega added inline comments to D1461: Fix outdated README for listers and improve formatting.
Mon, May 13, 3:30 PM
nahimilega updated subscribers of T1709: implement an R-cran lister.

@faux on IRC mentioned that there is a public DB dump (https://cran.r-project.org/web/dbs) which might be helpful for this purpose.
This DB dump contains files with the .rds extension, a format used by the R language. Here are a couple of rows from that DB dump: https://forge.softwareheritage.org/P396

Mon, May 13, 1:56 PM · GSoC 2019, Archive coverage
nahimilega triaged T1709: implement an R-cran lister as Normal priority.
Mon, May 13, 1:48 PM · GSoC 2019, Archive coverage
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Fixed a typo in phabricator lister
Mon, May 13, 12:07 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Made phabricator lister robust
Mon, May 13, 12:01 PM

Sun, May 12

nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert As you recommended, I tested the lister on multiple forges. Some of the repos for which it failed to fetch a URL are:

Sun, May 12, 3:25 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated cli and made phabricator more robust
Sun, May 12, 12:43 PM

Fri, May 10

nahimilega created P396 cran_info_db.rds data in the S1 Public space.
Fri, May 10, 7:55 PM
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert I will test my lister on various Phabricator instances and fix the bug. I will also update the README according to your recommendation. Thanks for your feedback.

Fri, May 10, 6:28 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Fixed a typo
Fri, May 10, 2:50 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Updated cli and readme for phabricator lister
Fri, May 10, 2:05 PM

Thu, May 9

nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Made test cases for priority url selector
Thu, May 9, 8:24 PM

Tue, May 7

nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Made improvements in code quality
Tue, May 7, 10:40 PM
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

Thanks, @vlorentz and @anlambert. I will implement these changes and submit the diff ASAP.
I have one more question. I do not have much experience writing test cases. Could you please advise me or point me to a source I can refer to (maybe a lister that is already implemented) to get an idea of how to write test cases that validate the repository URL extraction approach?

Tue, May 7, 6:26 PM
nahimilega added inline comments to D1365: Implemented a lister for phabricator instance.
Tue, May 7, 4:58 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Added priority url selector in phabricator lister
Tue, May 7, 10:57 AM

Sat, May 4

nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert I have made all the recommended changes in the code. Could you please review it once?

Sat, May 4, 10:34 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Made phabricator lister robust
Sat, May 4, 6:33 PM
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

Thanks, @anlambert, for your reply. I am facing one more issue: how should I decide the priority order of the URIs? Is there any important point I should take into consideration while deciding it?
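One possible priority scheme, as a sketch: prefer HTTPS over SSH, then a readable short name over a callsign over a numeric id. The URI entry shape mirrors what Phabricator's `diffusion.repository.search` returns with the `uris` attachment, restricted to the fields used here, and reading the `effective` form is itself an assumption; the function and the ranking are hypothetical, not the lister's final logic.

```python
# Lower rank wins; unknown values sort after all known ones.
PROTOCOL_PRIORITY = {"https": 0, "http": 1, "ssh": 2}
IDENTIFIER_PRIORITY = {"shortname": 0, "callsign": 1, "id": 2}


def select_repo_url(uris):
    """Pick one clone URL from a repository's URI entries, or None."""
    candidates = []
    for entry in uris:
        fields = entry.get("fields", {})
        builtin = fields.get("builtin", {})
        url = fields.get("uri", {}).get("effective")
        if not url:
            continue
        rank = (
            PROTOCOL_PRIORITY.get(builtin.get("protocol"), len(PROTOCOL_PRIORITY)),
            IDENTIFIER_PRIORITY.get(builtin.get("identifier"), len(IDENTIFIER_PRIORITY)),
        )
        candidates.append((rank, url))
    if not candidates:
        return None
    return min(candidates)[1]  # ties are broken by URL for determinism
```

Encoding the order as lookup tables keeps the policy in one place, so changing the preference later only means editing the two dicts.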

Sat, May 4, 4:32 PM
nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

@anlambert I am not able to understand the difference between raw URI, display URI, effective URI and normalized URI. Can you please help me with this and explain a bit about how these 4 types of URI differ?

Sat, May 4, 9:13 AM

Fri, May 3

nahimilega added a comment to D1365: Implemented a lister for phabricator instance.

Thank you, @anlambert, for your in-depth review, and thanks for testing the code. I will update my code according to your review as soon as possible. Thanks again for such detailed comments; they help me a lot in solving the problem.

Fri, May 3, 6:27 PM
Herald added a reviewer for D1441: Added an important note in lister tutorial.: Reviewers.
Fri, May 3, 10:23 AM

Thu, May 2

nahimilega updated subscribers of D1365: Implemented a lister for phabricator instance.

@anlambert Can you please review the code and suggest improvements?

Thu, May 2, 8:09 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Fixed test task of phabricator lister
Thu, May 2, 6:31 PM
nahimilega edited P389 Console output of ping test.
Thu, May 2, 10:38 AM

Sun, Apr 28

Herald added a reviewer for D1437: Fixed typo in developer-setup: Reviewers.
Sun, Apr 28, 12:44 PM
nahimilega added a project to T1696: Spelling mistake in Developer Setup documentation: Development documentation.
Sun, Apr 28, 12:30 PM · Development documentation
nahimilega triaged T1696: Spelling mistake in Developer Setup documentation as Normal priority.
Sun, Apr 28, 12:29 PM · Development documentation
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Changed indexable value in phabricator lister
Sun, Apr 28, 12:54 AM
nahimilega created P389 Console output of ping test in the S1 Public space.
Sun, Apr 28, 12:02 AM

Apr 23 2019

nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • fixed conversion of datatype
Apr 23 2019, 7:48 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Fixed null exception in shortName
Apr 23 2019, 7:42 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Changed indexable and uid type to string
Apr 23 2019, 6:35 PM
nahimilega updated the diff for D1365: Implemented a lister for phabricator instance.
  • Changed bad response for phabricator lister
Apr 23 2019, 4:33 PM
nahimilega created P385 Output of docker-compose up swh-web in the S1 Public space.
Apr 23 2019, 3:08 PM
nahimilega created P384 Output of docker-compose ps in the S1 Public space.
Apr 23 2019, 2:47 PM

Apr 14 2019

nahimilega added a comment to T808: phabricator lister.

Hey @faux, I am still working on this.

Apr 14 2019, 2:04 PM · Easy hack, Phabricator forge