nahimilega (Archit Agrawal)
User

Projects

User Details

User Since
Mar 10 2019, 8:07 PM (18 w, 2 d)

Recent Activity

Today

nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.
  • Increase the number of hookpoints
Wed, Jul 17, 12:13 AM
nahimilega added a comment to D1694: swh.loader.base: Implement base package manager loader.

One concern about the base loader is that its implementation style could invite the same problems that are present in the lister base code. Here is my take on that.
These are my thoughts on this topic, and the points I keep in mind while writing the base loader.

Wed, Jul 17, 12:06 AM

Yesterday

nahimilega added inline comments to D1733: swh.lister.core: Add tests for simple lister.
Tue, Jul 16, 6:06 PM
nahimilega updated the summary of D1733: swh.lister.core: Add tests for simple lister.
Tue, Jul 16, 6:03 PM
nahimilega retitled D1733: swh.lister.core: Add tests for simple lister from swh.lister.pypi: Add tests to swh.lister.core: Add tests for simple lister.
Tue, Jul 16, 6:02 PM
nahimilega updated the diff for D1733: swh.lister.core: Add tests for simple lister.
  • swh.lister.core: Add test for simple lister
  • swh.lister.pypi: Add tests
Tue, Jul 16, 6:02 PM

Mon, Jul 15

nahimilega added a comment to D1694: swh.loader.base: Implement base package manager loader.

Keep the stuff in swh/loader/core, not swh/loader/base.

I chose to keep it in a separate folder just for easier handling: there are already many files in core itself, and adding these would make it more cluttered. Can I do this at the end, just before landing? It is difficult for me to work on the base loader when everything is in a common folder (i.e. core).

Mon, Jul 15, 4:05 PM
nahimilega added a comment to D1584: swh.lister.packagist.

The conventional method of testing (using the HttpListerTester class) needs variables like test_re (a compiled regex matching the server URL, which must capture the index value). These listers (packagist and pypi) do not have an index value: they make a request to a single page and parse that response to get the info. Hence I didn't use the conventional method.

Mon, Jul 15, 12:03 PM

Sun, Jul 14

nahimilega updated subscribers of D1733: swh.lister.core: Add tests for simple lister.
Sun, Jul 14, 8:32 PM
nahimilega added a revision to T1890: pypi lister: Add tests: D1733: swh.lister.core: Add tests for simple lister.
Sun, Jul 14, 8:28 PM · Origin-Pypi, Lister
Herald added a reviewer for D1733: swh.lister.core: Add tests for simple lister: Reviewers.
Sun, Jul 14, 8:28 PM
nahimilega updated the diff for D1584: swh.lister.packagist.

Add header in test files.

Sun, Jul 14, 4:59 PM
nahimilega updated the diff for D1584: swh.lister.packagist.

Add tests for lister.py

Sun, Jul 14, 4:44 PM
nahimilega created P475 docker compose in the S1 Public space.
Sun, Jul 14, 1:30 PM
nahimilega edited P473 swh storage logs.
Sun, Jul 14, 1:24 PM
nahimilega edited P474 kafka logs.
Sun, Jul 14, 1:24 PM
nahimilega created P474 kafka logs in the S1 Public space.
Sun, Jul 14, 1:23 PM
nahimilega created P473 swh storage logs in the S1 Public space.
Sun, Jul 14, 1:21 PM
nahimilega created P472 origin not able to load in the S1 Public space.
Sun, Jul 14, 12:55 PM
nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.
  • swh.loader.base: Complete snapshot building process

The base loader is complete, but I am facing small problems in a few steps:

  • How to get the branch name
  • How to find the HEAD for the branch. There could be some package managers where it is not possible to find the HEAD because of a lack of metadata; how should those cases be handled?
Sun, Jul 14, 11:17 AM

Sat, Jul 13

nahimilega retitled D1694: swh.loader.base: Implement base package manager loader from swh.loader.core: Implement base package manager loader to swh.loader.base: Implement base package manager loader.
Sat, Jul 13, 9:02 PM
nahimilega created P471 error with base loader in the S1 Public space.
Sat, Jul 13, 9:00 PM
nahimilega added a comment to D1694: swh.loader.base: Implement base package manager loader.

I ran the GNU loader (after making some slight modifications) and it ran successfully (status was full).

Sat, Jul 13, 7:32 PM

Fri, Jul 12

nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.
  • Make generate_snapshot() method
Fri, Jul 12, 7:19 PM
nahimilega added inline comments to D1694: swh.loader.base: Implement base package manager loader.
Fri, Jul 12, 3:32 PM
nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.
  • swh.loader.base: Improve docstrings
Fri, Jul 12, 3:31 PM
nahimilega added a comment to D1694: swh.loader.base: Implement base package manager loader.

The base loader is almost complete with just a few more methods to implement.

Fri, Jul 12, 9:56 AM
nahimilega added inline comments to D1694: swh.loader.base: Implement base package manager loader.
Fri, Jul 12, 9:52 AM
nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.
  • Fix docstring and make construct_revision class
Fri, Jul 12, 12:52 AM

Thu, Jul 11

nahimilega updated subscribers of D1729: swh.loader.gnu: Implement gnu loader.
Thu, Jul 11, 6:38 PM
nahimilega updated subscribers of D1728: swh.loader.cran: Implement CRAN loader.
Thu, Jul 11, 6:37 PM
nahimilega added a comment to D1694: swh.loader.base: Implement base package manager loader.

D1729 and D1728 are the proposed implementations of the GNU and CRAN loaders using the base loader.

Thu, Jul 11, 6:15 PM
Herald added a reviewer for D1729: swh.loader.gnu: Implement gnu loader: Reviewers.
Thu, Jul 11, 6:13 PM
nahimilega added a comment to D1728: swh.loader.cran: Implement CRAN loader.

This is the proposed implementation of the CRAN loader using the base loader (D1694).

Thu, Jul 11, 6:03 PM
Herald added a reviewer for D1728: swh.loader.cran: Implement CRAN loader: Reviewers.
Thu, Jul 11, 6:00 PM
nahimilega created P470 swh-storage package list in the S1 Public space.
Thu, Jul 11, 5:09 PM
nahimilega edited P469 pip list.
Thu, Jul 11, 5:07 PM
nahimilega created P469 pip list in the S1 Public space.
Thu, Jul 11, 5:05 PM
nahimilega edited P468 error while running pypi loader.
Thu, Jul 11, 3:28 PM
nahimilega created P468 error while running pypi loader in the S1 Public space.
Thu, Jul 11, 3:21 PM
nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.

Complete the prepare method and download.py
Note: Many docstrings are out of sync; they don't describe what the method is really doing.

Thu, Jul 11, 1:21 PM

Wed, Jul 10

nahimilega created P465 sphinx error in the S1 Public space.
Wed, Jul 10, 10:58 PM
nahimilega created P464 error while installation in the S1 Public space.
Wed, Jul 10, 9:08 PM

Mon, Jul 8

nahimilega added a comment to T1734: Create a Lister for launchpad.net.

P.S. If you want us to provide an English translation of the diagrams and their corresponding explanation, please feel free to notify us.

Mon, Jul 8, 12:10 PM · Archive coverage
nahimilega added a comment to D1584: swh.lister.packagist.

I'm sorry, I did not follow.

In the previous approach, the lister first finds the names of all the packages and then visits each package's metadata URL to get the tarball URLs of all its releases.
E.g. for the package name 'monolog/monolog',
its metadata page (https://repo.packagist.org/p/monolog/monolog.json) gives the tarball URL of every release,
something like this:

{
   "release": [
       {
       "vcs": "zip",
       "url": "https://api.github.com/repos/Seldaek/monolog/zipball/433b98d4218c181bae01865901aac045585e8a1a",
       "description": "Logging for PHP 5.3"
       },
       {
       "vcs": "zip",
       "url": "https://api.github.com/repos/Seldaek/monolog/zipball/5e651a82b4b03d267da6084720ada0cd398c8d16",
       "description": "Logging for PHP 5.3"
       },
       ...
       ],
   "source": [
       {
       "vcs": "git",
       "url": "https://github.com/Seldaek/monolog.git",
       "description": "Logging for PHP 5.3"
       },
       ...
       ]
}

Here, as you can see, the API response also provides the upstream repository under the key source. To make use of this, in the previous approach the lister created a task for the respective VCS loader (the git loader in this example) and also created a packagist loader task to ingest the tarballs provided.
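
To make the split described above concrete, here is a minimal sketch in Python. It assumes a dict shaped like the simplified sample in this comment (keys release and source), not the real Packagist response. The function name plan_tasks and the task type strings are hypothetical and are not taken from the lister code.

```python
from typing import Dict, List


def plan_tasks(metadata: Dict[str, List[dict]]) -> List[dict]:
    """Turn a metadata dict shaped like the sample into task descriptions.

    Hypothetical helper: one task groups all release tarballs, and one
    task is emitted per distinct upstream repository.
    """
    tasks = []

    # All tarball URLs go to a single (hypothetical) packagist loader task.
    tarball_urls = [entry["url"] for entry in metadata.get("release", [])]
    if tarball_urls:
        tasks.append({"type": "load-packagist-tarballs", "urls": tarball_urls})

    # Each distinct upstream repository gets a task for its own VCS loader.
    seen = set()
    for entry in metadata.get("source", []):
        key = (entry["vcs"], entry["url"])
        if key not in seen:
            seen.add(key)
            tasks.append({"type": "load-" + entry["vcs"], "url": entry["url"]})
    return tasks


base = "https://api.github.com/repos/Seldaek/monolog/zipball/"
sample = {
    "release": [
        {"vcs": "zip", "url": base + "433b98d4218c181bae01865901aac045585e8a1a"},
        {"vcs": "zip", "url": base + "5e651a82b4b03d267da6084720ada0cd398c8d16"},
    ],
    "source": [
        {"vcs": "git", "url": "https://github.com/Seldaek/monolog.git"},
        {"vcs": "git", "url": "https://github.com/Seldaek/monolog.git"},
    ],
}
tasks = plan_tasks(sample)
```

With this sample, tasks holds one tarball task and one git task, because the repeated upstream URL is only scheduled once.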

Mon, Jul 8, 11:52 AM

Sun, Jul 7

nahimilega updated subscribers of D1694: swh.loader.base: Implement base package manager loader.
Sun, Jul 7, 11:58 PM
nahimilega updated the diff for D1694: swh.loader.base: Implement base package manager loader.

This diff is in a nascent stage. The methods that have # Done as a comment
are complete; the rest are still in progress. There are also some rogue
comments present; those are just for my help during development, and I will remove them
once the code is complete.

Sun, Jul 7, 11:56 PM
Herald added a reviewer for D1694: swh.loader.base: Implement base package manager loader: Reviewers.
Sun, Jul 7, 10:10 PM
nahimilega added a revision to T1389: Implement a base loader for package managers: D1694: swh.loader.base: Implement base package manager loader.
Sun, Jul 7, 10:10 PM · Origin-npm, Origin-Pypi, Archive coverage
nahimilega added a comment to D1584: swh.lister.packagist.

One thing I am not sure about regarding this approach:
The description page also provides the upstream link of the package along with the links to the tarballs. In the previous approach, a separate loader task was created for those upstream links, to use all the information provided and further increase the archive coverage.

Sun, Jul 7, 9:38 PM
nahimilega added a comment to D1584: swh.lister.packagist.

it's missing some tests

Sun, Jul 7, 9:29 PM
nahimilega updated the diff for D1584: swh.lister.packagist.
  • Rebase on latest master
  • Remove safety_issue_request() method
  • Check for out of sync docstrings
Sun, Jul 7, 9:14 PM

Fri, Jul 5

nahimilega updated the diff for D1584: swh.lister.packagist.

Remove page visit step

Fri, Jul 5, 2:08 PM

Tue, Jul 2

nahimilega added a comment to D1584: swh.lister.packagist.

I suppose you meant this lists only the name....

Ya, sorry for the typo

Tue, Jul 2, 6:05 PM
nahimilega added a comment to D1584: swh.lister.packagist.

Currently, this lister visits the page of every package to get all the versions of that package. This approach is really slow: for me it took 2 hours to list 6% of the total packages.
One thing that could be done is, this lister only the name of the package and the URL of metadata for a package and then loader visits that URL to get all the versions(like done in pypi lister). This approach will take only a couple of seconds and reduce the code for the lister.
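
A minimal sketch of the lighter listing step proposed here, in Python. The URL shape is the one quoted for monolog/monolog elsewhere in this thread (https://repo.packagist.org/p/monolog/monolog.json). The function names and the output dict keys are hypothetical, and how the package names are obtained is left out.

```python
from typing import Dict, Iterable, List


def metadata_url(package_name: str) -> str:
    """Build the metadata URL for a package name such as 'monolog/monolog'."""
    return "https://repo.packagist.org/p/%s.json" % package_name


def list_packages(names: Iterable[str]) -> List[Dict[str, str]]:
    """Return one entry per package: its name and its metadata URL.

    No per-package request is made here; visiting the metadata URL to
    find the versions is left to the loader, as proposed in the comment.
    """
    return [{"name": name, "metadata_url": metadata_url(name)} for name in names]


entries = list_packages(["monolog/monolog"])
```

Since nothing is fetched per package, the cost of this step depends only on how long it takes to get the list of names.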

Tue, Jul 2, 5:45 PM
nahimilega added a comment to T1389: Implement a base loader for package managers.

Extending over the plan by @olasd

Tue, Jul 2, 12:46 PM · Origin-npm, Origin-Pypi, Archive coverage

Fri, Jun 28

nahimilega changed the status of T1835: List/Ingest major cgit instances from Work in Progress to Open.
Fri, Jun 28, 7:36 PM · Lister
nahimilega updated the task description for T1835: List/Ingest major cgit instances.
Fri, Jun 28, 7:22 PM · Lister
nahimilega added a comment to T1856: Adapt contribution policy?.

My two cents
I think landing by the reviewer is a good idea, because once the diff is passed (accepted to land), the author can still make changes to it and push them, and those changes might not be good ones.
This could be a potential threat to code quality and other factors.
On the other hand, landing by the reviewer would add one more task to their plate.

Fri, Jun 28, 6:46 PM · Staff
nahimilega committed rDLS7e3c79bb1d18: swh.lister.cgit: Add pagination support (authored by nahimilega).
swh.lister.cgit: Add pagination support
Fri, Jun 28, 5:22 PM
nahimilega committed rDLS0bf24469b7e0: swh.lister.cgit: Remove repo page visit step (authored by nahimilega).
swh.lister.cgit: Remove repo page visit step
Fri, Jun 28, 5:22 PM
nahimilega committed rDLSb972a2a88d25: swh.lister.cgit (authored by nahimilega).
swh.lister.cgit
Fri, Jun 28, 5:22 PM
nahimilega closed T1659: rewrite the CGit lister as a proper lister, a subtask of T1451: ingest GNU Savannah Git repositories, as Resolved.
Fri, Jun 28, 5:22 PM · Archive coverage
nahimilega closed T1659: rewrite the CGit lister as a proper lister, a subtask of T1799: ingest Tor git repositories, as Resolved.
Fri, Jun 28, 5:22 PM · Archive coverage
nahimilega closed T1659: rewrite the CGit lister as a proper lister as Resolved by committing rDLSb972a2a88d25: swh.lister.cgit.
Fri, Jun 28, 5:22 PM · CGit lister
nahimilega closed D1610: swh.lister.cgit.
Fri, Jun 28, 5:22 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • add method to avoid list wrapping
Fri, Jun 28, 4:36 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • rebase on latest master
Fri, Jun 28, 3:58 PM
nahimilega added a comment to D1610: swh.lister.cgit.

@nahimilega , I will commit the fix. You will just have to rebase your diff before landing.

Fri, Jun 28, 3:50 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • swh.lister.core: Increase flush frequency in simple lister
Fri, Jun 28, 3:48 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Fri, Jun 28, 3:35 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Fri, Jun 28, 2:53 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Fri, Jun 28, 2:49 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • Made recommended changes
Fri, Jun 28, 2:48 PM

Thu, Jun 27

nahimilega committed rDLS5ea9d5ed392a: swh.lister.cran: Add description in task_dict (authored by nahimilega).
swh.lister.cran: Add description in task_dict
Thu, Jun 27, 12:49 PM
nahimilega closed D1646: swh.lister.cran: Add description in task_dict.
Thu, Jun 27, 12:49 PM
nahimilega updated the diff for D1646: swh.lister.cran: Add description in task_dict.

Shifted the test to another file.

Thu, Jun 27, 11:28 AM
nahimilega updated the diff for D1646: swh.lister.cran: Add description in task_dict.

Add test to avoid removal of description in future

Thu, Jun 27, 11:08 AM

Wed, Jun 26

nahimilega updated the diff for D1610: swh.lister.cgit.
  • Add test cases and rename variables base_url to url and origin_url_prefix to url_prefix
Wed, Jun 26, 9:07 PM
nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

Remove the argument of the test

Wed, Jun 26, 6:38 PM
nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

Got it, it's because you added that check in a test where swh.lister.cran.tasks.CRANLister is mocked, so when you do lister.run() nothing actually happens. (You can check, even lister.thisIsNotARealFunction() wouldn't crash)
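
The behaviour described above can be reproduced with the standard library alone. This is a small sketch, not the actual test: FakeLister is a hypothetical stand-in for CRANLister, and the create_autospec part is only one possible way to make such mistakes visible, not something the review asked for.

```python
from unittest.mock import MagicMock, create_autospec


class FakeLister:
    """Hypothetical stand-in for the real lister class."""

    def run(self):
        return "listed"


# A plain MagicMock accepts any attribute and any call, so nothing real runs.
plain = MagicMock()
run_result = plain.run()
bogus_result = plain.thisIsNotARealFunction()  # does not raise

# A mock built from the class's spec rejects attributes the class lacks.
specced = create_autospec(FakeLister, instance=True)
try:
    specced.thisIsNotARealFunction()
    specced_accepts_bogus = True
except AttributeError:
    specced_accepts_bogus = False
```

Here run_result and bogus_result are themselves mocks, which is why assertions made after lister.run() on a mocked lister say nothing about the real code.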

Wed, Jun 26, 6:29 PM
nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

It means swh.scheduler.utils.create_task_dict was never called.
This is because the lister executes 'from swh.scheduler.utils import create_task_dict' before the patch is applied, so it keeps its own reference to the original function. Instead, you should patch swh.lister.cran.lister.create_task_dict.
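
The "patch where it is looked up" rule can be shown with two throwaway in-memory modules. In this sketch fake_utils and fake_lister are hypothetical stand-ins for swh.scheduler.utils and swh.lister.cran.lister; only unittest.mock.patch is real API.

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the utility module that defines create_task_dict.
utils = types.ModuleType("fake_utils")
exec(
    "def create_task_dict(*args, **kwargs):\n"
    "    return {'args': args, 'kwargs': kwargs}\n",
    utils.__dict__,
)
sys.modules["fake_utils"] = utils

# Stand-in for the lister module, which copies the name at import time.
lister = types.ModuleType("fake_lister")
sys.modules["fake_lister"] = lister
exec(
    "from fake_utils import create_task_dict\n"
    "def run():\n"
    "    return create_task_dict('some-task', 'recurring')\n",
    lister.__dict__,
)

# Patching the defining module does not affect the lister's own reference.
with patch("fake_utils.create_task_dict") as wrong_target:
    lister.run()
    wrong_target_called = wrong_target.called

# Patching the name inside the lister module is what run() actually sees.
with patch("fake_lister.create_task_dict") as right_target:
    lister.run()
    right_target_called = right_target.called
```

Only the second patch replaces the name that run() resolves, so only right_target records a call.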

Wed, Jun 26, 5:11 PM
nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

What is the error message?

Wed, Jun 26, 4:48 PM
nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

No, when you do this, mock_create_tasks.assert_called_once_with() checks it was called a single time, and that single time is mock_create_tasks(). So what you are actually testing is that swh.scheduler.utils.create_task_dict is never called by the lister itself.
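
A small sketch of why that assertion tests the opposite of what was intended, using only unittest.mock. The argument values are made up for illustration, and the helper passes_bare_assert is hypothetical.

```python
from unittest.mock import Mock


def passes_bare_assert(mock: Mock) -> bool:
    """Return True if assert_called_once_with() with no arguments passes."""
    try:
        mock.assert_called_once_with()
        return True
    except AssertionError:
        return False


# Case 1: the only call is a bare one, and the code under test never calls it.
never_used = Mock()
never_used()
passes_when_unused = passes_bare_assert(never_used)

# Case 2: the code under test also calls the mock with real arguments.
really_used = Mock()
really_used()
really_used("some-task", "recurring", url="https://example.org/pkg.tar.gz")
passes_when_used = passes_bare_assert(really_used)

# Checking the real call means passing the expected arguments explicitly.
expected = Mock()
expected("some-task", "recurring", url="https://example.org/pkg.tar.gz")
expected.assert_called_once_with(
    "some-task", "recurring", url="https://example.org/pkg.tar.gz"
)
```

The bare assertion passes in the first case and fails in the second, so a test built on it only succeeds while the function under test is never called.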

Wed, Jun 26, 4:34 PM
nahimilega added inline comments to D1646: swh.lister.cran: Add description in task_dict.
Wed, Jun 26, 4:16 PM
nahimilega updated subscribers of D1646: swh.lister.cran: Add description in task_dict.
Wed, Jun 26, 4:13 PM
nahimilega updated subscribers of D1646: swh.lister.cran: Add description in task_dict.
Wed, Jun 26, 4:12 PM
Herald added a reviewer for D1646: swh.lister.cran: Add description in task_dict: Reviewers.
Wed, Jun 26, 4:08 PM
nahimilega added inline comments to D1644: Finish dropping the 'description' column..
Wed, Jun 26, 2:56 PM
nahimilega added inline comments to D1644: Finish dropping the 'description' column..
Wed, Jun 26, 2:52 PM
nahimilega added inline comments to D1644: Finish dropping the 'description' column..
Wed, Jun 26, 2:46 PM
nahimilega added a comment to D1610: swh.lister.cgit.

Ok, so for those, the idea of having patterns to fill in seems simple.
We could have the following pattern to initialize on a per-instance basis
(init):
<url>/<main-pattern><sub-pattern>
With this, we could simplify the other instances
without breaking the first ones, which are already OK.

Wed, Jun 26, 2:17 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Wed, Jun 26, 2:07 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Wed, Jun 26, 2:03 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Wed, Jun 26, 1:58 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • swh.lister.cgit: Remove repo page visit step
Wed, Jun 26, 1:58 PM
nahimilega updated the diff for D1610: swh.lister.cgit.
  • swh.lister.cgit: Remove repo page visit step
Wed, Jun 26, 1:49 PM

Mon, Jun 24

nahimilega added a comment to D1610: swh.lister.cgit.

Well, as a first approximation, i'd say let's go towards removing the visit step anyway (and compute basic git clone url, self.get_url sounds good enough IIRC).

self.get_url finds the url of the repo page
This method could work for https://git.kernel.org/ and http://git.savannah.gnu.org/cgit/ (maybe some more) but

Mon, Jun 24, 3:22 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Mon, Jun 24, 1:28 PM
nahimilega added inline comments to D1610: swh.lister.cgit.
Mon, Jun 24, 1:25 PM