There is a concern that the base loader's implementation style could invite the same problems that are present in the lister base code. Here is my take on that.
These are my thoughts on this topic, and the points I keep in mind while writing the base loader.

Jul 17 2019, 12:06 AM

Jul 16 2019

nahimilega added inline comments to D1733: swh.lister.core: Add tests for simple lister.

Jul 16 2019, 6:06 PM

nahimilega updated the summary of D1733: swh.lister.core: Add tests for simple lister.

Jul 16 2019, 6:03 PM

nahimilega retitled D1733: swh.lister.core: Add tests for simple lister from swh.lister.pypi: Add tests to swh.lister.core: Add tests for simple lister.

Jul 16 2019, 6:02 PM

nahimilega updated the diff for D1733: swh.lister.core: Add tests for simple lister.

swh.lister.core: Add test for simple lister
swh.lister.pypi: Add tests

Jul 16 2019, 6:02 PM

Jul 15 2019

nahimilega added a comment to D1694: swh.loader.package: Implement a method to prepare package visit.

Keep the stuff in swh/loader/core, not swh/loader/base.

I chose to keep it in a separate folder just for better handling. There are already many files in core itself, and adding these would make it more cluttered. Can I do this at the end, just before landing? It is difficult for me to work on the base loader when everything is in a common folder (i.e. core).

Jul 15 2019, 4:05 PM

nahimilega added a comment to D1584: swh.lister.packagist.

The conventional test method needs variables like test_re (a compiled regex matching the server URL, which must capture the index value). These listers (packagist and pypi) do not have an index value: they make a request to a single page and parse that response to get the info. Hence I didn't use the conventional method of testing (the one using the HttpListerTester class).
(The same applies to pypi.)

Jul 15 2019, 12:03 PM

Jul 14 2019

nahimilega updated subscribers of D1733: swh.lister.core: Add tests for simple lister.

Jul 14 2019, 8:32 PM

nahimilega added a revision to T1890: pypi lister: Add tests: D1733: swh.lister.core: Add tests for simple lister.

Jul 14 2019, 8:28 PM · Origin-Pypi, Lister

Herald added a reviewer for D1733: swh.lister.core: Add tests for simple lister: Reviewers.

Jul 14 2019, 8:28 PM

nahimilega updated the diff for D1584: swh.lister.packagist.

Add header in test files.

Jul 14 2019, 4:59 PM

nahimilega updated the diff for D1584: swh.lister.packagist.

Add tests for lister.py
rebase on latest master

Jul 14 2019, 4:44 PM

nahimilega created P475 docker compose in the S1 Public space.

Jul 14 2019, 1:30 PM

nahimilega edited P473 swh storage logs.

Jul 14 2019, 1:24 PM

nahimilega edited P474 kafka logs.

Jul 14 2019, 1:24 PM

nahimilega created P474 kafka logs in the S1 Public space.

Jul 14 2019, 1:23 PM

nahimilega created P473 swh storage logs in the S1 Public space.

Jul 14 2019, 1:21 PM

nahimilega created P472 origin not able to load in the S1 Public space.

Jul 14 2019, 12:55 PM

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

swh.loader.base: The complete snapshot building process

The base loader is complete, but I am facing small problems in a few steps:

How to get the branch name.
How to find the HEAD for the branch. There could be some package managers where it is not possible to find HEAD because of a lack of metadata; how should those cases be dealt with?

Jul 14 2019, 11:17 AM

Jul 13 2019

nahimilega retitled D1694: swh.loader.package: Implement a method to prepare package visit from swh.loader.core: Implement base package manager loader to swh.loader.base: Implement base package manager loader.

Jul 13 2019, 9:02 PM

nahimilega created P471 error with base loader in the S1 Public space.

Jul 13 2019, 9:00 PM

nahimilega added a comment to D1694: swh.loader.package: Implement a method to prepare package visit.

I ran the GNU loader (after making some slight modifications), and it ran successfully (status was full).

Jul 13 2019, 7:32 PM

Jul 12 2019

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

Make generate_snapshot() method

Jul 12 2019, 7:19 PM

nahimilega added inline comments to D1694: swh.loader.package: Implement a method to prepare package visit.

Jul 12 2019, 3:32 PM

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

swh.loader.base: Improve docstrings

Jul 12 2019, 3:31 PM

nahimilega added a comment to D1694: swh.loader.package: Implement a method to prepare package visit.

The base loader is almost complete with just a few more methods to implement.

Jul 12 2019, 9:56 AM

nahimilega added inline comments to D1694: swh.loader.package: Implement a method to prepare package visit.

Jul 12 2019, 9:52 AM

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

Fix docstring and make construct_revision class

Jul 12 2019, 12:52 AM

Jul 11 2019

nahimilega updated subscribers of D1729: swh.loader.gnu: Implement gnu loader.

Jul 11 2019, 6:38 PM

nahimilega updated subscribers of D1728: swh.loader.cran: Implement CRAN loader.

Jul 11 2019, 6:37 PM

nahimilega added a comment to D1694: swh.loader.package: Implement a method to prepare package visit.

D1729 and D1728 are the proposed implementation of GNU and CRAN Loader using the base loader

Jul 11 2019, 6:15 PM

Herald added a reviewer for D1729: swh.loader.gnu: Implement gnu loader: Reviewers.

Jul 11 2019, 6:13 PM

nahimilega added a comment to D1728: swh.loader.cran: Implement CRAN loader.

This is the proposed implementation of the CRAN loader using the base loader (D1694).
This implementation requires some small refactorings in the cran lister.

Jul 11 2019, 6:03 PM

Herald added a reviewer for D1728: swh.loader.cran: Implement CRAN loader: Reviewers.

Jul 11 2019, 6:00 PM

nahimilega created P470 swh-storage package list in the S1 Public space.

Jul 11 2019, 5:09 PM

nahimilega edited P469 pip list.

Jul 11 2019, 5:07 PM

nahimilega created P469 pip list in the S1 Public space.

Jul 11 2019, 5:05 PM

nahimilega edited P468 error while running pypi loader.

Jul 11 2019, 3:28 PM

nahimilega created P468 error while running pypi loader in the S1 Public space.

Jul 11 2019, 3:21 PM

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

Complete prepare method and download.py
Note: Many docstrings are out of sync; they don't describe what the methods really do

Jul 11 2019, 1:21 PM

Jul 10 2019

nahimilega created P465 sphinx error in the S1 Public space.

Jul 10 2019, 10:58 PM

nahimilega created P464 error while installation in the S1 Public space.

Jul 10 2019, 9:08 PM

Jul 8 2019

nahimilega added a comment to T1734: Create a Lister for launchpad.net.

P.S. If you want us to provide an English translation of the diagrams and their corresponding explanation, please feel free to notify us.

Jul 8 2019, 12:10 PM · Lister, Archive coverage

nahimilega added a comment to D1584: swh.lister.packagist.

I'm sorry, I did not follow.

According to the previous approach, it first finds the names of all the packages and then visits their metadata URL to get the tarball URL of every release.
E.g. for a package named 'monolog/monolog',
its metadata page (https://repo.packagist.org/p/monolog/monolog.json) will give the tarball URL of every release.
Something like this:

{
   "release": [
       {
       "vcs": "zip",
       "url": "https://api.github.com/repos/Seldaek/monolog/zipball/433b98d4218c181bae01865901aac045585e8a1a",
       "description": "Logging for PHP 5.3"
       },
       {
       "vcs": "zip",
       "url": "https://api.github.com/repos/Seldaek/monolog/zipball/5e651a82b4b03d267da6084720ada0cd398c8d16",
       "description": "Logging for PHP 5.3"
       },
       ...
       ],
   "source": [
       {
       "vcs": "git",
       "url": "https://github.com/Seldaek/monolog.git",
       "description": "Logging for PHP 5.3"
       },
       ...
       ]
}

Here, as you can see, the API response also provides the upstream repository under the key source. To utilize this, in the previous approach the lister created a task for the respective VCS loader (the git loader in this example) and also created a packagist loader task to ingest the tarballs provided.
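
A rough sketch of how that previous approach could turn one metadata response into tasks, assuming the response has the "release" and "source" keys shown in the sample. The function name tasks_from_metadata and the task type strings ("load-<vcs>", "load-packagist") are hypothetical and chosen only for illustration; they are not the actual scheduler task names.

```python
def tasks_from_metadata(package_name, metadata):
    """Build illustrative loader tasks from one package's metadata.

    `metadata` is assumed to hold a "release" list of tarball entries
    and a "source" list of upstream repositories.
    """
    tasks = []
    # One task per upstream repository, dispatched on its VCS type.
    for source in metadata.get("source", []):
        tasks.append({
            "type": "load-" + source["vcs"],  # hypothetical task name
            "url": source["url"],
        })
    # One packagist task carrying every release tarball URL.
    tarballs = [release["url"] for release in metadata.get("release", [])]
    if tarballs:
        tasks.append({
            "type": "load-packagist",  # hypothetical task name
            "name": package_name,
            "tarballs": tarballs,
        })
    return tasks
```

With this shape, a package that exposes a git upstream and some zip releases would yield one git task plus one packagist task.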

Jul 8 2019, 11:52 AM

Jul 7 2019

nahimilega updated subscribers of D1694: swh.loader.package: Implement a method to prepare package visit.

Jul 7 2019, 11:58 PM

nahimilega updated the diff for D1694: swh.loader.package: Implement a method to prepare package visit.

This diff is in its nascent stage. The methods that have # Done as a comment
are completed; the rest are still in progress. There are also some rogue
comments present; those are just for my help during development, and I will remove them
once the code is complete.

Jul 7 2019, 11:56 PM

Herald added a reviewer for D1694: swh.loader.package: Implement a method to prepare package visit: Reviewers.

Jul 7 2019, 10:10 PM

nahimilega added a revision to T1389: Implement a base "package" loader for package managers: D1694: swh.loader.package: Implement a method to prepare package visit.

Jul 7 2019, 10:10 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

nahimilega added a comment to D1584: swh.lister.packagist.

One thing I am not sure about regarding this approach:
The description page also provides the upstream link of the package along with the links to the tarballs. In the previous approach, a separate loader task was created for those upstream links to utilise all the information provided and further increase the archive coverage.

Jul 7 2019, 9:38 PM

nahimilega added a comment to D1584: swh.lister.packagist.

it's missing some tests

Jul 7 2019, 9:29 PM

nahimilega updated the diff for D1584: swh.lister.packagist.

Rebase on latest master
Remove safety_issue_request() method
Check for out of sync docstrings
Update git commit message according to new approach

Jul 7 2019, 9:14 PM

Jul 5 2019

nahimilega updated the diff for D1584: swh.lister.packagist.

Remove page visit step

Jul 5 2019, 2:08 PM

Jul 2 2019

nahimilega added a comment to D1584: swh.lister.packagist.

I suppose you meant this lists only the name....

Ya, sorry for the typo

Jul 2 2019, 6:05 PM

nahimilega added a comment to D1584: swh.lister.packagist.

Currently, this lister visits the page of each and every package to get all the versions of a particular package. This approach is really slow; for me it took 2 hours to list 6% of the total packages.
One thing that could be done is that this lister lists only the name of the package and the URL of its metadata, and then the loader visits that URL to get all the versions (as done in the pypi lister). This approach will take only a couple of seconds and reduce the code for the lister.
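
A minimal sketch of the lighter listing step proposed here, assuming the package names are already known and that the metadata URL follows the https://repo.packagist.org/p/<name>.json pattern. The names METADATA_URL and list_packages are hypothetical; no network request is made, since fetching the versions is left to the loader.

```python
# Assumed URL pattern for a package's metadata page.
METADATA_URL = "https://repo.packagist.org/p/{name}.json"


def list_packages(package_names):
    """Return one entry per package: its name and metadata URL only."""
    return [
        {"name": name, "url": METADATA_URL.format(name=name)}
        for name in package_names
    ]
```

Because nothing is downloaded per package, the cost of listing is proportional only to the number of names.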

Jul 2 2019, 5:45 PM

nahimilega added a comment to T1389: Implement a base "package" loader for package managers.

Extending the plan by @olasd, here are some of my thoughts on the implementation of the base loader.

Jul 2 2019, 12:46 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Jun 28 2019

nahimilega changed the status of T1835: List/Ingest major cgit instances from Work in Progress to Open.

Jun 28 2019, 7:36 PM · Lister

nahimilega updated the task description for T1835: List/Ingest major cgit instances.

Jun 28 2019, 7:22 PM · Lister

nahimilega added a comment to T1856: Adapt contribution policy?.

My two cents:
I think landing by the reviewer is a good idea, because once a diff is accepted to land, the author can still make changes to it and push it, and those changes might not be good ones.
This could be a potential threat to code quality and other factors.
On the other hand, landing by the reviewer would add one more task to their plate.

Jun 28 2019, 6:46 PM · Staff

nahimilega committed rDLS7e3c79bb1d18: swh.lister.cgit: Add pagination support (authored by nahimilega).

swh.lister.cgit: Add pagination support

Jun 28 2019, 5:22 PM

nahimilega committed rDLS0bf24469b7e0: swh.lister.cgit: Remove repo page visit step (authored by nahimilega).

swh.lister.cgit: Remove repo page visit step

Jun 28 2019, 5:22 PM

nahimilega committed rDLSb972a2a88d25: swh.lister.cgit (authored by nahimilega).

swh.lister.cgit

Jun 28 2019, 5:22 PM

nahimilega closed T1659: rewrite the CGit lister as a proper lister, a subtask of T1451: ingest GNU Savannah Git repositories, as Resolved.

Jun 28 2019, 5:22 PM · Archive coverage

nahimilega closed T1659: rewrite the CGit lister as a proper lister, a subtask of T1799: ingest Tor git repositories, as Resolved.

Jun 28 2019, 5:22 PM · Archive coverage

nahimilega closed T1659: rewrite the CGit lister as a proper lister as Resolved by committing rDLSb972a2a88d25: swh.lister.cgit.

Jun 28 2019, 5:22 PM · CGit lister

nahimilega closed D1610: swh.lister.cgit.

Jun 28 2019, 5:22 PM

nahimilega updated the diff for D1610: swh.lister.cgit.

add method to avoid list wrapping

Jun 28 2019, 4:36 PM

nahimilega updated the diff for D1610: swh.lister.cgit.

rebase on latest master

Jun 28 2019, 3:58 PM

nahimilega added a comment to D1610: swh.lister.cgit.

@nahimilega , I will commit the fix. You will just have to rebase your diff before landing.

Jun 28 2019, 3:50 PM

nahimilega updated the diff for D1610: swh.lister.cgit.

swh.lister.core: Increase flush frequency in simple lister

Jun 28 2019, 3:48 PM

nahimilega added inline comments to D1610: swh.lister.cgit.

Jun 28 2019, 3:35 PM

nahimilega added inline comments to D1610: swh.lister.cgit.

Jun 28 2019, 2:53 PM

nahimilega added inline comments to D1610: swh.lister.cgit.

Jun 28 2019, 2:49 PM

nahimilega updated the diff for D1610: swh.lister.cgit.

Made recommended changes

Jun 28 2019, 2:48 PM

Jun 27 2019

nahimilega committed rDLS5ea9d5ed392a: swh.lister.cran: Add description in task_dict (authored by nahimilega).

swh.lister.cran: Add description in task_dict

Jun 27 2019, 12:49 PM

nahimilega closed D1646: swh.lister.cran: Add description in task_dict.

Jun 27 2019, 12:49 PM

nahimilega updated the diff for D1646: swh.lister.cran: Add description in task_dict.

Shifted the test to another file.

Jun 27 2019, 11:28 AM

nahimilega updated the diff for D1646: swh.lister.cran: Add description in task_dict.

Add test to avoid removal of description in future

Jun 27 2019, 11:08 AM

Jun 26 2019

nahimilega updated the diff for D1610: swh.lister.cgit.

Add test cases and rename variable base_url to url and origin_url_prefix to url_prefix
rebased on master

Jun 26 2019, 9:07 PM

nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

In D1646#37941, @vlorentz wrote:

Remove the argument of the test

Jun 26 2019, 6:38 PM

nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

In D1646#37939, @vlorentz wrote:

Got it, it's because you added that check in a test where swh.lister.cran.tasks.CRANLister is mocked, so when you do lister.run() nothing actually happens. (You can check, even lister.thisIsNotARealFunction() wouldn't crash)
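
A small sketch of the behaviour described in this comment, using only unittest.mock. The CRANLister name here is a plain MagicMock standing in for the patched class, not the real lister.

```python
from unittest.mock import MagicMock

# Stand-in for the patched swh.lister.cran.tasks.CRANLister class.
CRANLister = MagicMock()

lister = CRANLister()            # returns another MagicMock, not a real lister
lister.run()                     # records the call but runs no lister code
lister.thisIsNotARealFunction()  # attributes are created on access, so no crash

# Both calls were recorded on the mock.
run_called = lister.run.called
fake_called = lister.thisIsNotARealFunction.called
```

Any assertion placed after lister.run() in such a test therefore says nothing about what the real lister does.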

Jun 26 2019, 6:29 PM

nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

In D1646#37937, @vlorentz wrote:

It means swh.scheduler.utils.create_task_dict was never called.

This is because the lister imports from swh.scheduler.utils import create_task_dict before it was patched. Instead, you should patch swh.lister.cran.lister.create_task_dict.
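
The point about where to patch can be sketched with two throwaway modules. The module names fake_utils and fake_lister are hypothetical stand-ins for swh.scheduler.utils and swh.lister.cran.lister; the attribute assignment below imitates what "from fake_utils import create_task_dict" would do inside the lister module.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for swh.scheduler.utils.
utils = types.ModuleType("fake_utils")
utils.create_task_dict = lambda *args, **kwargs: {"args": args}
sys.modules["fake_utils"] = utils

# Hypothetical stand-in for swh.lister.cran.lister.
lister = types.ModuleType("fake_lister")
# Imitates "from fake_utils import create_task_dict": the lister module
# now holds its own reference to the function.
lister.create_task_dict = utils.create_task_dict
# run() looks the name up in the lister module, as real module code would.
lister.run = lambda: lister.create_task_dict("load-cran")
sys.modules["fake_lister"] = lister

# Patching the defining module does not affect the lister's own reference.
with patch("fake_utils.create_task_dict") as mock_wrong:
    lister.run()
    wrong_called = mock_wrong.called

# Patching the name where it is looked up does.
with patch("fake_lister.create_task_dict") as mock_right:
    lister.run()
    right_called = mock_right.called
```

Here wrong_called ends up False and right_called ends up True, which matches the advice to patch the name in the module that uses it.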

Jun 26 2019, 5:11 PM

nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

In D1646#37927, @vlorentz wrote:

What is the error message?

Jun 26 2019, 4:48 PM

nahimilega added a comment to D1646: swh.lister.cran: Add description in task_dict.

In D1646#37925, @vlorentz wrote:

No, when you do this, mock_create_tasks.assert_called_once_with() checks it was called a single time, and that single time is mock_create_tasks(). So what you are actually testing is that swh.scheduler.utils.create_task_dict is never called by the lister itself.
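
A short sketch of this pitfall with a bare MagicMock; the argument "load-cran" is only an illustrative value, not the lister's real call.

```python
from unittest.mock import MagicMock

# Case 1: the lister never calls the mock; the test calls it itself.
mock_create_tasks = MagicMock()
mock_create_tasks()                          # call made by the test
mock_create_tasks.assert_called_once_with()  # passes, proving nothing

# Case 2: the lister does call the mock, then the test calls it again.
mock_create_tasks = MagicMock()
mock_create_tasks("load-cran")               # what a lister might do
mock_create_tasks()                          # extra call made by the test
try:
    mock_create_tasks.assert_called_once_with()
    second_case_failed = False
except AssertionError:
    # Two calls were recorded, so "called once" no longer holds.
    second_case_failed = True
```

In other words, the pattern only passes when the code under test never calls the function at all.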

Jun 26 2019, 4:34 PM

nahimilega added inline comments to D1646: swh.lister.cran: Add description in task_dict.

Jun 26 2019, 4:16 PM

nahimilega updated subscribers of D1646: swh.lister.cran: Add description in task_dict.

Jun 26 2019, 4:13 PM