Implement a packagist lister to list the release and
source code origin for all the packages
Close T1923
Differential D1584
swh.lister.packagist nahimilega on Jun 14 2019, 12:59 PM. Authored by
Details
Implement a packagist lister to list the release and Close T1923
Diff Detail
Event TimelineComment Actions Build has FAILED Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/185/ Comment Actions Build is green Comment Actions I tried to run this for first 5 packages in packagist, and it worked fine Here are the few tasks created by the lister Task 15952 Next run: just now (2019-06-14 12:22:58+00:00) Interval: 64 days, 0:00:00 Type: load-git Policy: recurring Status: next_run_not_scheduled Priority: Args: 'https://github.com/0099FF/dialogflow-php.git' Keyword args: Task 15953 Next run: just now (2019-06-14 12:22:58+00:00) Interval: 90 days, 0:00:00 Type: load-packagist Policy: recurring Status: next_run_not_scheduled Priority: Args: '0.0.0/laravel-env-shim' None Keyword args: tarballs: [{'url': 'https://api.github.com/repos/phpexpertsinc/Laravel57-env-polyfill/zipball/5d4eb47b7469defe3e0dcc0c1f74ac585ca9f90f', 'vcs': 'zip', 'description': "Backward compatible support for Laravel's pre-5.8 env() function."}, {'url': 'https://api.github.com/repos/phpexpertsinc/Laravel57-env-polyfill/zipball/4dc9876f8fa9064c672e1811375d2e11828cb10b', 'vcs': 'zip', 'description': "Backward compatible support for Laravel's pre-5.8 env() function."}, {'url': 'https://api.github.com/repos/phpexpertsinc/Laravel57-env-polyfill/zipball/bfc0ba73d4567b88aae325754e9c389d75d20f99', 'vcs': 'zip', 'description': "Backward compatible support for Laravel's pre-5.8 env() function."}] Comment Actions Build is green
Comment Actions @nahimilega Heads up, this needs a rebase (same goes for the packagist lister). Comment Actions Currently, this lister visits the page of each and every package to get all the versions of a particular package. This approach is really slow, for me it took 2 hours to list 6% of the total package. Comment Actions ok
I suppose you meant this lists only the name....
And then we could have that behavior fall under the scope of the distribution loader then, is that it?
sound better indeed. Comment Actions
Ya, sorry for the typo
Yes, as we can deploy multiple workers for loading(I am not sure if I used correct technical terms here), hence it would speed up the process.
So I guess I should change the approach of this lister implementation ;) Comment Actions Build is green Comment Actions From afar:
Comment Actions
no, i saw cgit. Comment Actions
Comment Actions Build is green Comment Actions
I didn't add test cases for lister because the code of lister is minimal in itself and it is just iterating over the name of the packages. There is no function outside the lister class, so it would be a bit difficult to write test cases here. Moreover, this lister is quite similar to pypi lister, and it does not have any separate test for the lister. Do we need test cases for the lister? Comment Actions One thing on I am not sure about regarding this approach : But as now the visit to the description page will the part of the packagist loader, so I want to ask can we still utilise the upstream link. I mean can one loader(packagist loader in this case) call another loader task(e.g git loader)? Comment Actions
So the tests should be simple enough to add.
It's wrong that the pypi lister has no tests. T1890 to the rescue!
Yes, we do. While the lister tests are not that fun to read, understand, nor maintain, the scaffolding is there. Comment Actions
I'm sorry, i did not follow. Comment Actions
According to the previous approach, it first finds names of all the packages and then visits their metadata url to get the tarball url of all the release. { "release": [ { "vcs": "zip", "url": "https://api.github.com/repos/Seldaek/monolog/zipball/ 433b98d4218c181bae01865901aac045585e8a1a", "description": "Logging for PHP 5.3" }, { "vcs": "zip", "url": "https://api.github.com/repos/Seldaek/monolog/zipball/ 5e651a82b4b03d267da6084720ada0cd398c8d16", "description": "Logging for PHP 5.3" }, ... ] "source": [ { "vcs": "git", "url": "https://github.com/Seldaek/monolog.git", "description": "Logging for PHP 5.3" }, ... ] Here, as you can see the API response also provide upstream repository under the key source. To utilize this, in the previous approach, lister was creating a task of respective vcs loader(git loader in this example) and also created a packagist loader task to ingest tarballs provided. Now with a new approach, lister only created the packagist loader task with names of all the packages. The step of visiting metadata page will be implemented in packagist loader. So to ingest the upstream repository under the key source the packagist loader will need to create a git loader task. So I want to ask is it possible that packagist loader can create git loader task. If not, is there any other way to ingest the upstream repository? Comment Actions Thanks for clarifying, i was out of context for a minute.
yes.
As exchanged during our weekly plan, we will skip that problematic for the moment. Comment Actions Build is green Comment Actions Build is green Comment Actions I see you define only tests for the new endpoints you define in packagist lister. Is there a blocking point preventing you from implementing the "usual" tests here? (Request changes for the sake of the question) Comment Actions There is a need for variables like test_re (Compiled regex matching the server url. Must capture the index value), whereas these listers (packagist and pypi) do not have an index value, they just make a request to one page and parse over that request to get the info, hence I didn't use the conventional method of testing (method using HttpListerTester class) Comment Actions I tested this docker. It is running fine. Here is a task created by the lister Task 8 Next run: 2 minutes ago (2019-07-18 18:25:07+00:00) Interval: 90 days, 0:00:00 Type: load-packagist Policy: recurring Status: next_run_scheduled Priority: Args: 'hypejunction/hypegamemechanics' 'https://repo.packagist.org/p/hypejunction/hypegamemechanics.json' Keyword args: Comment Actions
Hope this diff is ready to land ;) Comment Actions Build is green Comment Actions
Almost (technically it's good). It misses some documentation I forgot to ask earlier (even in other listers). We should document the output of the lister, for example:
Here is the list i can think of for this lister, please complete it and amend the PackagitLister's docstring:
maybe more, i'm not sure.
Comment Actions
This lister do not take any arguments, but for the listers like phabricator, we can also have an Args: section in the docstring.
Yes, it was there in the old implementation, but in the new one we only have load-packagist Comment Actions Build is green
Comment Actions Almost there, please fix the typos and change slightly the format. And now, there will be a need to add a 'load-packagist' task-type in scheduler. Comment Actions
Do we need to do this in this diff itself? Comment Actions I was thinking we can make this task of adding docstrings for all lister class as an Easy-Hack for first-time contributors. Comment Actions Build is green Comment Actions
Yes, why not. |