Add docstring to method (explicit rather than implicit)
Jul 18 2019
- Improve docstrings
- Perform recommended changes
Jul 17 2019
This is an amazing idea.
- Add comments on the test
- Remove unnecessary test from HttpSimpleListerTester
- Increase the number of hookpoints
There is a concern that the base loader's implementation style could invite the same problems that are present in the lister base code. Here is my take on that.
These are my thoughts on this topic, and the points I keep in mind while building the base loader.
Jul 16 2019
- swh.lister.core: Add test for simple lister
- swh.lister.pypi: Add tests
Jul 15 2019
Keep the stuff in swh/loader/core, not swh/loader/base.
I chose to keep it in a separate folder just for easier handling. There are already many files in core itself, and adding these would make it more cluttered. Can I do this at the end, just before landing? It is difficult for me to work on the base loader when everything is in a common folder (i.e. core).
The conventional testing method (the one using the HttpListerTester class) needs variables like test_re (a compiled regex matching the server URL, which must capture the index value). These listers (packagist and pypi) do not have an index value: they make a request to a single page and parse that response to get the info. Hence I didn't use the conventional method.
(The same applies to pypi.)
Jul 14 2019
Add header in test files.
- Add tests for lister.py
- rebase on latest master
- swh.loader.base: The complete snapshot building process
The base loader is complete, but I am facing small problems in a few steps:
- How to get the branch name
- How to find the HEAD for the branch. For some package managers it may not be possible to find the HEAD because of a lack of metadata; how should those cases be handled?
Jul 13 2019
Jul 12 2019
- Make generate_snapshot() method
- swh.loader.base: Improve docstrings
The base loader is almost complete with just a few more methods to implement.
- Fix docstring and make construct_revision class
Jul 11 2019
This is the proposed implementation of the CRAN loader using the base loader (D1694).
This implementation requires some small refactoring in the cran lister.
Complete prepare method and download.py
Note: Many docstrings are out of sync; they don't describe what the method really does.
Jul 10 2019
Jul 8 2019
P.S. If you want us to provide an English translation of the diagrams and their corresponding explanation, please feel free to notify us.
I'm sorry, I did not follow.
According to the previous approach, the lister first finds the names of all the packages and then visits their metadata URLs to get the tarball URLs of all the releases.
E.g. for the package named 'monolog/monolog',
its metadata page (https://repo.packagist.org/p/monolog/monolog.json) gives the tarball URLs of all the releases.
Something like this:
{
  "release": [
    { "vcs": "zip", "url": "https://api.github.com/repos/Seldaek/monolog/zipball/433b98d4218c181bae01865901aac045585e8a1a", "description": "Logging for PHP 5.3" },
    { "vcs": "zip", "url": "https://api.github.com/repos/Seldaek/monolog/zipball/5e651a82b4b03d267da6084720ada0cd398c8d16", "description": "Logging for PHP 5.3" },
    ...
  ],
  "source": [
    { "vcs": "git", "url": "https://github.com/Seldaek/monolog.git", "description": "Logging for PHP 5.3" },
    ...
  ]
}
Here, as you can see, the API response also provides the upstream repository under the key source. To make use of this, in the previous approach the lister created a task for the respective VCS loader (the git loader in this example) and also created a packagist loader task to ingest the tarballs provided.
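As a rough sketch of the split described above, the snippet below separates a metadata document shaped like the excerpt into tarball URLs (for a packagist loader task) and upstream repositories (for the matching VCS loader task). The function name split_metadata and the sample dictionary are hypothetical, and the keys release, source, vcs and url follow the excerpt above rather than a verified Packagist schema.

```python
def split_metadata(metadata):
    """Return (tarball_urls, upstream_repos) from an excerpt-shaped dict."""
    # Every release entry carries the URL of one tarball to ingest.
    tarball_urls = [entry["url"] for entry in metadata.get("release", [])]

    # Upstream repositories repeat across releases, so de-duplicate them
    # while keeping first-seen order.
    upstream_repos = []
    for entry in metadata.get("source", []):
        repo = (entry["vcs"], entry["url"])
        if repo not in upstream_repos:
            upstream_repos.append(repo)
    return tarball_urls, upstream_repos


# Hypothetical sample mirroring the structure of the excerpt.
sample = {
    "release": [
        {"vcs": "zip", "url": "https://example.org/zipball/aaa"},
        {"vcs": "zip", "url": "https://example.org/zipball/bbb"},
    ],
    "source": [
        {"vcs": "git", "url": "https://github.com/Seldaek/monolog.git"},
        {"vcs": "git", "url": "https://github.com/Seldaek/monolog.git"},
    ],
}

tarballs, repos = split_metadata(sample)
```

With this shape, each entry in repos could feed one VCS loader task and the list of tarballs could feed a single packagist loader task; how the real lister schedules them is not shown here.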
Jul 7 2019
This diff is in its nascent stage. The methods that have # Done as a comment
are complete; the rest are still in progress. There are also some rogue
comments present; they are just for my help during development, and I will remove them
once the code is complete.
One thing I am not sure about regarding this approach:
The description page also provides the upstream link of the package along with the links to the tarballs. In the previous approach, a separate loader task was created for those upstream links to utilise all the information provided and further increase the archive coverage.
It's missing some tests.
- Rebase on latest master
- Remove safety_issue_request() method
- Check for out of sync docstrings
- Update git commit message according to new approach
Jul 5 2019
Remove page visit step
Jul 2 2019
I suppose you meant this lists only the name....
Ya, sorry for the typo
Currently, this lister visits the page of each and every package to get all the versions of that package. This approach is really slow: for me, it took 2 hours to list 6% of the total packages.
One thing that could be done is to have this lister list only the name of each package and the URL of its metadata, and then have the loader visit that URL to get all the versions (as done in the pypi lister). This approach would take only a couple of seconds and reduce the code for the lister.
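A minimal sketch of that lighter listing step, assuming the package names are already available as a list: the lister only pairs each name with its metadata URL and leaves fetching the versions to the loader. The function name list_packages and the entry keys are hypothetical; the URL pattern is taken from the monolog/monolog example elsewhere in this feed.

```python
# Pattern inferred from https://repo.packagist.org/p/monolog/monolog.json
METADATA_URL_TEMPLATE = "https://repo.packagist.org/p/%s.json"


def list_packages(package_names):
    """Yield one lightweight entry per package: its name and metadata URL."""
    for name in package_names:
        # No per-package HTTP request here; the loader visits the URL later.
        yield {"name": name, "metadata_url": METADATA_URL_TEMPLATE % name}


entries = list(list_packages(["monolog/monolog"]))
```

Because no request is made per package, the cost of listing is dominated by the single request that fetches the names.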
Extending the plan by @olasd, here are some of my thoughts on the implementation of the base loader.
Jun 28 2019
My two cents
I think landing by the reviewer is a good idea, because once a diff is accepted to land, the author can still make changes to it and push them, and those changes might not be good ones.
This could be a potential threat to code quality and other factors.
On the other hand, landing by the reviewer would add one more task to their plate.
- add method to avoid list wrapping
- rebase on latest master
@nahimilega, I will commit the fix. You will just have to rebase your diff before landing.
- swh.lister.core: Increase flush frequency in simple lister
- Made recommended changes
Jun 27 2019
Shifted the test to another file.
Add test to avoid removal of description in future
Jun 26 2019
- Add testcases and changed variable base_url to url and origin_url_prefix to url_prefix
- rebased on master
In D1646#37941, @vlorentz wrote: Remove the argument of the test
In D1646#37939, @vlorentz wrote: Got it, it's because you added that check in a test where swh.lister.cran.tasks.CRANLister is mocked, so when you do lister.run() nothing actually happens. (You can check, even lister.thisIsNotARealFunction() wouldn't crash)
In D1646#37937, @vlorentz wrote: It means swh.scheduler.utils.create_task_dict was never called.
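The behaviour described in that comment can be reproduced with the standard library alone. In this sketch, MockedLister is a hypothetical stand-in for the patched CRANLister class: every attribute of a MagicMock is itself a callable mock, so calling a method that does not exist succeeds silently and does no work.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for a lister class that a test has replaced by a mock.
MockedLister = MagicMock()
lister = MockedLister()

# "Runs" without listing anything: the call is only recorded.
lister.run()

# A method that was never defined is accepted as well.
result = lister.thisIsNotARealFunction()
is_mock = isinstance(result, MagicMock)
```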
This is because the lister imports from swh.scheduler.utils import create_task_dict before it was patched. Instead, you should patch swh.lister.cran.lister.create_task_dict.
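The difference between the two patch targets can be seen in a self-contained sketch. The modules fake_utils and fake_lister are hypothetical stand-ins for swh.scheduler.utils and swh.lister.cran.lister, built in memory so the example runs without the swh packages installed.

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the module that defines the function.
utils = types.ModuleType("fake_utils")
exec("def create_task_dict(*args):\n    return {'real': args}", utils.__dict__)
sys.modules["fake_utils"] = utils

# Stand-in for the lister module, which copies the name at import time.
lister = types.ModuleType("fake_lister")
sys.modules["fake_lister"] = lister
exec(
    "from fake_utils import create_task_dict\n"
    "def run():\n"
    "    return create_task_dict('load-cran')\n",
    lister.__dict__,
)

# Patching the defining module leaves the lister's own reference untouched.
with patch("fake_utils.create_task_dict") as wrong_target:
    lister.run()
wrong_called = wrong_target.called

# Patching the name where the lister looks it up intercepts the call.
with patch("fake_lister.create_task_dict") as right_target:
    lister.run()
right_called = right_target.called
```

Here wrong_called ends up False and right_called ends up True, which matches the advice to patch the name in the module that uses it.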
In D1646#37927, @vlorentz wrote: What is the error message?
In D1646#37925, @vlorentz wrote: No, when you do this, mock_create_tasks.assert_called_once_with() checks it was called a single time, and that single time is mock_create_tasks(). So what you are actually testing is that swh.scheduler.utils.create_task_dict is never called by the lister itself.
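A short illustration of that point, using only unittest.mock: assert_called_once_with() with no arguments passes only when the mock was called exactly once and with no arguments. The task arguments passed to the second mock are made up for the example.

```python
from unittest.mock import Mock

mock_create_tasks = Mock()
mock_create_tasks()  # the only call, made with no arguments

# Passes: exactly one call, and that call had no arguments.
mock_create_tasks.assert_called_once_with()

# A call made with real arguments does not satisfy the same assertion.
other = Mock()
other("load-cran", "oneshot")
try:
    other.assert_called_once_with()
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

To check a real call, the expected arguments would be passed to assert_called_once_with() instead of leaving it empty.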