This is the implementation of the GNU lister. The lister downloads the tree.json.gz file from https://ftp.gnu.org/tree.json.gz, reads its JSON content, and returns the origins of the repositories by parsing the JSON data.
Related T1722
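As a rough illustration of the flow described in the summary, here is a minimal sketch. It is not the code under review. The layout of tree.json (a list of nodes with "type", "name" and "contents" keys), the choice of "gnu" and "old-gnu" as project roots (inferred from the URLs quoted later in this thread), and the name list_origins are all assumptions made for the example. The download step is left out, so the function takes the gzipped bytes directly.

```python
import gzip
import json

BASE_URL = "https://ftp.gnu.org"
# Assumption: projects live directly under these top-level directories.
PROJECT_ROOTS = ("gnu", "old-gnu")


def list_origins(tree_json_gz):
    """Return (project_name, origin_url) pairs from gzipped tree.json bytes.

    Hypothetical helper: assumes each node is a dict with "type", "name"
    and, for directories, a "contents" list.
    """
    tree = json.loads(gzip.decompress(tree_json_gz))
    origins = []
    for root in tree:
        if root.get("type") != "directory" or root.get("name") not in PROJECT_ROOTS:
            continue
        for node in root.get("contents", []):
            # Each sub-directory of a project root is treated as one origin.
            if node.get("type") == "directory":
                url = f"{BASE_URL}/{root['name']}/{node['name']}/"
                origins.append((node["name"], url))
    return origins


# Tiny in-memory sample standing in for the real download.
sample_tree = [
    {"type": "directory", "name": "gnu", "contents": [
        {"type": "directory", "name": "apl", "contents": []},
        {"type": "file", "name": "README"},
    ]},
    {"type": "directory", "name": "video", "contents": []},
]
sample_bytes = gzip.compress(json.dumps(sample_tree).encode("utf-8"))
print(list_origins(sample_bytes))
```

Keeping the parsing separate from the HTTP download makes it testable against a small sample file, which fits the request further down to keep test samples minimal.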
Differential D1482, authored by nahimilega on May 17 2019, 12:30 PM.
Comment Actions
If that's blocking, do not hesitate to ask questions on IRC about it. Others can help with that.
Comment Actions
Ok @ardumont, I will surely do it.
Comment Actions
Build is green
Harbormaster completed remote builds in B6010: Diff 5062. 2019-06-04 15:30:29 (UTC+2)
Comment Actions
For the docker setup, P414 should be enough to run the gnu lister (provided your repository is on the branch with the gnu lister code). Amend the conf/lister.yml file to add the entries:

  celery:
    task_broker: amqp://guest:guest@amqp//
    task_modules:
      ...
      - swh.lister.gnu.tasks
    task_queues:
      ...
      - swh.lister.gnu.tasks.GNUListerTask

Comment Actions
Thanks @ardumont. This part is not present in any documentation. I guess we can add a section on how to run a lister in the docker environment under the lister tutorial section.
Comment Actions
It's not. Indeed, adding a tutorial on how to add a new lister is a good idea. I'd:
Maybe you could update D1441 with such information, what do you think?
Comment Actions
It would be a good idea
Comment Actions
To be clear, I'm fine with the diff now. Cheers,
Comment Actions
Thanks @ardumont. As you mentioned, I followed those steps to run it in docker, but something went wrong with the docker container and I had to reinstall docker entirely on my PC, so I have not been able to test this lister yet. I think I will fix the docker issues on my PC by the end of the day, and then I can try to run this.
Harbormaster completed remote builds in B6073: Diff 5124. 2019-06-06 21:55:58 (UTC+2)
Comment Actions
Build is green
Comment Actions
Sure thing. For the problem mentioned on IRC, I replied to you there. Here is my take on this:

20:05 <+ardumont> in general, don't only rely on documentation as this can go out of sync
20:05 <+ardumont> take a look also at the code
20:05 <+ardumont> for the docker-dev, that'd the docker-compose file
20:05 <+ardumont> archit_agrawal[m: ^
20:10 <archit_agrawal[m> ardumont: I will surely take a look at docker-compose file
20:12 <archit_agrawal[m> ardumont: as you told, I amended conf/lister.yml with gnu lister, now how shall I proceed further to sucessfully run the lister
20:28 <archit_agrawal[m> ardumont: Do I have to run the way mentioned in readme of swh-lister ?
21:45 <archit_agrawal[m> I am trying to run gnu lister in docker . I am getting ModuleNotFoundError: No module named 'psycopg2.errors' error, can anyone please help me.
21:45 <archit_agrawal[m> https://forge.softwareheritage.org/P419
21:45 -- Notice(swhbot): P419 (author: nahimilega): request 400 from scheduler <https://forge.softwareheritage.org/P419>
22:18 <kalpitk[m]> I think 'pip install psycopg2' inside virtual env will be enough
22:34 <archit_agrawal[m> kalpitk: It is already installed in virtual env
22:36 <+pinkieval> archit_agrawal[m: is the scheduler running in the venv?
22:38 <archit_agrawal[m> pinkieval: yes
22:39 <+pinkieval> can you paste its logs?
22:40 <archit_agrawal[m> pinkieval: https://forge.softwareheritage.org/P420 docker-compose ps outpur
22:40 -- Notice(swhbot): P420 (author: nahimilega): docker-compose ps output <https://forge.softwareheritage.org/P420>
22:42 <+pinkieval> if it's running in docker, then it's not running in the venv
22:42 <+pinkieval> and that's not its logs
22:43 <+pinkieval> "docker-compose logs swh-scheduler-api"
22:43 <archit_agrawal[m> pinkieval: https://forge.softwareheritage.org/P421
22:43 -- Notice(swhbot): P421 (author: nahimilega): scheduler api logs <https://forge.softwareheritage.org/P421>
22:44 <archit_agrawal[m> >and that's not its logs, I sent the previous message before I received this message
22:47 <+pinkieval> hmm, it has no issue referring to psycopg2
22:47 <archit_agrawal[m> pinkieval: >can you paste its logs? :I sent the previous message before I received this message
22:47 <+pinkieval> so the error is coming from the unpickling
22:48 <+pinkieval> python -c "import psycopg2.errors"
22:48 <+pinkieval> does this work?
22:49 <archit_agrawal[m> ModuleNotFoundError: No module named 'psycopg2.errors'
22:49 <archit_agrawal[m> No
----
09:38 <+ardumont> archit_agrawal[m: pinkieval: there might be 2 errors involved, one triggering the other
09:39 <+ardumont> the first one being there is probably no scheduler task-type gnu-lister referenced in the scheduler
09:39 <+ardumont> thus, when the lister asks for creating that kind of task, it's not happy about it
09:39 <+ardumont> and then the error we see here about psycopg2.error module not found
09:43 <+ardumont> archit_agrawal[m: prior to triggering your gnu lister task in your docker-env, you need to add the associated task-type
09:43 <+ardumont> swh scheduler task-type add --help
Comment Actions
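The one-liner used in the log (python -c "import psycopg2.errors") can also be written as a small check that reports the result instead of raising. This is only a sketch, and has_module is a hypothetical helper name. As far as I know, the psycopg2.errors submodule only exists from psycopg2 2.8 onwards, so an older installed version is one possible explanation for the failure; the thread does not confirm this.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the dotted module name can be located in this environment.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent package first; this is raised when
        # the parent (for example psycopg2 itself) is not installed.
        return False


for name in ("psycopg2", "psycopg2.errors"):
    print(name, "->", "available" if has_module(name) else "missing")
```

Run it with the same interpreter as the failing process (inside the container, not the local venv), since the log shows the two environments were being confused.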
@ardumont Thanks for your help. I ran the lister in docker, and it created a scheduler task:

Task 51
  Next run: in 3 months (2019-09-05 09:46:26+00:00)
  Interval: 90 days, 0:00:00
  Type: load-gnu
  Policy: recurring
  Status: next_run_not_scheduled
  Priority:
  Args:
    'apl'
    'https://ftp.gnu.org/gnu/apl/'
  Keyword args:
    tarballs: None

I don't understand why the tarballs keyword argument is None. Is there some error in the code?
Comment Actions
I got the error
Comment Actions
Build is green
Harbormaster completed remote builds in B6078: Diff 5129. 2019-06-07 13:31:28 (UTC+2)
Comment Actions
@ardumont I checked it in docker, now it is working fine.

Task 765
  Next run: in 3 months (2019-09-05 11:18:21+00:00)
  Interval: 90 days, 0:00:00
  Type: load-gnu
  Policy: recurring
  Status: next_run_not_scheduled
  Priority:
  Args:
    'libiconv'
    'https://ftp.gnu.org/old-gnu/libiconv/'
  Keyword args:
    tarballs: [{'date': '985114279', 'archive': 'https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.6.1.tar.gz'},
               {'date': '1054061763', 'archive': 'https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.1.bin.woe32.zip'},
               {'date': '1053376580', 'archive': 'https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.bin.woe32.zip'},
               {'date': '1053376846', 'archive': 'https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.tar.gz'}]

Comment Actions
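To make the tarballs keyword argument in the task above easier to read, here is a small sketch that turns each entry into a (filename, datetime) pair. It assumes the 'date' strings are Unix timestamps in seconds, which the values suggest but the thread does not state. describe_tarballs is a hypothetical helper, not part of the lister or loader.

```python
from datetime import datetime, timezone


def describe_tarballs(tarballs):
    """Return (filename, UTC datetime) pairs sorted from oldest to newest.

    Assumption: entry["date"] is a Unix timestamp in seconds, stored as a string.
    """
    described = []
    for entry in tarballs:
        when = datetime.fromtimestamp(int(entry["date"]), tz=timezone.utc)
        filename = entry["archive"].rsplit("/", 1)[-1]
        described.append((filename, when))
    return sorted(described, key=lambda pair: pair[1])


# The libiconv entries copied from the task shown above.
libiconv_tarballs = [
    {"date": "985114279", "archive": "https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.6.1.tar.gz"},
    {"date": "1054061763", "archive": "https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.1.bin.woe32.zip"},
    {"date": "1053376580", "archive": "https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.bin.woe32.zip"},
    {"date": "1053376846", "archive": "https://ftp.gnu.org/old-gnu/libiconv/libiconv-1.9.tar.gz"},
]

for filename, when in describe_tarballs(libiconv_tarballs):
    print(when.isoformat(), filename)
```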
Nice work on making it work! Just a couple of questions, see before this comment.
nahimilega marked an inline comment as not done.
Comment Actions
Build is green
Harbormaster completed remote builds in B6082: Diff 5133. 2019-06-07 14:14:31 (UTC+2)
ardumont added inline comments.
This revision now requires changes to proceed. 2019-06-07 17:43:05 (UTC+2)
Comment Actions
Related P422
Status so far: if 'tarballs' is removed from the model, this explodes.
My take on this:
Comment Actions
One way to avoid including tarballs in the model is to add an attribute named tarballs on the lister class (like LISTER_NAME or TREE_URL), which would contain all the tarballs of each package and could be accessed from the task_dict() function.
Comment Actions
Yes, please go that way. I'm not so keen on that solution because I prefer the code to be as stateless as possible (in that context, that means letting state pass through method/function parameters instead of relying on state variables to do neat tricks).
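The two designs being weighed here can be sketched side by side. This is only an illustration: StatefulLister, remember_tarballs and build_task are hypothetical names, the dict shape is a simplification of a scheduler task, and neither variant is the code in this diff.

```python
class StatefulLister:
    """Variant 1 (hypothetical): tarballs are kept on the instance and read later."""

    def __init__(self):
        self.tarballs = {}  # project name -> list of tarball dicts

    def remember_tarballs(self, name, tarballs):
        self.tarballs[name] = tarballs

    def task_dict(self, name, url):
        # Depends on remember_tarballs() having been called first.
        return {
            "type": "load-gnu",
            "args": [name, url],
            "kwargs": {"tarballs": self.tarballs[name]},
        }


def build_task(name, url, tarballs):
    """Variant 2 (hypothetical): everything needed arrives as parameters."""
    return {
        "type": "load-gnu",
        "args": [name, url],
        "kwargs": {"tarballs": tarballs},
    }


tarballs = [{"date": "944729610",
             "archive": "https://ftp.gnu.org/old-gnu/java2html/java2html-1.3.1.tar.gz"}]
url = "https://ftp.gnu.org/old-gnu/java2html/"

lister = StatefulLister()
lister.remember_tarballs("java2html", tarballs)
print(lister.task_dict("java2html", url) == build_task("java2html", url, tarballs))
```

Both produce the same task. The difference is that the first only works if the calls happen in the right order, which is the kind of hidden coupling the stateless preference is meant to avoid.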
Cheers,
Harbormaster completed remote builds in B6116: Diff 5158. 2019-06-08 19:10:46 (UTC+2)
Comment Actions
Build is green
nahimilega marked 2 inline comments as done.
Comment Actions
Build is green
Harbormaster completed remote builds in B6117: Diff 5159. 2019-06-08 19:28:19 (UTC+2)
Comment Actions
I tested the lister with the new changes in the docker container, and it worked fine. Here is one of the loader tasks it created.

Task 15940
  Next run: seconds ago (2019-06-08 18:02:07+00:00)
  Interval: 90 days, 0:00:00
  Type: load-gnu
  Policy: recurring
  Status: next_run_scheduled
  Priority:
  Args:
    'java2html'
    'https://ftp.gnu.org/old-gnu/java2html/'
  Keyword args:
    tarballs: [{'date': '944729610', 'archive': 'https://ftp.gnu.org/old-gnu/java2html/java2html-1.3.1.tar.gz'},
               {'date': '947003574', 'archive': 'https://ftp.gnu.org/old-gnu/java2html/java2html-1.4.tar.gz'},
               {'date': '953974733', 'archive': 'https://ftp.gnu.org/old-gnu/java2html/java2html-1.5.tar.gz'},
               {'date': '977303005', 'archive': 'https://ftp.gnu.org/old-gnu/java2html/java2html-1.6.tar.gz'},
               {'date': '979403803', 'archive': 'https://ftp.gnu.org/old-gnu/java2html/java2html-1.7.tar.gz'}]

Comment Actions
Awesome. Almost there. Also, note that this is the full gnu lister.
nahimilega marked 2 inline comments as done.
Comment Actions
Build is green
Harbormaster completed remote builds in B6120: Diff 5162. 2019-06-09 12:26:52 (UTC+2)
Comment Actions
So I'm mostly good with this. Really awesome that you made it work with the docker-env. I'm looking forward to the update on D1441 with what you had to do.
Prior to merging this though, please try to clean up the test samples and keep them to a reasonable minimum (api_response.json, file_structure.json, etc.). There is no need to keep all the extra files (the ones which are filtered out in the end: .sig, .ogg, .ogv, ...).
Cheers,
Comment Actions
Build is green
Harbormaster completed remote builds in B6128: Diff 5167. 2019-06-11 11:48:25 (UTC+2)
This revision is now accepted and ready to land. 2019-06-11 12:03:13 (UTC+2)
Comment Actions
If you do need to rebase, update the diff nonetheless (prior to pushing) so that Phabricator sees the commits and closes the diff itself.
Comment Actions
I have already rebased it on the latest master :)
Closed by commit rDLS151f6cd2235c: swh.lister.gnu (authored by nahimilega). 2019-06-11 12:07:19 (UTC+2)
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 4870 swh/lister/core/tests/conftest.py
swh/lister/gnu/__init__.py
swh/lister/gnu/lister.py
swh/lister/gnu/models.py
Please put the "public API" methods at the beginning of the class definition, then the "private" stuff.
It would be nice to also add a revision that modifies the storage.py to make it clear what's the "public API" of this class and make sure it's properly documented.