Change Details

There are 2 sides to that coin, which can be done independently and in any order we choose (this task can thus be split into 2 subtasks).

#### Lister algorithm

- drop the R CRAN script
- parse the listing page [1] instead (as in simple_lister; check how the cgit lister does it)
- for each package found there, send its origin URL [2] to the loader (as a `recurring` task)

Schema adaptations:

- make the tasks output by the lister `recurring` (currently `oneshot`)
- set the uid field to the origin_url's value

Migration plan:

- truncate the cran_repo table
- trigger a full listing again

#### Loader algorithm

- improve the loader so it scrapes that origin URL [2] page
- it then determines by itself which artifact URLs it needs to ingest
- the [2] page has an archive link, `Old source`, which lists the previous artifact versions

[1] https://cran.r-project.org/web/packages/available_packages_by_date.html
This could be discussed with the CRAN community, to ask for a better API endpoint (if it's not too much hassle for them to adapt and provide ;)

[2] https://cran.r-project.org/package=<package-name>
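Both sides can be sketched with the Python standard library only. This is a minimal, non-authoritative sketch: it assumes the listing page [1] links each package as `.../web/packages/<name>/index.html`, and that a package page links its current source tarball as a `*.tar.gz` href and its previous versions (`Old source`) through an href containing `/Archive/`. The HTML layout is not checked against the live site. The helper names (`list_origins`, `make_recurring_task`, `find_artifacts`) and the task dict fields other than `uid` and the `recurring` policy are hypothetical, not the actual scheduler schema.

```python
"""Hypothetical sketch of the lister and loader sides of the CRAN rework.

Assumptions (not taken from CRAN docs or the actual code base):
- the listing page links each package as ".../web/packages/<name>/index.html";
- a package page links its current tarball as "*.tar.gz" and its previous
  versions ("Old source") through an href containing "/Archive/".
"""
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

LISTING_URL = "https://cran.r-project.org/web/packages/available_packages_by_date.html"
ORIGIN_URL_TEMPLATE = "https://cran.r-project.org/package={name}"
# Assumed shape of the package links found on the listing page.
PACKAGE_HREF_RE = re.compile(r"/packages/([^/]+)/index\.html$")


class LinkCollector(HTMLParser):
    """Collect every <a href="..."> of a page, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# --- Lister side: listing page [1] -> origin URLs [2] ----------------------
def list_origins(listing_html, base_url=LISTING_URL):
    """Yield one origin URL per package, in page order, without duplicates."""
    seen = set()
    for link in collect_links(listing_html, base_url):
        match = PACKAGE_HREF_RE.search(link)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            yield ORIGIN_URL_TEMPLATE.format(name=match.group(1))


def make_recurring_task(origin_url):
    """Hypothetical task dict: `recurring` policy, uid set to the origin URL."""
    return {
        "type": "load-cran",  # hypothetical task type name
        "policy": "recurring",
        "uid": origin_url,
        "arguments": {"kwargs": {"url": origin_url}},
    }


# --- Loader side: package page [2] -> artifact URLs ------------------------
def find_artifacts(package_html, package_page_url):
    """Return (current tarball URLs, archive page URLs) linked from a package page."""
    links = collect_links(package_html, package_page_url)
    tarballs = [link for link in links if link.endswith(".tar.gz")]
    archives = [link for link in links if "/Archive/" in link]
    return tarballs, archives
```

`find_artifacts` should be given the URL the package page was actually served from (after any redirect), so that relative tarball links resolve correctly. It only returns the archive page URL; the loader would still have to fetch that page and collect the older `*.tar.gz` links from it.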