Found a couple of issues while retesting the CRAN lister locally:
- some dates could not be parsed
- some packages might be listed twice
This diff contains two commits fixing those.
Differential D5025
cran: Fix some lister issues
Authored by anlambert on Feb 5 2021, 12:58 PM.
Event Timeline

Build has FAILED

Patch application report for D5025 (id=17914)

Rebasing onto 4245c5046f...

Current branch diff-target is up to date.

Changes applied before test

commit 1b5cbd5df579762eae104e4e1fa1367d7c9f16d7
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:51:20 2021 +0100

    cran: Prevent multiple listing of an origin

    A CRAN package can appear twice in the JSON list returned by the
    list_all_packages.R script, most recent version of the package
    appearing first. So handle that edge case to avoid error when
    sending origins to the scheduler.

commit 56d3ae62fb3c62efe1a55a36b8360ddfd52467b9
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:37:49 2021 +0100

    cran: Robustify package date parsing code

    Add support for parsing date with milliseconds and ensure locale
    is set to en_US in order to properly parse month and day of week
    in text format.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/241/

Build is green

Patch application report for D5025 (id=17916)

Rebasing onto 4245c5046f...

Current branch diff-target is up to date.

Changes applied before test

commit c95bf9a79088bf892754082f381534b7ffa219ff
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:51:20 2021 +0100

    cran: Prevent multiple listing of an origin

    A CRAN package can appear twice in the JSON list returned by the
    list_all_packages.R script, most recent version of the package
    appearing first. So handle that edge case to avoid error when
    sending origins to the scheduler.

commit a4319538606691666180939d1b2db67c610c8ef1
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:37:49 2021 +0100

    cran: Add support for parsing date with milliseconds

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/242/ for more details.
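
For readers following the "Prevent multiple listing of an origin" commit message, here is a minimal sketch of one way to skip repeated packages while keeping the first (most recent) entry. It is not the code from this diff: the function name unique_packages and the "Package" and "Version" keys are assumptions made for illustration.

```python
from typing import Dict, Iterator, List


def unique_packages(packages: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each package once, keeping the first entry seen for a name.

    Assumption: entries are dicts with a "Package" key, and the most
    recent version of a package comes first in the list.
    """
    seen = set()
    for package in packages:
        name = package["Package"]
        if name in seen:
            # A later duplicate is an older version: skip it so the
            # same origin is not emitted twice.
            continue
        seen.add(name)
        yield package


# Hypothetical sample data, not taken from the real CRAN listing.
sample = [
    {"Package": "foo", "Version": "2.0"},
    {"Package": "bar", "Version": "1.0"},
    {"Package": "foo", "Version": "1.0"},
]
deduplicated = list(unique_packages(sample))
```

Keeping the first occurrence relies on the ordering stated in the commit message; if that ordering did not hold, the versions would have to be compared explicitly.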
I recall cran dates are hard to parse, see the loader's corresponding test [1]

Build is green

Patch application report for D5025 (id=17922)

Rebasing onto 2461c97bbb...

Current branch diff-target is up to date.

Changes applied before test

commit 1803b707e4ba6e41e84976abfd18ff1d530b7ac7
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:51:20 2021 +0100

    cran: Prevent multiple listing of an origin

    A CRAN package can appear twice in the JSON list returned by the
    list_all_packages.R script, most recent version of the package
    appearing first. So handle that edge case to avoid error when
    sending origins to the scheduler.

commit b4c4c20bb92717d5f0d93aa624e24fbc8678f153
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Feb 5 12:37:49 2021 +0100

    cran: Add support for parsing date with milliseconds

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/245/ for more details.

Only two dates with different formats cannot be parsed so I think we should be good here.

swh-lister_1 | [2021-02-05 13:40:18,627: INFO/MainProcess] Received task: swh.lister.cran.tasks.CRANListerTask[c0240b23-8c1d-4929-ad98-39de13252152]
swh-lister_1 | [2021-02-05 13:40:18,630: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml
swh-lister_1 | [2021-02-05 13:40:18,641: DEBUG/ForkPoolWorker-1] Executing R script /srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/cran/list_all_packages.R
swh-lister_1 | [2021-02-05 13:40:20,782: DEBUG/ForkPoolWorker-1] Could not parse DamiaNN package release date: 2016-09-13
swh-lister_1 | [2021-02-05 13:40:22,598: DEBUG/ForkPoolWorker-1] Could not parse JGR package release date: 2020-04-07 02:19:408 CDT
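
To illustrate why those two dates are still rejected, here is a rough sketch of a parser that tries several formats in turn, including one with fractional seconds. It is not the code from this diff: the function name, the list of formats, and the handling of a " UTC" marker and a "; <packager>" suffix are all assumptions. The textual "%a %b" format depends on the active locale, which the sketch does not set.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed candidate formats; the second one accepts fractional seconds.
DATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%a %b %d %H:%M:%S %Y",  # locale-dependent day and month names
)


def parse_packaged_date(value: str) -> Optional[datetime]:
    """Return a UTC datetime, or None if no known format matches."""
    # Assumption: an optional "; <packager>" suffix and " UTC" marker
    # may follow the timestamp and can be dropped before parsing.
    text = value.split(";")[0].replace(" UTC", "").strip()
    for date_format in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, date_format)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc)
    return None


with_milliseconds = parse_packaged_date("2020-04-07 02:19:40.123 UTC; someone")
date_only = parse_packaged_date("2016-09-13")
malformed = parse_packaged_date("2020-04-07 02:19:408 CDT")
```

Under these assumptions, the two values from the log above yield None: the first has no time component, and the second has a three-digit seconds field followed by a timezone abbreviation.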