Details
- Reviewers
vlorentz - Group Reviewers
Reviewers - Maniphest Tasks
- T4233: Ingest Arch Linux
T4466: Ingest Aur repository (Arch User Repository)
Diff Detail
- Repository
- rDLS Listers
- Branch
- arch-aur-improvements
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 31826 Build 49806: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 49805: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8553 (id=30853)
Rebasing onto 3928fc9ee9...
Current branch diff-target is up to date.
Changes applied before test
commit 80326169d38bba8f4b06db4e0d8605a44e4b0a7d Author: Antoine Lambert <anlambert@softwareheritage.org> Date: Tue Sep 27 17:11:52 2022 +0200 aur: Use dicts instead of lists as extra loader arguments Instead of letting the aur loader reconstructing those dicts, prefer to build them in the lister and pass them as extra loader arguments. Related to T4466 commit d69f94171fca043527da3af7542a7cd025416f98 Author: Antoine Lambert <anlambert@softwareheritage.org> Date: Tue Sep 27 17:10:58 2022 +0200 arch: Use dicts instead of lists as extra loader arguments Instead of letting the arch loader reconstructing those dicts, prefer to build them in the lister and pass them as extra loader arguments. Related to T4233
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/713/ for more details.
They intentionally use the same format as https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html#original-artifacts-json
And so ? This makes the loader simpler and has no effect on what we archive, so better using dicts here.
Plus those metadata are simply used to build ArchPackageInfo and AurPackageInfo instances in the respective loaders, they could have been merged in the artifacts dicts.
Those are loader parameters not related to extrinsic metadata, so I do not get your point here.
My point is that we should not create dozens of undocumented ad-hoc formats in the scheduler database just to remove two lines of code from each loader
My point is that we should not create dozens of undocumented ad-hoc formats in the scheduler database just to remove two lines of code from each loader
Ack, this was really not clear to me that we should use this format in package listers. We should update lister documentation about that fact.
Although, as using extra_loader_arguments as an escape hatch to pass lists of artifacts is starting to spread across lister/loader pairs (GNU, Maven, Arch, AUR, CRAN, Crates, and maybe OPAM), I would be fine with defining a new way to pass these objects (as well as arch_metadata/aur_metadata/crates_metadata?), but it should be done for all these loaders and using the same format.
(This will also make it easier to change the storage of these from large JSON objects to rows in the scheduler DB in the future, if their size becomes an issue for postgresql or celery; but not now)