Page MenuHomeSoftware Heritage

arch/aur: Use dicts instead of lists as extra loader arguments
AbandonedPublic

Authored by anlambert on Sep 27 2022, 5:16 PM.

Details

Summary

Instead of letting the arch/aur loader reconstructing those dicts, prefer to
build them in the lister and pass them as extra loader arguments.

Related to T4233
Related to T4466

Diff Detail

Repository
rDLS Listers
Branch
arch-aur-improvements
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31826
Build 49806: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 49805: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8553 (id=30853)

Rebasing onto 3928fc9ee9...

Current branch diff-target is up to date.
Changes applied before test
commit 80326169d38bba8f4b06db4e0d8605a44e4b0a7d
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 17:11:52 2022 +0200

    aur: Use dicts instead of lists as extra loader arguments
    
    Instead of letting the aur loader reconstructing those dicts, prefer to
    build them in the lister and pass them as extra loader arguments.
    
    Related to T4466

commit d69f94171fca043527da3af7542a7cd025416f98
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 17:10:58 2022 +0200

    arch: Use dicts instead of lists as extra loader arguments
    
    Instead of letting the arch loader reconstructing those dicts, prefer to
    build them in the lister and pass them as extra loader arguments.
    
    Related to T4233

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/713/ for more details.

This revision now requires changes to proceed.Sep 27 2022, 5:35 PM

And so ? This makes the loader simpler and has no effect on what we archive, so better using dicts here.

And so ? This makes the loader simpler and has no effect on what we archive, so better using dicts here.

Plus those metadata are simply used to build ArchPackageInfo and AurPackageInfo instances in the respective loaders, they could have been merged in the artifacts dicts.

It avoids the proliferation of ad-hoc formats

It avoids the proliferation of ad-hoc formats

Those are loader parameters not related to extrinsic metadata, so I do not get your point here.

My point is that we should not create dozens of undocumented ad-hoc formats in the scheduler database just to remove two lines of code from each loader

My point is that we should not create dozens of undocumented ad-hoc formats in the scheduler database just to remove two lines of code from each loader

Ack, this was really not clear to me that we should use this format in package listers. We should update lister documentation about that fact.

Although, as using extra_loader_arguments as an escape hatch to pass lists of artifacts is starting to spread across lister/loader pairs (GNU, Maven, Arch, AUR, CRAN, Crates, and maybe OPAM), I would be fine with defining a new way to pass these objects (as well as arch_metadata/aur_metadata/crates_metadata?), but it should be done for all these loaders and using the same format.

(This will also make it easier to change the storage of these from large JSON objects to rows in the scheduler DB in the future, if their size becomes an issue for postgresql or celery; but not now)

Although, as using extra_loader_arguments as an escape hatch to pass lists of artifacts is starting to spread across lister/loader pairs (GNU, Maven, Arch, AUR, CRAN, Crates, and maybe OPAM), I would be fine with defining a new way to pass these objects (as well as arch_metadata/aur_metadata/crates_metadata?), but it should be done for all these loaders and using the same format.

(This will also make it easier to change the storage of these from large JSON objects to rows in the scheduler DB in the future, if their size becomes an issue for postgresql or celery; but not now)

Alright, abandoning this at the moment then.