Arch User Repository (AUR) lister
Closed, Public

Authored by franckbret on Jun 24 2022, 12:26 PM.

Details

Summary

Add an 'aur' module to swh-lister, with data fixtures and tests.
For now, origin URLs are the packages' VCS (Git) URLs.

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8033 (id=28931)

Rebasing onto 1bf11aa26d...

Current branch diff-target is up to date.
Changes applied before test
commit 0238f136b5df5e684b393301d01319ee29abf423
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    [WIP] Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jun 24 2022, 12:30 PM
Harbormaster failed remote builds in B30024: Diff 28931!

Updating D8033: [WIP] Arch User Repository (AUR) lister

Fix an issue with the 'last_modified' date timezone by adding the timezone.utc offset.

Build is green

Patch application report for D8033 (id=28933)

Rebasing onto 1bf11aa26d...

Current branch diff-target is up to date.
Changes applied before test
commit e91114aca59c17fb4b7e48028ac1687df580aa6d
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    [WIP] Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/550/ for more details.

@ardumont @vlorentz Hi, here is a first implementation of the Arch User Repository (AUR) lister.

It covers about 83841 packages today, see https://aur.archlinux.org/packages.
I get the package list from https://aur.archlinux.org/packages-meta-ext-v1.json.gz, which is a gzip file of about 6.6 MB.

There is an RPC named Aurweb RPC interface (see https://wiki.archlinux.org/title/Aurweb_RPC_interface), but it is recommended to download a json.gz file instead, which contains some metadata for each package. See https://lists.archlinux.org/pipermail/aur-general/2021-November/036659.html
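
As a rough illustration of the listing step described above, here is a minimal sketch that turns the gzipped JSON index into origin descriptions. The field names (`Name`, `PackageBase`, `Version`, `LastModified`), the Git URL pattern, and the `iter_origins` helper are assumptions made for the example, not necessarily what the diff implements; the sample entry is hand-written.

```python
import gzip
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterator

# Assumption: one Git repository per package base, at this URL pattern.
GIT_URL_PATTERN = "https://aur.archlinux.org/{pkgbase}.git"


def iter_origins(index_gz: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one origin description per entry of the gzipped JSON index."""
    for entry in json.loads(gzip.decompress(index_gz)):
        yield {
            "name": entry["Name"],
            "version": entry["Version"],
            "git_url": GIT_URL_PATTERN.format(pkgbase=entry["PackageBase"]),
            # Assumption: LastModified is a Unix timestamp; build a
            # timezone-aware datetime so the value is unambiguous.
            "last_update": datetime.fromtimestamp(
                entry["LastModified"], tz=timezone.utc
            ),
        }


# Hand-written sample standing in for the real index file.
sample_entries = [
    {
        "Name": "hg-evolve",
        "PackageBase": "hg-evolve",
        "Version": "1.0-1",  # made-up version
        "LastModified": 1656000000,
    }
]
sample_index = gzip.compress(json.dumps(sample_entries).encode("utf-8"))
origins = list(iter_origins(sample_index))
```

Passing `tz=timezone.utc` is what makes `last_update` timezone-aware; without it, `datetime.fromtimestamp` returns a naive local-time value.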

The main difference from Arch Linux is that end users build their own packages with the help of makepkg + pacman. See https://wiki.archlinux.org/title/Arch_User_Repository for more details.
It mainly relies on Git repositories containing a PKGBUILD file and a .SRCINFO file, see https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve for an example.

There is no real direct way for the lister to discover where to download older versions of a package. Each package page gives a canonical URL, but it is the latest snapshot URL, and there is no way to know which version it is when downloading from this link.

For now I did not implement scraping or anything similar to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but commit messages are obviously inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).

Maybe we can use git to find some kind of version history from diffs of the .SRCINFO file, as I tried before with the Arch PKGINFO file, but with this many repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source of the corresponding package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve

What do you think?
Is it OK to say that the origin is a Git repository URL in this case?
If yes, will the loader really be a package loader, or the VCS Git loader?

swh/lister/aur/tests/__init__.py
2

No need; we usually leave it empty.

There is no real direct way for the lister to discover where to download older versions of a package. Each package page gives a canonical URL, but it is the latest snapshot URL, and there is no way to know which version it is when downloading from this link.

If they are not directly available, then it doesn't make sense to have them in snapshots anyway. We'll just have successive visits of the loader as history.

For now I did not implement scraping or anything similar to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but commit messages are obviously inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).

Maybe we can use git to find some kind of version history from diffs of the .SRCINFO file, as I tried before with the Arch PKGINFO file, but with this many repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source of the corresponding package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve

What do you think?
Is it OK to say that the origin is a Git repository URL in this case?
If yes, will the loader really be a package loader, or the VCS Git loader?

Clearly the git loader, as there can be branches and whatnot; history also matters here.

However, the main content of these repositories is the PKGBUILD, which (among other things) fetches the code from somewhere else (tarball, git commit, ...), and the PKGBUILD alone is not very useful without that code. Therefore, it looks like we should implement something like T3923, to fetch the actual code.

Updating D8033: [WIP] Arch User Repository (AUR) lister

Fix an issue with requests usage when downloading package archives, ensuring it does not decode the binary content directly. (Tests and CI were fine, but I got the bug while testing the runner on Docker.)

Build is green

Patch application report for D8033 (id=28971)

Rebasing onto 1bf11aa26d...

Current branch diff-target is up to date.
Changes applied before test
commit 4729b0aae165aab640677adefc2f6ceb90072bcd
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/551/ for more details.

but got the bug while testing runner on docker

what was the bug, exactly? I don't see how the new code changes the behavior.

Also, please remove "Updating D8033: [WIP] Arch User Repository (AUR) lister" from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/D8033#toc less readable.

franckbret retitled this revision from [WIP] Arch User Repository (AUR) lister to Arch User Repository (AUR) lister. Jun 29 2022, 3:03 PM

@vlorentz The change forces requests.get not to decompress the response automatically. This way we are sure we always get the raw archive as is. See https://requests.readthedocs.io/en/latest/user/quickstart/#raw-response-content
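
For readers unfamiliar with the raw-response approach mentioned above, a minimal sketch could look like the following. The helper names `save_raw_body` and `download_index` are hypothetical and this is not necessarily the code in the diff; it only assumes that a response fetched with `stream=True` exposes the undecoded body through its file-like `raw` attribute, as the linked requests documentation describes.

```python
import shutil
from pathlib import Path


def save_raw_body(response, destination: Path) -> Path:
    """Copy the undecoded body of a streamed response to ``destination``.

    ``response`` only needs a file-like ``raw`` attribute, which is what a
    ``requests.Response`` obtained with ``stream=True`` provides.
    """
    with destination.open("wb") as out:
        # Reading from ``raw`` bypasses the automatic content decoding that
        # ``response.content`` applies, so the bytes are stored as served.
        shutil.copyfileobj(response.raw, out)
    return destination


def download_index(url: str, destination: Path) -> Path:
    """Fetch ``url`` and store the body byte for byte (hypothetical helper)."""
    import requests  # third-party; imported lazily so the helper above is stdlib-only

    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return save_raw_body(response, destination)
```

Keeping the copy step separate from the network call makes it easy to exercise with a fake response in tests, which matters here since the bug only showed up outside the test suite.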

swh/lister/aur/lister.py
22

No need if you always return one element?

32
35
49

I gather that's a working directory. If so, you'll need to have some post-listing routine cleanup.

107–122

I've also adapted according to my type suggestion early on.

swh/lister/aur/lister.py
22

I mean, your code below returns pages of one element, if I read it correctly.

124–139

In effect, you only read one origin per "page".

franckbret marked 7 inline comments as done.

Some typo and consistency fixes after code review

swh/lister/aur/lister.py
22

yep

swh/lister/aur/lister.py
49

Yes, it is. Do you have an example of a post-listing cleanup routine?
The same pattern is used for arch and crates; I guess I need to do cleanup there too.

Build is green

Patch application report for D8033 (id=29383)

Rebasing onto 1bf11aa26d...

Current branch diff-target is up to date.
Changes applied before test
commit d4851a18c7bf97dcfffdb8362dcf4c6a720bde02
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/554/ for more details.

swh/lister/aur/lister.py
49

Is finalize() suitable for that purpose?

swh/lister/aur/lister.py
49

yes

Split artifacts data to artifacts and aur_metadata

Build is green

Patch application report for D8033 (id=29732)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit a33fbf745785ebe179bce4fbbbbb52a17fabce65
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/586/ for more details.

Add a finalize method that removes the temporary directory once the lister completes
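
The cleanup pattern discussed in this thread can be sketched roughly as follows. `TempDirLister` is a hypothetical stand-in, not the swh-lister base class, and the sketch assumes the working directory is created when the lister is constructed and that `finalize()` runs once listing ends.

```python
import shutil
import tempfile
from pathlib import Path


class TempDirLister:
    """Hypothetical stand-in for a lister that needs a scratch directory."""

    def __init__(self) -> None:
        # Working directory holding downloaded index files during a run.
        self.work_dir = Path(tempfile.mkdtemp(prefix="aur-lister-"))

    def run(self) -> int:
        """Pretend to list origins, then clean up whatever happens."""
        try:
            (self.work_dir / "index.json.gz").write_bytes(b"placeholder")
            return 1  # number of origins "listed" in this toy run
        finally:
            self.finalize()

    def finalize(self) -> None:
        """Remove the working directory and everything inside it."""
        shutil.rmtree(self.work_dir, ignore_errors=True)
```

Calling `finalize()` from a `finally` block is a choice made for this sketch, so that the directory is also removed when listing raises.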

Build is green

Patch application report for D8033 (id=29755)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 280be6e9530238a385843ee09ee5040ca0d60473
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/587/ for more details.

Add a directory cleanup test

Build is green

Patch application report for D8033 (id=29760)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 3e0c54bf111c316ead8a6804060f3432d8f3209f
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/588/ for more details.

franckbret marked 2 inline comments as done.

Document the module

Build has FAILED

Patch application report for D8033 (id=29766)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit aeb60cc266f0fdf11e8650c9f7710ec00cf2443f
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/console

Build has FAILED

Patch application report for D8033 (id=29767)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 0b722b297b17060de8eca33bc7811039e0bbd3df
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/590/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/590/console

Build is green

Patch application report for D8033 (id=29768)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 3962b637421577c9fec8a0f0f1c8c7661b1bfa88
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/591/ for more details.

Looks good! Two more nitpicks before this is good to merge:

swh/lister/aur/__init__.py
20–23

I don't think that's accurate; we do archive split packages, but only their "pkgbase" because that is the only one that actually has source code.
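
To illustrate the point about split packages, here is a small sketch that collapses index entries onto their "pkgbase". The `Name` and `PackageBase` field names and the sample entries are assumptions for the example; the idea is only that several package names can share one package base, and that the package base is what gets archived.

```python
from typing import Dict, Iterable, List


def group_by_pkgbase(entries: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each package base to the package names built from it."""
    grouped: Dict[str, List[str]] = {}
    for entry in entries:
        grouped.setdefault(entry["PackageBase"], []).append(entry["Name"])
    return grouped


# Made-up entries: "demo" is a split package producing two package names.
entries = [
    {"Name": "demo", "PackageBase": "demo"},
    {"Name": "demo-docs", "PackageBase": "demo"},
    {"Name": "other", "PackageBase": "other"},
]
by_base = group_by_pkgbase(entries)
```

With this grouping, a lister would emit one origin per key of `by_base` rather than one per package name.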

77–96

why is "version" in both?

Rephrase the documentation part about regular and split packages

Build is green

Patch application report for D8033 (id=29781)

Rebasing onto 6a53a6ad06...

First, rewinding head to replay your work on top of it...
Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 96683382dc91d6927998797ba5a3cd609319eed4
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/594/ for more details.

franckbret added inline comments.
swh/lister/aur/__init__.py
20–23

I rephrased it; does it look OK to you?

77–96

Because it's the associative key linking the entries of both dicts, which the loader needs in order to get the corresponding information.
Btw, there will be only one entry in the case of AUR. I think it's more explicit to keep it as is, but if you want me to remove it, I certainly can.
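
As a sketch of why a shared "version" key is useful, a loader could join the two lists like this. The keys other than `version`, the sample values, and the `merge_by_version` helper are hypothetical; the actual structure is whatever the diff defines.

```python
from typing import Any, Dict, List


def merge_by_version(
    artifacts: List[Dict[str, Any]], aur_metadata: List[Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Join artifact and metadata entries that share the same version."""
    metadata_by_version = {meta["version"]: meta for meta in aur_metadata}
    return {
        artifact["version"]: {
            **metadata_by_version.get(artifact["version"], {}),
            **artifact,
        }
        for artifact in artifacts
    }


# Made-up entries; for AUR there would be a single version per origin.
artifacts = [
    {"version": "1.0-1", "url": "https://example.org/demo.tar.gz"},
]
aur_metadata = [
    {"version": "1.0-1", "pkgname": "demo"},
]
merged = merge_by_version(artifacts, aur_metadata)
```

Without the key in both lists, the pairing would have to rely on list order, which is more fragile even when there is only one entry.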

Can we merge this one?

vlorentz added inline comments.
swh/lister/aur/__init__.py
77–96

Never mind, it's fine.

This revision is now accepted and ready to land. Aug 19 2022, 10:30 AM

Build is green

Patch application report for D8033 (id=29847)

Rebasing onto 6a53a6ad06...

Current branch diff-target is up to date.
Changes applied before test
commit 97b353bf0b9a726ebb3d414f809aeac26a229f21
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Jun 24 12:19:15 2022 +0200

    Arch User Repository (AUR) lister
    
    Add 'aur' module to swh-lister with data fixtures and tests.
    For now, origin url are package vcs (Git) url.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/597/ for more details.

This revision was automatically updated to reflect the committed changes.