Add 'aur' module to swh-lister with data fixtures and tests.
For now, origin URLs are the packages' VCS (Git) URLs.
Details
- Reviewers: vlorentz
- Group Reviewers: Reviewers
- Commits: rDLS97b353bf0b9a: Arch User Repository (AUR) lister
Diff Detail
- Repository: rDLS Listers
- Branch: aur
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 30812
- Build 48171: Phabricator diff pipeline on jenkins
- Build 48170: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D8033 (id=28931)
Rebasing onto 1bf11aa26d...
Current branch diff-target is up to date.
Changes applied before test
commit 0238f136b5df5e684b393301d01319ee29abf423 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 [WIP] Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/console
Updating D8033: [WIP] Arch User Repository (AUR) lister
Fix an issue with the 'last_modified' date timezone by adding the timezone.utc offset.
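A minimal sketch of the kind of fix described above, assuming the AUR metadata exposes the last modification time as a Unix timestamp in seconds (an assumption; the helper name `parse_last_modified` is also hypothetical and not taken from the diff). Passing `timezone.utc` explicitly yields a timezone-aware datetime instead of a naive one in the local timezone.

```python
from datetime import datetime, timezone


def parse_last_modified(epoch_seconds: int) -> datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime.

    Hypothetical helper: the field format is assumed, not confirmed
    by the diff under review.
    """
    # Without tz=..., fromtimestamp() returns a naive datetime expressed
    # in the local timezone of the machine running the lister.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


last_modified = parse_last_modified(0)
```

Comparing or serializing such a value no longer depends on where the lister runs.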
Build is green
Patch application report for D8033 (id=28933)
Rebasing onto 1bf11aa26d...
Current branch diff-target is up to date.
Changes applied before test
commit e91114aca59c17fb4b7e48028ac1687df580aa6d Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 [WIP] Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/550/ for more details.
@ardumont @vlorentz Hi, here is a first implementation of the Arch User Repository (AUR) lister.
It covers about 83841 packages today; see https://aur.archlinux.org/packages.
I get the package list through https://aur.archlinux.org/packages-meta-ext-v1.json.gz, which is a gzip file of about 6.6 MB.
There is an RPC named the Aurweb RPC interface (see https://wiki.archlinux.org/title/Aurweb_RPC_interface), but it's recommended to download a json.gz file, which contains some metadata for each package. See https://lists.archlinux.org/pipermail/aur-general/2021-November/036659.html
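As a rough illustration of the approach described above, the sketch below decompresses such an archive and derives a Git origin URL per package. It assumes the archive is a JSON array of objects with a `PackageBase` key and that repositories are cloneable at `https://aur.archlinux.org/<pkgbase>.git`; the function names and the sample entry (including its version string) are hypothetical, not taken from the diff.

```python
import gzip
import json
from typing import Any, Dict, Iterator

AUR_BASE_URL = "https://aur.archlinux.org"


def iter_packages(archive: bytes) -> Iterator[Dict[str, Any]]:
    """Yield package entries from a gzip-compressed JSON array."""
    yield from json.loads(gzip.decompress(archive))


def git_origin_url(pkgbase: str) -> str:
    """Build the Git clone URL for a package base (assumed URL layout)."""
    return f"{AUR_BASE_URL}/{pkgbase}.git"


# Hypothetical in-memory sample standing in for the downloaded archive.
sample_archive = gzip.compress(
    json.dumps(
        [{"Name": "hg-evolve", "PackageBase": "hg-evolve", "Version": "0.0.0-1"}]
    ).encode()
)

origins = [git_origin_url(p["PackageBase"]) for p in iter_packages(sample_archive)]
```

Working from a single archive avoids issuing one RPC request per package.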
The main difference from Arch Linux is that end users build their own packages with the help of makepkg + pacman. See https://wiki.archlinux.org/title/Arch_User_Repository for more details.
It mainly relies on git repositories containing a PKGBUILD file and a .SRCINFO file; see https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve for an example.
There is no real direct way for the lister to discover where to download older versions of a package. There is a canonical URL for each package in its page description, but it's the latest snapshot URL, and there is no way to know which version it is when downloading from this link.
For now I did not implement scraping or anything similar to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but the messages are obviously inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).
Maybe we can play with git to find some kind of version diff history in the .SRCINFO file, as I tried before for the Arch PKGINFO file, but with this number of repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source corresponding to the package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve
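To make the .SRCINFO idea concrete, here is a hedged sketch of reading the version and source links from one revision of such a file. It assumes the usual `key = value` line layout; `parse_srcinfo` and the sample content (package name, URL, version) are hypothetical and only for illustration.

```python
from typing import Dict, List


def parse_srcinfo(text: str) -> Dict[str, List[str]]:
    """Collect every value seen for each key of a .SRCINFO-like file."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not "key = value".
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, _, value = line.partition("=")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Hypothetical file content, not copied from a real package.
sample = "\n".join(
    [
        "pkgbase = example-pkg",
        "\tpkgver = 1.2.3",
        "\tpkgrel = 1",
        "\tsource = https://example.org/example-pkg-1.2.3.tar.gz",
        "",
        "pkgname = example-pkg",
    ]
)

info = parse_srcinfo(sample)
```

Running this over each commit that touches .SRCINFO would give a version history, at the cost of walking every repository's log.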
What do you think?
Is it OK to say that the origin is a git repository URL in this case?
If yes, will the loader really be a package loader or the VCS git loader?
swh/lister/aur/tests/__init__.py | ||
---|---|---|
2 | no need, we usually have it empty. |
If they are not directly available, then it doesn't make sense to have them in snapshots anyway. We'll just have successive visits of the loader as history.
> For now I did not implement scraping or anything similar to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but the messages are obviously inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).
> Maybe we can play with git to find some kind of version diff history in the .SRCINFO file, as I tried before for the Arch PKGINFO file, but with this number of repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source corresponding to the package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve
> What do you think?
> Is it OK to say that the origin is a git repository URL in this case?
> If yes, will the loader really be a package loader or the VCS git loader?
Clearly the git loader, as there can be branches and whatnot; history also matters here.
However, the main content of these repositories is the PKGBUILD, which (among other things) fetches the code from somewhere else (tarball, git commit, ...), and the PKGBUILD alone is not very useful without that code. Therefore, it looks like we should implement something like T3923, to fetch the actual code.
Updating D8033: [WIP] Arch User Repository (AUR) lister
Fix an issue with requests usage when downloading package archives, ensuring it does not decode the binary content directly. (Tests and CI were fine, but I hit the bug while testing the runner on docker.)
Build is green
Patch application report for D8033 (id=28971)
Rebasing onto 1bf11aa26d...
Current branch diff-target is up to date.
Changes applied before test
commit 4729b0aae165aab640677adefc2f6ceb90072bcd Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/551/ for more details.
what was the bug, exactly? I don't see how the new code changes the behavior.
Also, please remove "Updating D8033: [WIP] Arch User Repository (AUR) lister" from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/D8033#toc less readable.
@vlorentz The change forces requests.get not to automatically decompress the response. This way we are sure to always get the raw archive as is. See https://requests.readthedocs.io/en/latest/user/quickstart/#raw-response-content
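A sketch of the raw-response pattern from the linked quickstart, not the code from the diff: with `stream=True`, `response.raw` exposes the body without automatic content decoding, so it can be copied to disk unchanged. The helper names are hypothetical, and `FakeResponse` only stands in for a real response so the copy step can be shown without network access.

```python
import io
import shutil
from typing import BinaryIO


def copy_raw_body(response, destination: BinaryIO) -> None:
    """Copy the undecoded body of a streamed response to a binary file."""
    # response.raw is a file-like object; reading it bypasses the
    # gzip/deflate decoding applied by response.content.
    shutil.copyfileobj(response.raw, destination)


def download_archive(url: str, path: str) -> None:
    """Download url to path, keeping the bytes exactly as served."""
    import requests  # third-party; imported lazily, only needed here

    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(path, "wb") as destination:
            copy_raw_body(response, destination)


class FakeResponse:
    """Hypothetical stand-in exposing only the .raw attribute."""

    def __init__(self, payload: bytes) -> None:
        self.raw = io.BytesIO(payload)


buffer = io.BytesIO()
copy_raw_body(FakeResponse(b"\x1f\x8b raw gzip bytes"), buffer)
```

Whether the body gets decoded by default depends on the `Content-Encoding` header sent by the server, which may explain why a bug of this kind can pass tests and CI and only appear against a live server.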
swh/lister/aur/lister.py | ||
---|---|---|
49 | Yes, it is. Do you have an example of a post-listing routine action? |
Build is green
Patch application report for D8033 (id=29383)
Rebasing onto 1bf11aa26d...
Current branch diff-target is up to date.
Changes applied before test
commit d4851a18c7bf97dcfffdb8362dcf4c6a720bde02 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/554/ for more details.
swh/lister/aur/lister.py | ||
---|---|---|
49 | Is finalize() suitable for that purpose? |
swh/lister/aur/lister.py | ||
---|---|---|
49 | yes |
Build is green
Patch application report for D8033 (id=29732)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit a33fbf745785ebe179bce4fbbbbb52a17fabce65 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/586/ for more details.
Build is green
Patch application report for D8033 (id=29755)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 280be6e9530238a385843ee09ee5040ca0d60473 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/587/ for more details.
Build is green
Patch application report for D8033 (id=29760)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 3e0c54bf111c316ead8a6804060f3432d8f3209f Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/588/ for more details.
Build has FAILED
Patch application report for D8033 (id=29766)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit aeb60cc266f0fdf11e8650c9f7710ec00cf2443f Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/console
Build has FAILED
Patch application report for D8033 (id=29767)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 0b722b297b17060de8eca33bc7811039e0bbd3df Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/590/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/590/console
Build is green
Patch application report for D8033 (id=29768)
Rebasing onto cee6bcb514...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 3962b637421577c9fec8a0f0f1c8c7661b1bfa88 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/591/ for more details.
Build is green
Patch application report for D8033 (id=29781)
Rebasing onto 6a53a6ad06...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 96683382dc91d6927998797ba5a3cd609319eed4 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/594/ for more details.
swh/lister/aur/__init__.py | ||
---|---|---|
20–23 | I rephrased it; does it look OK to you? | |
77–96 | Because it's the key that associates entries in both dicts, which the loader needs to get the corresponding information. |
swh/lister/aur/__init__.py | ||
---|---|---|
77–96 | nevermind, it's fine |
Build is green
Patch application report for D8033 (id=29847)
Rebasing onto 6a53a6ad06...
Current branch diff-target is up to date.
Changes applied before test
commit 97b353bf0b9a726ebb3d414f809aeac26a229f21 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/597/ for more details.