Rewrite an assertion on a test
Jul 21 2022
Remove useless assert statement
Cleaner code to get a dynamic git id in a test
Some changes and a new test
Add 'yanked' to artifacts data
Make use of finalize method to remove repository directory
Jul 20 2022
Change the way we list origins in incremental mode
Add dulwich to requirements
Jul 19 2022
Add a dulwich entry to mypy.ini to set ignore_missing_imports = True
Some typo and consistency fixes after code review
In D8101#211050, @vlorentz wrote: Have you considered using Dulwich (a Git library we already use) instead of shelling out to git? It looks like it would be easier than parsing output from git's UI
In D8101#211406, @ardumont wrote:
12:00 <+ardumont> franckbret: can you please have a look at my comments in D8033 first?
12:00 -- Notice(swhbot): D8033 (author: franckbret, Needs Review) on swh-lister: Arch User Repository (AUR) lister <https://forge.softwareheritage.org/D8033>
12:02 <+ardumont> (but yeah, sure, for the other one, i'll check before the end of the week)
As vlorentz has done a first pass already, i did not and i trust their judgement ;)
Use dulwich to replace some previous subprocess git commands.
Add a test to ensure everything runs fine in incremental mode even if there is no new commit since last lister invocation.
Jul 11 2022
Lister execution runs fine on first run (non-incremental):
Jul 8 2022
Jun 29 2022
Rebase
Oops, sorry, forgot this one.
Thanks
Will work on the incremental lister, and then document it (not done yet).
In D8033#209048, @vlorentz wrote:
In D8033#209040, @franckbret wrote: but got the bug while testing runner on docker
what was the bug, exactly? I don't see how the new code changes the behavior.
Also, please remove Updating D8033: [WIP] Arch User Repository (AUR) lister from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/D8033#toc less readable
Hello,
The crates lister (stateless) and loader have landed.
I just fixed some issues discovered while running the lister and loader in the Docker env (D8049).
Jun 28 2022
Updating D8033: [WIP] Arch User Repository (AUR) lister
Jun 24 2022
Jun 17 2022
Updating D7995: Arch Linux loader
Jun 16 2022
Updating D7995: Arch Linux loader
Updating D7995: WIP: Arch Linux loader
Updating D7995: Arch Linux loader
Updating D7993: Uncompress support '.tar.zst' extension and 'application/zstd', 'application/x-zstd' mime type.
Jun 15 2022
Arch Linux lister execution on Docker runs fine without any error:
Updating D7894: Add arch lister module (origins from archives).
Updating D7894: Add arch lister module (origins from archives).
Updating D7894: Add arch lister module (origins from archives).
Jun 14 2022
Updating D7894: Add arch lister module (origins from archives).
Updating D7894: [WIP] Add arch lister module (origins from archives).
Jun 13 2022
Updating D7894: [WIP] Add arch lister module (origins from archives).
Updating D7894: [WIP] Add arch lister module (origins from archives).
In D7894#207366, @franckbret wrote:
In D7894#207365, @vlorentz wrote: You can temporarily replace the pytest command with pytest -vv in tox.ini to get the full diff between the two results on Jenkins
Ok thanks! I suspect it's related to the Python version. I'm currently building a new venv with Python 3.7.3 to see if I can make the tests fail locally. Will try your hack if that's not the case.
In D7894#207365, @vlorentz wrote: You can temporarily replace the pytest command with pytest -vv in tox.ini to get the full diff between the two results on Jenkins
Jun 10 2022
Updating D7894: [WIP] Add arch lister module (origins from archives).
Regenerate data fixtures (jenkins failed on previous commit but tests pass on my machine)
Updating D7894: [WIP] Add arch lister module (origins from archives).
Jun 3 2022
May 31 2022
May 27 2022
Updating D7894: [WIP] Add arch lister module (origins from archives).
May 25 2022
This one is ready for review.
Abandoned in favor of D7894
In D7812#205004, @ardumont wrote:
I've made several experiments in order to find a better way to list Arch Linux
packages.
The most efficient way I've found is to download tar.gz files which contain the package
name as a directory and a "desc" file with easy-to-parse metadata. It works fine but
retrieves only the latest version of a package.
Here are some execution time metrics for downloading the archives and parsing the desc files.
Found 266 packages from https://archive.archlinux.org/repos/last/core/os/x86_64/core.files.tar.gz in 1.4924319160054438 seconds
Found 3035 packages from https://archive.archlinux.org/repos/last/extra/os/x86_64/extra.files.tar.gz in 5.644616681995103 seconds
Found 9161 packages from https://archive.archlinux.org/repos/last/community/os/x86_64/community.files.tar.gz in 16.14458583202213 seconds
Example of retrieved package data after parsing:
{'arch': 'x86_64', 'repo': 'core', 'base': 'acl', 'builddate': '1643730617', 'conflicts': 'xfsacl', 'csize': '138970', 'desc': 'Access control list utilities, libraries and headers', 'filename': 'acl-2.3.1-2-x86_64.pkg.tar.zst', 'isize': '325349', 'license': 'LGPL', 'md5sum': '718c93159ce4dfc6f789ffe27ce276e8', 'name': 'acl', 'packager': 'Christian Hesse <eworm@archlinux.org>', 'pgpsig': 'iHUEABYIAB0WIQQEKYl95fO9rFN6MGltQr3RFuAGjwUCYflW2QAKCRBtQr3RFuAGj/waAP9U7gJZ0YRfftuGdc4shJdSIfspuWb3nZK+fj7My5z4zQD/SBpepSM3Cxr8Pw2LU5adq4UI0HWFZFsHrg3179XJqgI=', 'project_url': 'https://savannah.nongnu.org/projects/acl', 'replaces': 'xfsacl', 'sha256sum': '20873a994a0728de5b05857129c290e9a8c9bba2236cc30bcffa7b746ffe9218', 'url': 'https://archive.archlinux.org/packages/.all/acl-2.3.1-2-x86_64.pkg.tar.zst', 'version': '2.3.1-2'}
If we are ok with getting only the latest version, we can go this way.
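The desc-file parsing described above can be sketched roughly as follows. This is only an illustration, not the lister's actual code: it assumes each desc file is made of '%KEY%' header lines followed by value lines and a blank line, and the function names (parse_desc, iter_packages, build_sample_archive) as well as the tiny in-memory archive are hypothetical, used so the example runs without network access.

```python
import io
import tarfile


def parse_desc(text):
    """Parse '%KEY%' / value blocks of a desc file into a dict (assumed format)."""
    data = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            key = None  # a blank line closes the current block
        elif key is None and len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line.strip("%").lower()
            data[key] = []
        elif key is not None:
            data[key].append(line)
    # Join multi-line values so that every field maps to a single string
    return {k: " ".join(v) for k, v in data.items()}


def iter_packages(fileobj):
    """Yield one parsed dict per '<package>/desc' member of a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith("/desc"):
                content = tar.extractfile(member).read().decode("utf-8")
                yield parse_desc(content)


def build_sample_archive():
    """Build a small in-memory archive with one desc file (hypothetical sample)."""
    desc = (
        "%FILENAME%\nacl-2.3.1-2-x86_64.pkg.tar.zst\n\n"
        "%NAME%\nacl\n\n"
        "%VERSION%\n2.3.1-2\n\n"
        "%LICENSE%\nLGPL\n"
    )
    payload = desc.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="acl-2.3.1-2/desc")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for package in iter_packages(build_sample_archive()):
        print(package)
```

With a real archive, the same iter_packages function could be given an open file object for a downloaded .files.tar.gz; fields such as 'arch', 'repo' or 'url' shown in the example data above would have to be added by the caller.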
(as a data point) That's currently the way we are retrieving information for CRAN
packages. CRAN (infra) only exposes the latest version of a package (it exposes archived
versions with a dedicated instance we are not currently listing).
But our lister is listing them every day, so from the moment we started ingesting them, we
should have some versions for one package already. At some point, we'll have to attend
to the archived ones as well.
So I guess, given your current experiments reported here (through the description and
this very comment), it'd be ok to do the same as CRAN here.
Nonetheless, it's possible to get other versions of a package through two different
strategies, each with some pros and cons:
- Download index https://archive.archlinux.org/packages/.all/index.0.xz which contains a file that lists several previous versions, for example:
mercurial-4.8.2-1-x86_64
mercurial-4.9-1-x86_64
mercurial-4.9.1-1-x86_64
mercurial-5.0-1-x86_64
...
Pro: One 500 kB file to download, one dynamic regex to find matches
Cons: we only get a filename, no date, no metadata. The file has +/- 400000 entries.
It took 16 min for the regex to find matches for +/- 15000 packages...
- Scrape the server directory listing to get previous versions of a package with its
release date, for example https://archive.archlinux.org/packages/m/mercurial/
Pro: Easy to scrape + a release date is associated with a version
Cons: Scraping +/- 15000 pages can be quite slow, no metadata
@vlorentz @ardumont @bchauvet what do you think, what do you prefer?
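As a side note on the first strategy, running one regex per package over the whole index is probably what makes it slow. A possible alternative (not what was tried above) is a single pass that splits every entry and groups versions by package name. The sketch below assumes entries always look like '<name>-<pkgver>-<pkgrel>-<arch>' with no hyphen in the last three parts; split_entry and group_versions are hypothetical names, and the sample lines are the ones quoted in the comment.

```python
from collections import defaultdict

# Sample entries taken from the index excerpt quoted above
SAMPLE_INDEX = [
    "mercurial-4.8.2-1-x86_64",
    "mercurial-4.9-1-x86_64",
    "mercurial-4.9.1-1-x86_64",
    "mercurial-5.0-1-x86_64",
]


def split_entry(entry):
    """Split '<name>-<pkgver>-<pkgrel>-<arch>' into (name, version, arch).

    Splitting from the right keeps hyphens that belong to the package name.
    """
    name, pkgver, pkgrel, arch = entry.rsplit("-", 3)
    return name, f"{pkgver}-{pkgrel}", arch


def group_versions(lines):
    """Group versions by package name in one pass over the index lines."""
    versions = defaultdict(list)
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        try:
            name, version, _arch = split_entry(entry)
        except ValueError:
            continue  # skip entries that do not match the assumed layout
        versions[name].append(version)
    return versions


if __name__ == "__main__":
    for name, found in group_versions(SAMPLE_INDEX).items():
        print(name, found)
```

For the real index, the lines could be read with lzma.open(path, "rt") from the standard library. This still yields only names and versions, so the "no date, no metadata" drawback remains.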
As mentioned, I'd go for the simplest solution (the first one, which allows simpler
metadata retrieval for the latest version only).
Also, should I cancel that issue and create a new one to go on?
You can go either way. If you keep that one, it'd be easier to compare with your future
version (and the future review will be simpler, no noisy old comments). If you keep it,
we can still find its initial version through the history tab (within the web ui).
Well, go simple, create a new one? (yeah, the opposite of what i said to you on irc on
friday ¯\_(ツ)_/¯ ;)
Cheers,
May 18 2022
I've made several experiments in order to find a better way to list Arch Linux packages.
May 11 2022
Updating D7812: [WIP] Add arch lister module.
May 4 2022
Updating D7713: crates: rework to take advantage of data returned by the crates lister
Apr 29 2022
Update back the diff
Add 'Franck Bret' to contributors
Apr 28 2022
prepare to push
Updating D7501: Rust lang, Crates loader
Updating D7654: crates, create one origin per package instead of per version
Apr 27 2022
Add missing "v1" in http api url
Added 'version' key to 'artifacts' dict.
In D7501#200562, @ardumont wrote: Proposed evolution on crate lister to be consistent with the loader here :
Nice, and now the loader needs some rework to match that diff's adaptations ;)
I've run the lister in the docker environment, it looks good.