Jul 21 2022

franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Rewrite an assertion on a test

Jul 21 2022, 2:52 PM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Remove useless assert statement

Jul 21 2022, 2:47 PM
franckbret added inline comments to D8101: crates.lister: Implement incremental mode.
Jul 21 2022, 11:18 AM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Cleaner code to get a dynamic git id in a test

Jul 21 2022, 10:49 AM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Some changes and a new test

Jul 21 2022, 10:43 AM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Add 'yanked' to artifacts data

Jul 21 2022, 10:06 AM
franckbret added inline comments to D8101: crates.lister: Implement incremental mode.
Jul 21 2022, 9:46 AM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Make use of finalize method to remove repository directory

Jul 21 2022, 8:59 AM

Jul 20 2022

franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Change the way we list origins in incremental mode

Jul 20 2022, 4:25 PM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Add dulwich to requirements

Jul 20 2022, 10:53 AM
franckbret added inline comments to D8101: crates.lister: Implement incremental mode.
Jul 20 2022, 10:23 AM
franckbret added inline comments to D8033: Arch User Repository (AUR) lister.
Jul 20 2022, 9:52 AM

Jul 19 2022

franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Add a dulwich entry to mypy.ini to set ignore_missing_imports = True

Jul 19 2022, 6:04 PM
franckbret added inline comments to D8033: Arch User Repository (AUR) lister.
Jul 19 2022, 4:35 PM
franckbret added inline comments to D8033: Arch User Repository (AUR) lister.
Jul 19 2022, 4:32 PM
franckbret updated the diff for D8033: Arch User Repository (AUR) lister.

Some typo and consistency fixes after code review

Jul 19 2022, 4:32 PM
franckbret added a comment to D8101: crates.lister: Implement incremental mode.

Have you considered using Dulwich (a Git library we already use) instead of shelling out to git? It looks like it would be easier than parsing output from git's UI

Jul 19 2022, 3:57 PM
franckbret added a comment to D8101: crates.lister: Implement incremental mode.
12:00 <+ardumont> franckbret: can you please have a look at my comments in D8033 first?
12:00 -- Notice(swhbot): D8033 (author: franckbret, Needs Review) on swh-lister: Arch User Repository (AUR) lister <https://forge.softwareheritage.org/D8033>
12:02 <+ardumont> (but yeah, sure, for the other one, i'll check before the end of the week)

As vlorentz has done a first pass already, i did not and i trust their judgement ;)

Jul 19 2022, 3:44 PM
franckbret added inline comments to D8101: crates.lister: Implement incremental mode.
Jul 19 2022, 3:41 PM
franckbret updated the diff for D8101: crates.lister: Implement incremental mode.

Use dulwich to replace some previous subprocess git commands.
Add a test to ensure everything runs fine in incremental mode even if there is no new commit since last lister invocation.

Jul 19 2022, 3:32 PM

Jul 11 2022

franckbret added a comment to D8101: crates.lister: Implement incremental mode.

Lister execution runs fine on the first run (non-incremental):

Jul 11 2022, 9:27 AM

Jul 8 2022

franckbret requested review of D8101: crates.lister: Implement incremental mode.
Jul 8 2022, 12:58 PM

Jun 29 2022

franckbret closed D8051: Arch, use **kwargs on task initialisation instead of named args..
Jun 29 2022, 5:20 PM
franckbret committed rDLDBASE01547b8edff5: Arch, use **kwargs on task initialisation instead of named args. (authored by franckbret).
Arch, use **kwargs on task initialisation instead of named args.
Jun 29 2022, 5:20 PM
franckbret updated the diff for D8051: Arch, use **kwargs on task initialisation instead of named args..

Rebase

Jun 29 2022, 5:15 PM
franckbret added a comment to D8052: crates: Remove redundant 'max_content_length' argument.

Oops, sorry, forgot this one.
Thanks

Jun 29 2022, 4:14 PM
franckbret requested review of D8051: Arch, use **kwargs on task initialisation instead of named args..
Jun 29 2022, 3:59 PM
franckbret added a comment to T4104: Ingest crates.io (Rust).

Will work on the incremental lister, and then document it (not done yet).

Jun 29 2022, 3:43 PM · Crates loader, Crates lister, Archive coverage
franckbret closed D8049: Crates loader, use **kwargs on task and loader initialisation instead of named args.
Jun 29 2022, 3:28 PM
franckbret committed rDLDBASE50bde53da53b: Crates loader, use **kwargs on task and loader initialisation instead of (authored by franckbret).
Crates loader, use **kwargs on task and loader initialisation instead of
Jun 29 2022, 3:28 PM
franckbret added a comment to D8033: Arch User Repository (AUR) lister.

but got the bug while testing runner on docker

what was the bug, exactly? I don't see how the new code changes the behavior.

Also, please remove Updating D8033: [WIP] Arch User Repository (AUR) lister from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/D8033#toc less readable

Jun 29 2022, 3:11 PM
franckbret retitled D8033: Arch User Repository (AUR) lister from [WIP] Arch User Repository (AUR) lister to Arch User Repository (AUR) lister.
Jun 29 2022, 3:03 PM
franckbret added a comment to T4104: Ingest crates.io (Rust).

Hello,
The crates lister (stateless) and loader have landed.
I just fixed some issues discovered while running the lister and loader in the Docker env (D8049).

Jun 29 2022, 3:03 PM · Crates loader, Crates lister, Archive coverage
franckbret requested review of D8049: Crates loader, use **kwargs on task and loader initialisation instead of named args.
Jun 29 2022, 3:01 PM

Jun 28 2022

franckbret updated the diff for D8033: Arch User Repository (AUR) lister.

Updating D8033: [WIP] Arch User Repository (AUR) lister

Jun 28 2022, 12:48 PM

Jun 24 2022

franckbret updated subscribers of D8033: Arch User Repository (AUR) lister.

@ardumont @vlorentz Hi, here is a first implementation of the Arch User Repository (AUR) lister.

Jun 24 2022, 1:52 PM
franckbret requested review of D8033: Arch User Repository (AUR) lister.
Jun 24 2022, 1:00 PM

Jun 17 2022

franckbret closed D7995: Arch Linux loader.
Jun 17 2022, 9:33 AM
franckbret committed rDLDBASEb6af2638c11a: Arch Linux loader (authored by franckbret).
Arch Linux loader
Jun 17 2022, 9:33 AM
franckbret updated the diff for D7995: Arch Linux loader.

Updating D7995: Arch Linux loader

Jun 17 2022, 9:33 AM
franckbret closed D7894: Add arch lister module (origins from archives)..
Jun 17 2022, 9:23 AM
franckbret committed rDLS1bf11aa26d92: Add arch lister module (origins from archives). (authored by franckbret).
Add arch lister module (origins from archives).
Jun 17 2022, 9:23 AM

Jun 16 2022

franckbret retitled D7995: Arch Linux loader from WIP: Arch Linux loader to Arch Linux loader.
Jun 16 2022, 3:39 PM
franckbret updated the diff for D7995: Arch Linux loader.

Updating D7995: Arch Linux loader

Jun 16 2022, 3:32 PM
franckbret updated the diff for D7995: Arch Linux loader.

Updating D7995: WIP: Arch Linux loader

Jun 16 2022, 3:13 PM
franckbret added inline comments to D7995: Arch Linux loader.
Jun 16 2022, 2:25 PM
franckbret added inline comments to D7995: Arch Linux loader.
Jun 16 2022, 2:20 PM
franckbret updated the diff for D7995: Arch Linux loader.

Updating D7995: Arch Linux loader

Jun 16 2022, 1:58 PM
franckbret requested review of D7995: Arch Linux loader.
Jun 16 2022, 1:48 PM
franckbret closed D7993: Uncompress support '.tar.zst' extension and 'application/zstd' mime type..
Jun 16 2022, 12:17 PM
franckbret committed rDCORE87e7afab114d: Add zstandard support to tarball.py for unpacking 'tar.zst' file archives (authored by franckbret).
Add zstandard support to tarball.py for unpacking 'tar.zst' file archives
Jun 16 2022, 12:17 PM
franckbret updated the diff for D7993: Uncompress support '.tar.zst' extension and 'application/zstd' mime type..

Updating D7993: Uncompress support '.tar.zst' extension and 'application/zstd', 'application/x-zstd' mime type.

Jun 16 2022, 12:10 PM
franckbret requested review of D7993: Uncompress support '.tar.zst' extension and 'application/zstd' mime type..
Jun 16 2022, 12:00 PM
franckbret added a revision to T4233: Ingest Arch Linux: D7995: Arch Linux loader.
Jun 16 2022, 9:38 AM · Arch loader, Arch Lister, Archive coverage

Jun 15 2022

franckbret added a comment to D7894: Add arch lister module (origins from archives)..

Archlinux lister execution on Docker runs fine without any error :

Jun 15 2022, 9:47 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: Add arch lister module (origins from archives).

Jun 15 2022, 9:13 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: Add arch lister module (origins from archives).

Jun 15 2022, 8:38 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: Add arch lister module (origins from archives).

Jun 15 2022, 8:21 AM

Jun 14 2022

franckbret retitled D7894: Add arch lister module (origins from archives). from [WIP] Add arch lister module (origins from archives). to Add arch lister module (origins from archives)..
Jun 14 2022, 9:38 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: Add arch lister module (origins from archives).

Jun 14 2022, 9:23 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

Jun 14 2022, 9:08 AM

Jun 13 2022

franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

Jun 13 2022, 11:27 AM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

Jun 13 2022, 11:04 AM
franckbret added a comment to D7894: Add arch lister module (origins from archives)..

You can temporarily replace the pytest command with pytest -vv in tox.ini to get the full diff between the two results on Jenkins

Ok thanks! I suspect it's related to the Python version. I'm currently building a new venv with Python 3.7.3 to see if I can make the tests fail locally. Will try your hack if that's not the case.

Jun 13 2022, 10:57 AM
franckbret added a comment to D7894: Add arch lister module (origins from archives)..

You can temporarily replace the pytest command with pytest -vv in tox.ini to get the full diff between the two results on Jenkins

Jun 13 2022, 10:50 AM

Jun 10 2022

franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

Jun 10 2022, 3:42 PM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Regenerate data fixtures (jenkins failed on previous commit but tests pass on my machine)

Jun 10 2022, 2:04 PM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

Jun 10 2022, 1:09 PM

Jun 3 2022

franckbret added inline comments to D7894: Add arch lister module (origins from archives)..
Jun 3 2022, 11:03 AM

May 31 2022

franckbret added inline comments to D7894: Add arch lister module (origins from archives)..
May 31 2022, 4:52 PM
franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).
Various code changes after @vlorentz review

May 31 2022, 4:48 PM

May 27 2022

franckbret updated the diff for D7894: Add arch lister module (origins from archives)..

Updating D7894: [WIP] Add arch lister module (origins from archives).

May 27 2022, 7:02 PM

May 25 2022

franckbret updated subscribers of D7894: Add arch lister module (origins from archives)..

This one is ready for review.

May 25 2022, 3:45 PM
franckbret requested review of D7894: Add arch lister module (origins from archives)..
May 25 2022, 3:38 PM
franckbret abandoned D7812: [WIP] Add arch lister module..

Abandoned in favor of D7894

May 25 2022, 3:36 PM
franckbret added a comment to D7812: [WIP] Add arch lister module..

I've made several experiments in order to find a better way to list Arch Linux
packages.

The most efficient way I've found is to download tar.gz files that contain one directory
per package name, each with a "desc" file of easy-to-parse metadata. It works fine but
retrieves only the latest version of a package.

Here are some execution time metrics for downloading the archives and parsing the desc files.

Found 266 packages from https://archive.archlinux.org/repos/last/core/os/x86_64/core.files.tar.gz in 1.4924319160054438 seconds

Found 3035 packages from https://archive.archlinux.org/repos/last/extra/os/x86_64/extra.files.tar.gz in 5.644616681995103 seconds

Found 9161 packages from https://archive.archlinux.org/repos/last/community/os/x86_64/community.files.tar.gz in 16.14458583202213 seconds

Example of retrieved package data after parsing:

{'arch': 'x86_64',
 'repo': 'core',
 'base': 'acl',
 'builddate': '1643730617',
 'conflicts': 'xfsacl',
 'csize': '138970',
 'desc': 'Access control list utilities, libraries and headers',
 'filename': 'acl-2.3.1-2-x86_64.pkg.tar.zst',
 'isize': '325349',
 'license': 'LGPL',
 'md5sum': '718c93159ce4dfc6f789ffe27ce276e8',
 'name': 'acl',
 'packager': 'Christian Hesse <eworm@archlinux.org>',
 'pgpsig': 'iHUEABYIAB0WIQQEKYl95fO9rFN6MGltQr3RFuAGjwUCYflW2QAKCRBtQr3RFuAGj/waAP9U7gJZ0YRfftuGdc4shJdSIfspuWb3nZK+fj7My5z4zQD/SBpepSM3Cxr8Pw2LU5adq4UI0HWFZFsHrg3179XJqgI=',
 'project_url': 'https://savannah.nongnu.org/projects/acl',
 'replaces': 'xfsacl',
 'sha256sum': '20873a994a0728de5b05857129c290e9a8c9bba2236cc30bcffa7b746ffe9218',
 'url': 'https://archive.archlinux.org/packages/.all/acl-2.3.1-2-x86_64.pkg.tar.zst',
 'version': '2.3.1-2'}
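
As an illustration of the "desc" parsing step described above, here is a minimal sketch in Python. It assumes the usual pacman database layout, where each section starts with a `%KEY%` header line, lists one value per line, and ends at a blank line. The function name `parse_desc` and the sample text are hypothetical and are not taken from the lister code under review. Unlike the example above, this sketch keeps every value as a list, since some sections can hold several lines.

```python
from typing import Dict, List, Optional


def parse_desc(text: str) -> Dict[str, List[str]]:
    """Parse the text of a pacman-style "desc" file into a dict of lists."""
    data: Dict[str, List[str]] = {}
    key: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current section.
            key = None
        elif key is None and len(line) > 2 and line[0] == "%" and line[-1] == "%":
            # A %KEY% header opens a new section; store the key in lowercase.
            key = line[1:-1].lower()
            data[key] = []
        elif key is not None:
            # Any other line inside a section is one value of that section.
            data[key].append(line)
    return data


# Hypothetical sample, shaped after the package data shown above.
sample = """%FILENAME%
acl-2.3.1-2-x86_64.pkg.tar.zst

%NAME%
acl

%VERSION%
2.3.1-2

%LICENSE%
LGPL
"""

parsed = parse_desc(sample)
```

Fields such as 'repo' or 'url' in the example above are not part of this sketch; it only covers what a "desc" file itself would provide under the stated assumption.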

If we are OK with getting only the latest version, we can go this way.

(as a data point) That's currently the way we are retrieving information for CRAN
packages. CRAN (infra) only exposes the latest version of a package (it exposes archived
versions with a dedicated instance we are not currently listing).

But our lister is listing them every day, so from the moment we started ingesting them, we
should have some versions for one package already. At some point, we'll have to attend
to the archived ones as well.

So I guess, given your current experiments reported here (through the description and
this very comment), it'd be ok to do the same as CRAN here.

Nonetheless, it's possible to get other versions of a package through two different
strategies, each with some pros and cons:

  1. Download the index https://archive.archlinux.org/packages/.all/index.0.xz, which contains a file that lists several previous versions, for example:
mercurial-4.8.2-1-x86_64
mercurial-4.9-1-x86_64
mercurial-4.9.1-1-x86_64
mercurial-5.0-1-x86_64
...

Pros: One 500 kB file to download, one dynamic regex to find matches
Cons: We only get a filename, no date, no metadata. The file has +/- 400000 entries.
It took 16 min for the regex to find matches for +/- 15000 packages...

  2. Scrape the server directory listing to get previous versions of a package with its
release date, for example https://archive.archlinux.org/packages/m/mercurial/
Pros: Easy to scrape + a release date is associated with a version
Cons: Scraping +/- 15000 pages can be quite slow, no metadata
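
For the first strategy, one possible way to avoid running a regex per package is to group the index entries by package name in a single pass. The sketch below is only an illustration and is not what was measured above. It assumes every entry has the form `<name>-<version>-<release>-<arch>` and that neither the version nor the release contains a hyphen. The function name `group_versions` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_versions(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group index entries by package name in one pass over the lines."""
    versions: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        # Split from the right so hyphens inside the package name are kept.
        parts = line.strip().rsplit("-", 3)
        if len(parts) != 4:
            continue  # skip entries that do not match the assumed layout
        name, version, release, arch = parts
        versions[name].append((f"{version}-{release}", arch))
    return dict(versions)


# Entries copied from the example listing above.
index_lines = [
    "mercurial-4.8.2-1-x86_64",
    "mercurial-4.9-1-x86_64",
    "mercurial-4.9.1-1-x86_64",
    "mercurial-5.0-1-x86_64",
]
by_package = group_versions(index_lines)
```

Once the index is decompressed, its lines could be fed to the same function. This still yields only filenames, so the "no date, no metadata" drawback of this strategy remains.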

@vlorentz @ardumont @bchauvet what do you think, what do you prefer?

As mentioned, I'd go for the simplest solution (the first one, which allows simpler
metadata retrieval for the latest version only).

@vlorentz @bchauvet thoughts?

Also, should I cancel that issue and create a new one to go on?

You can go either way. If you keep that one, it'd be easier to compare with your future
version (and the future review will be simpler, no noisy old comments). If you keep it,
we can still find its initial version through the history tab (within the web ui).

Well, go simple, create a new one? (yeah, the opposite of what i said to you on irc on
friday ¯\_(ツ)_/¯ ;)

Cheers,

May 25 2022, 3:11 PM

May 18 2022

franckbret updated subscribers of D7812: [WIP] Add arch lister module..

I've made several experiments in order to find a better way to list Arch Linux packages.

May 18 2022, 11:26 AM

May 11 2022

franckbret updated the diff for D7812: [WIP] Add arch lister module..

Updating D7812: [WIP] Add arch lister module.

May 11 2022, 3:22 PM
franckbret requested review of D7812: [WIP] Add arch lister module..
May 11 2022, 3:06 PM

May 4 2022

franckbret closed D7713: crates: rework to take advantage of data returned by the crates lister.
May 4 2022, 10:31 AM
franckbret committed rDLDBASEa097a946c2f2: crates: rework to take advantage of data returned by the crates lister (authored by franckbret).
crates: rework to take advantage of data returned by the crates lister
May 4 2022, 10:31 AM
franckbret updated the diff for D7713: crates: rework to take advantage of data returned by the crates lister.

Updating D7713: crates: rework to take advantage of data returned by the crates lister

May 4 2022, 10:07 AM

Apr 29 2022

franckbret added inline comments to D7713: crates: rework to take advantage of data returned by the crates lister.
Apr 29 2022, 3:56 PM
franckbret updated the diff for D7713: crates: rework to take advantage of data returned by the crates lister.

Update back the diff

Apr 29 2022, 10:42 AM
franckbret updated the diff for D7713: crates: rework to take advantage of data returned by the crates lister.

Add 'Franck Bret' to contributors

Apr 29 2022, 10:38 AM
franckbret requested review of D7713: crates: rework to take advantage of data returned by the crates lister.
Apr 29 2022, 10:17 AM
franckbret added a revision to T4104: Ingest crates.io (Rust): D7713: crates: rework to take advantage of data returned by the crates lister.
Apr 29 2022, 10:14 AM · Crates loader, Crates lister, Archive coverage

Apr 28 2022

franckbret closed D7654: crates: create one origin per package instead of per version.
Apr 28 2022, 4:11 PM
franckbret committed rDLS985b71e80c66: crates: Create one origin per package instead of per version (authored by franckbret).
crates: Create one origin per package instead of per version
Apr 28 2022, 4:11 PM
franckbret updated the diff for D7654: crates: create one origin per package instead of per version.

prepare to push

Apr 28 2022, 4:11 PM
franckbret closed D7501: Rust lang, Crates loader.
Apr 28 2022, 4:09 PM
franckbret committed rDLDBASE2e27f7c7a697: Rust lang, Crates loader (authored by franckbret).
Rust lang, Crates loader
Apr 28 2022, 4:09 PM
franckbret updated the diff for D7501: Rust lang, Crates loader.

Updating D7501: Rust lang, Crates loader

Apr 28 2022, 4:09 PM
franckbret retitled D7654: crates: create one origin per package instead of per version from Crates.io lister, create one origin per package instead of per version to crates: create one origin per package instead of per version.
Apr 28 2022, 9:06 AM
franckbret updated the diff for D7654: crates: create one origin per package instead of per version.

Updating D7654: crates, create one origin per package instead of per version

Apr 28 2022, 9:06 AM

Apr 27 2022

franckbret updated the diff for D7654: crates: create one origin per package instead of per version.

Add missing "v1" in http api url

Apr 27 2022, 5:48 PM
franckbret updated the diff for D7654: crates: create one origin per package instead of per version.

Added 'version' key to 'artifacts' dict.

Apr 27 2022, 5:27 PM
franckbret added a comment to D7501: Rust lang, Crates loader.

Proposed evolution on crate lister to be consistent with the loader here :

https://forge.softwareheritage.org/D7654

Nice, and now the loader needs some rework to match that diff's adaptations ;)

Apr 27 2022, 4:10 PM
franckbret added a comment to D7654: crates: create one origin per package instead of per version.

I've run the lister in the docker environment, it looks good.

Apr 27 2022, 3:39 PM
franckbret retitled D7654: crates: create one origin per package instead of per version from Refactor crates.io lister to Crates.io lister, create one origin per package instead of per version.
Apr 27 2022, 2:34 PM