
franckbret (Franck Bret)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 17 2022, 11:24 AM (27 w, 5 d)

Recent Activity

Yesterday

franckbret closed D8528: Nuget: Lister for NuGet the package manager for .NET.
Tue, Sep 27, 4:28 PM
franckbret committed rDLS3928fc9ee9ff: Nuget: Lister for NuGet the package manager for .NET (authored by franckbret).
Nuget: Lister for NuGet the package manager for .NET
Tue, Sep 27, 4:28 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

rebase

Tue, Sep 27, 4:01 PM
franckbret added a comment to D8529: RubyGems: List origins from https://rubygems.org.

Is https://rubygems.org/versions documented somewhere?

Tue, Sep 27, 3:59 PM
franckbret updated the diff for D8529: RubyGems: List origins from https://rubygems.org.

Make use of http_request after D8520, update documentation docker section.

Tue, Sep 27, 3:23 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

Explain that the lister discovers origins on other forges because NuGet packages are binaries

Tue, Sep 27, 2:59 PM
franckbret closed D8531: Puppet: Lister for Puppet modules.
Tue, Sep 27, 2:46 PM
franckbret committed rDLScd596eb2b46d: Puppet: Lister for Puppet modules (authored by franckbret).
Puppet: Lister for Puppet modules
Tue, Sep 27, 2:46 PM
franckbret updated the diff for D8531: Puppet: Lister for Puppet modules.

rebase

Tue, Sep 27, 2:45 PM
franckbret added a comment to D8542: Cpan: List Perl module origins from cpan.org.

@franckbret, have you considered using the https://fastapi.metacpan.org/v1/release/_search endpoint of the CPAN Elasticsearch?

It seems to list all CPAN releases with dates, links to tarballs and checksums. You could build a list of artifacts for each package, as in the crates loader, and pass them as loader arguments.

Thanks for the review.
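
The suggestion quoted above (one list of artifacts per package, built from release records) could be sketched roughly as follows. This is only an illustration: the field names `distribution`, `version`, `download_url`, `date` and `checksum_sha256` are assumptions about what the search endpoint returns, the sample records are made up, and the output shape is hypothetical rather than what the lister or the crates loader actually uses.

```python
from collections import defaultdict


def group_artifacts(releases):
    """Group release records by package and sort each group by release date.

    `releases` is an iterable of dicts; the key names are assumed, not
    taken from the lister code.
    """
    artifacts = defaultdict(list)
    for release in releases:
        artifacts[release["distribution"]].append(
            {
                "version": release["version"],
                "url": release["download_url"],
                "date": release["date"],
                "checksums": {"sha256": release["checksum_sha256"]},
            }
        )
    # ISO 8601 date strings sort chronologically when compared as text.
    for items in artifacts.values():
        items.sort(key=lambda artifact: artifact["date"])
    return dict(artifacts)


# Hypothetical sample records, for illustration only.
sample_releases = [
    {"distribution": "Foo-Bar", "version": "0.02", "date": "2022-02-01T00:00:00",
     "download_url": "https://example.org/Foo-Bar-0.02.tar.gz", "checksum_sha256": "bb"},
    {"distribution": "Foo-Bar", "version": "0.01", "date": "2022-01-01T00:00:00",
     "download_url": "https://example.org/Foo-Bar-0.01.tar.gz", "checksum_sha256": "aa"},
    {"distribution": "Baz", "version": "1.0", "date": "2022-03-01T00:00:00",
     "download_url": "https://example.org/Baz-1.0.tar.gz", "checksum_sha256": "cc"},
]

grouped = group_artifacts(sample_releases)
```

Each value of `grouped` could then be passed as an argument to a loader task for the corresponding package.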

Tue, Sep 27, 2:42 PM
franckbret closed D8542: Cpan: List Perl module origins from cpan.org.
Tue, Sep 27, 2:32 PM
franckbret committed rDLSa4aec3894e3c: Cpan: List Perl module origins from cpan.org (authored by franckbret).
Cpan: List Perl module origins from cpan.org
Tue, Sep 27, 2:32 PM
franckbret updated the diff for D8542: Cpan: List Perl module origins from cpan.org.

rebase

Tue, Sep 27, 2:31 PM
franckbret closed D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Tue, Sep 27, 2:25 PM
franckbret committed rDLS6696a8424ad1: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository (authored by franckbret).
Hackage: List origins from hackage.haskell.org, The Haskell Package Repository
Tue, Sep 27, 2:25 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

rebase

Tue, Sep 27, 2:24 PM
franckbret closed D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.
Tue, Sep 27, 2:19 PM
franckbret committed rDLS8ff418fbc28b: Conda: List origins for Anaconda, the package manager that provides tooling for… (authored by franckbret).
Conda: List origins for Anaconda, the package manager that provides tooling for…
Tue, Sep 27, 2:19 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Replace api_info that has been renamed to get_url_body

Tue, Sep 27, 10:55 AM
franckbret requested review of D8542: Cpan: List Perl module origins from cpan.org.
Tue, Sep 27, 9:32 AM

Mon, Sep 26

franckbret updated the diff for D8531: Puppet: Lister for Puppet modules.

Sphinx fix

Mon, Sep 26, 6:53 PM
franckbret added inline comments to D8531: Puppet: Lister for Puppet modules.
Mon, Sep 26, 6:47 PM
franckbret updated the diff for D8531: Puppet: Lister for Puppet modules.

typo

Mon, Sep 26, 6:44 PM
franckbret updated the diff for D8531: Puppet: Lister for Puppet modules.

Make use of self.http_request as introduced by D8520

Mon, Sep 26, 6:38 PM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Make use of http_request after D8520

Mon, Sep 26, 5:59 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

Update docker usage documentation section and remove some useless code

Mon, Sep 26, 5:23 PM
franckbret added inline comments to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Mon, Sep 26, 5:06 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Improvements after review

Mon, Sep 26, 5:04 PM
franckbret updated subscribers of D8528: Nuget: Lister for NuGet the package manager for .NET.
Mon, Sep 26, 4:16 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

More complete tests and basic documentation

Mon, Sep 26, 4:10 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

Make use of generic http_request method after D8520

Mon, Sep 26, 2:38 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Make use of http_retry instead of throttling_retry decorator after D8519

Mon, Sep 26, 12:27 PM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Make use of http_retry instead of throttling_retry decorator after D8519

Mon, Sep 26, 12:17 PM
franckbret retitled D8528: Nuget: Lister for NuGet the package manager for .NET from [WIP] Nuget: Lister for NuGet the package manager for .NET to Nuget: Lister for NuGet the package manager for .NET.
Mon, Sep 26, 12:14 PM
franckbret updated the diff for D8528: Nuget: Lister for NuGet the package manager for .NET.

Make use of http_retry instead of throttling_retry decorator after D8519

Mon, Sep 26, 12:13 PM
franckbret retitled D8529: RubyGems: List origins from https://rubygems.org from [WIP] RubyGems: List origins from https://rubygems.org to RubyGems: List origins from https://rubygems.org.
Mon, Sep 26, 12:10 PM
franckbret updated the diff for D8529: RubyGems: List origins from https://rubygems.org.

Make use of http_retry instead of throttling_retry decorator after D8519

Mon, Sep 26, 12:10 PM
franckbret requested review of D8531: Puppet: Lister for Puppet modules.
Mon, Sep 26, 12:06 PM

Fri, Sep 23

franckbret requested review of D8529: RubyGems: List origins from https://rubygems.org.
Fri, Sep 23, 8:10 PM
franckbret requested review of D8528: Nuget: Lister for NuGet the package manager for .NET.
Fri, Sep 23, 5:56 PM
franckbret added a comment to D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

A test report after running the lister (lister only, no loader yet) with the following command:

Fri, Sep 23, 9:53 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Doc: Add missing quote to docker command example to add a conda task

Fri, Sep 23, 9:40 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Add missing documentation link

Fri, Sep 23, 9:26 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Add documentation for lister usage

Fri, Sep 23, 9:11 AM

Thu, Sep 22

franckbret retitled D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language from [WIP] Conda: List origins from anaconda.com, the Package, dependency and environment management for any language to Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.
Thu, Sep 22, 7:51 PM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Lister improvement

Thu, Sep 22, 7:44 PM

Wed, Sep 21

franckbret requested review of D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.
Wed, Sep 21, 2:59 PM

Mon, Sep 19

franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Update the patch to work with the new lister patch, which gives a last_update value for each version

Mon, Sep 19, 10:51 AM

Fri, Sep 16

franckbret updated the diff for D8454: Crates.io: Add last_update for each version of a crate.

Update documentation

Fri, Sep 16, 8:41 AM

Thu, Sep 15

franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

Also, I think it is doable to remove the Git part of the lister entirely. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive contains a date and a commit hash that represent the date of the db dump.
In the incremental case, the lister can compare that date to the last update of the crate.

I also think getting rid of the git part would be a good idea.
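
The date comparison discussed above might look roughly like this. It is a sketch under assumptions, not the lister's code: the function names are hypothetical, the reference date is assumed to be the date of the previously processed dump, and the `updated_at` format is taken from the CSV values visible in the debug log further down (note that the fractional part can have fewer than six digits, which `%f` accepts).

```python
from datetime import datetime

# Format of the `updated_at` values seen in the db dump CSV rows.
CSV_DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_csv_date(value):
    """Parse a CSV timestamp such as '2022-08-31 16:02:07.17775'."""
    return datetime.strptime(value, CSV_DATE_FORMAT)


def crates_to_list(crates, previous_dump_date):
    """Return names of crates updated after the previously processed dump."""
    return [
        crate["name"]
        for crate in crates
        if parse_csv_date(crate["updated_at"]) > previous_dump_date
    ]


crates = [
    {"name": "colorid", "updated_at": "2022-09-11 12:08:17.012383"},
    # Hypothetical crate, added only to show a skipped entry.
    {"name": "older-crate", "updated_at": "2022-08-31 16:02:07.17775"},
]
previous_dump_date = datetime(2022, 9, 1)  # hypothetical stored state

print(crates_to_list(crates, previous_dump_date))
```

Only crates whose `updated_at` is later than the stored date are kept, so a crate left untouched since the previous run is skipped.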

While testing that diff in Docker, I quickly got an error because the git repository contains more recent crate versions
than those extracted from the db dump; see below (for the record, I added some debug logs):

docker-swh-lister-1  | [2022-09-13 14:48:59,849: INFO/MainProcess] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] received
docker-swh-lister-1  | [2022-09-13 14:48:59,851: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml
docker-swh-lister-1  | Enumerating objects: 158660, done.
Counting objects: 100% (1216/1216), done.
Compressing objects: 100% (601/601), done.
docker-swh-lister-1  | Total 158660 (delta 715), reused 1090 (delta 589), pack-reused 157444
docker-swh-lister-1  | [2022-09-13 14:51:14,659: DEBUG/ForkPoolWorker-1] Found 25 crates in crates_index
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.0', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.1', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': 'c25097f191e32ad6550e402f6c5e6fbae7115a60bfedea2a4f5351c16a286229', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,662: DEBUG/ForkPoolWorker-1] Listing crates origin completed with last commit id 81cd3beb5d62f3b898607ab5b266a856b0e9fab8
docker-swh-lister-1  | [2022-09-13 14:51:17,965: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-index directory
docker-swh-lister-1  | [2022-09-13 14:51:18,058: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-db_dump directory
docker-swh-lister-1  | [2022-09-13 14:51:18,066: ERROR/ForkPoolWorker-1] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] raised unexpected: KeyError('0.0.2')
docker-swh-lister-1  | Traceback (most recent call last):
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
docker-swh-lister-1  |     R = retval = fun(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
docker-swh-lister-1  |     result = super().__call__(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
docker-swh-lister-1  |     return self.run(*args, **kwargs)
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/tasks.py", line 14, in list_crates
docker-swh-lister-1  |     return CratesLister.from_configfile(**lister_args).run().dict()
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
docker-swh-lister-1  |     for page in self.get_pages():
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/lister.py", line 245, in get_pages
docker-swh-lister-1  |     entry["version"]
docker-swh-lister-1  | KeyError: '0.0.2'

Working only with the CSV files should guarantee crates data are consistent.
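
To make the failure above concrete, here is a small sketch using version numbers and dates abridged from the debug log. The git index lists version 0.0.2 of `colorid`, but the db dump does not, so a direct dictionary lookup raises `KeyError('0.0.2')`. The helper names are hypothetical and the code is not taken from the lister; it only illustrates why using the dump as the single source avoids the mismatch.

```python
# Versions of the "colorid" crate as seen in the git index (from the log).
index_versions = ["0.0.1-dev.0", "0.0.1-dev.1", "0.0.1-dev.2", "0.0.1", "0.0.2"]

# Versions of the same crate as seen in the db dump (abridged from the log).
dump_versions = {
    "0.0.1-dev.0": {"updated_at": "2022-08-28 11:43:09.333693"},
    "0.0.1-dev.1": {"updated_at": "2022-08-29 15:32:43.066959"},
    "0.0.1-dev.2": {"updated_at": "2022-08-31 16:02:07.17775"},
    "0.0.1": {"updated_at": "2022-09-11 12:08:17.012383"},
}


def missing_from_dump(index_versions, dump_versions):
    """Return versions present in the git index but absent from the dump."""
    return [version for version in index_versions if version not in dump_versions]


def last_updates_from_dump(dump_versions):
    """Map each version to its updated_at, using the dump as the only source."""
    return {num: row["updated_at"] for num, row in dump_versions.items()}


print(missing_from_dump(index_versions, dump_versions))
```

With a single source, every version that is iterated over also has a row to read its `updated_at` from, at the cost of lagging behind the git index until the next dump.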

Thu, Sep 15, 6:53 PM
franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

I haven't tested this one on Docker yet, but I anticipated this situation. The backup is generated every day or so, while the git repo changes every day.

Do they document at what time of day it is generated? Would be nice to run the lister right after to minimize lag

Thu, Sep 15, 6:49 PM
franckbret added inline comments to D8454: Crates.io: Add last_update for each version of a crate.
Thu, Sep 15, 6:43 PM
franckbret updated the diff for D8454: Crates.io: Add last_update for each version of a crate.

Use csv listing only

Thu, Sep 15, 6:20 PM

Wed, Sep 14

franckbret closed D8455: Pubdev: Ensure we correctly parse and sort release versions.
Wed, Sep 14, 6:23 AM
franckbret committed rDLDBASE134087342b42: Pubdev: Ensure we correctly parse and sort release versions (authored by franckbret).
Pubdev: Ensure we correctly parse and sort release versions
Wed, Sep 14, 6:23 AM

Tue, Sep 13

franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

Also, I think it is doable to remove the Git part of the lister entirely. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive contains a date and a commit hash that represent the date of the db dump.
In the incremental case, the lister can compare that date to the last update of the crate.

I also think getting rid of the git part would be a good idea.

While testing that diff in Docker, I quickly got an error because the git repository contains more recent crate versions
than those extracted from the db dump; see below (for the record, I added some debug logs):

docker-swh-lister-1  | [2022-09-13 14:48:59,849: INFO/MainProcess] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] received
docker-swh-lister-1  | [2022-09-13 14:48:59,851: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml
docker-swh-lister-1  | Enumerating objects: 158660, done.
Counting objects: 100% (1216/1216), done.
Compressing objects: 100% (601/601), done.
docker-swh-lister-1  | Total 158660 (delta 715), reused 1090 (delta 589), pack-reused 157444
docker-swh-lister-1  | [2022-09-13 14:51:14,659: DEBUG/ForkPoolWorker-1] Found 25 crates in crates_index
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.0', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.1', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': 'c25097f191e32ad6550e402f6c5e6fbae7115a60bfedea2a4f5351c16a286229', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,662: DEBUG/ForkPoolWorker-1] Listing crates origin completed with last commit id 81cd3beb5d62f3b898607ab5b266a856b0e9fab8
docker-swh-lister-1  | [2022-09-13 14:51:17,965: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-index directory
docker-swh-lister-1  | [2022-09-13 14:51:18,058: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-db_dump directory
docker-swh-lister-1  | [2022-09-13 14:51:18,066: ERROR/ForkPoolWorker-1] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] raised unexpected: KeyError('0.0.2')
docker-swh-lister-1  | Traceback (most recent call last):
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
docker-swh-lister-1  |     R = retval = fun(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
docker-swh-lister-1  |     result = super().__call__(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
docker-swh-lister-1  |     return self.run(*args, **kwargs)
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/tasks.py", line 14, in list_crates
docker-swh-lister-1  |     return CratesLister.from_configfile(**lister_args).run().dict()
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
docker-swh-lister-1  |     for page in self.get_pages():
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/lister.py", line 245, in get_pages
docker-swh-lister-1  |     entry["version"]
docker-swh-lister-1  | KeyError: '0.0.2'

Working only with the CSV files should guarantee that the crates data is consistent.
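
To illustrate the idea, here is a minimal sketch (not the lister code) that builds the per-crate versions mapping from CSV rows only and looks versions up defensively. The column names come from the debug log above; the CSV excerpt, the reduced set of columns, and the helper names `versions_by_crate` and `last_update` are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a versions CSV. Column names are copied from the
# debug log; the exact file layout of the database dump is an assumption.
VERSIONS_CSV = """crate_id,num,checksum,updated_at,yanked
653834,0.0.1-dev.2,215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce,2022-08-31 16:02:07.17775,f
653834,0.0.1,1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce,2022-09-11 12:08:17.012383,f
"""


def versions_by_crate(csv_text):
    """Group version rows by crate_id, keyed by version number."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["crate_id"]][row["num"]] = row
    return grouped


def last_update(versions, num):
    """Return updated_at for a version, or None if the dump does not list it."""
    entry = versions.get(num)
    return entry["updated_at"] if entry else None


versions = versions_by_crate(VERSIONS_CSV)["653834"]
print(last_update(versions, "0.0.1"))
print(last_update(versions, "0.0.2"))  # not in the dump excerpt, so no KeyError
```

Since every version comes from the same source, a version present in the git index but missing from the dump (like 0.0.2 in the traceback) can no longer trigger a lookup failure.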

Tue, Sep 13, 5:13 PM
franckbret updated the diff for D8455: Pubdev: Ensure we correctly parse and sort release versions.

Add packaging to requirements + some cleanup

Tue, Sep 13, 5:07 PM
franckbret added a comment to D8453: user/loaders: Add new loaders in development to loaders info page.

Looks good to me thanks

Tue, Sep 13, 3:15 PM
franckbret requested review of D8455: Pubdev: Ensure we correctly parse and sort release versions.
Tue, Sep 13, 10:07 AM
franckbret updated subscribers of D8454: Crates.io: Add last_update for each version of a crate.

@vlorentz @ardumont Here is a new version of the crates.io lister which loads CSV files from the crates.io database dump.

Tue, Sep 13, 8:46 AM
franckbret requested review of D8454: Crates.io: Add last_update for each version of a crate.
Tue, Sep 13, 8:34 AM

Thu, Sep 8

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@franckbret fyi you have updated the wrong diff (pubdev instead of haskell)

Thu, Sep 8, 10:16 AM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Rebase

Thu, Sep 8, 10:12 AM
franckbret added a comment to D8407: docker/conf: Declare celery tasks for listers/loaders in development.

Thanks.

fwiw, i think this should be added with diffs by the contributors of those new listers/loaders.

@bchauvet ^ I expected this to be done already.

@franckbret for your information, i must have forgotten to be explicit about this before so ;)

Thu, Sep 8, 7:29 AM

Wed, Sep 7

franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Rebase

Wed, Sep 7, 3:28 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Rebase

Wed, Sep 7, 3:28 PM

Tue, Sep 6

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

You didn't add the test I asked for in D8338#inline-59471

Tue, Sep 6, 12:45 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Add pagination tests

Tue, Sep 6, 9:40 AM

Mon, Sep 5

franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

@vlorentz @ardumont Can we merge this one?

Mon, Sep 5, 5:16 PM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@ardumont @vlorentz If you don't have other comments or suggestions I think we can merge this one

Mon, Sep 5, 5:15 PM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

I've run the loader tasks for a few hours: no errors, some not_found.

Mon, Sep 5, 2:15 PM
franckbret added a comment to D8354: pubdev: Modify origin URL and retrieve package last update.

Hi, looks good to me too. Will test in Docker once it's merged.

Mon, Sep 5, 11:05 AM
franckbret updated the task description for T4494: ingest hackage.haskell.org, the Haskell package repository .
Mon, Sep 5, 10:00 AM · Hackage loader, Hackage lister, Archive coverage
franckbret created T4494: ingest hackage.haskell.org, the Haskell package repository .
Mon, Sep 5, 9:55 AM · Hackage loader, Hackage lister, Archive coverage
franckbret updated the task description for T4466: Ingest Aur repository (Arch User Repository).
Mon, Sep 5, 9:50 AM · AUR loader, AUR lister, Archive coverage
franckbret updated the task description for T4466: Ingest Aur repository (Arch User Repository).
Mon, Sep 5, 9:50 AM · AUR loader, AUR lister, Archive coverage

Fri, Sep 2

franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Fill release date

Fri, Sep 2, 5:01 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Cabal keys may be Capitalized

Fri, Sep 2, 12:40 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Ensure we make a json get request adding correct headers

Fri, Sep 2, 11:56 AM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

Found a way:

$ curl https://hackage.haskell.org/package/colors-0.1/revisions/ -H "Accept: application/json"   
[{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]
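
A rough Python sketch of the same request and of extracting a date from the answer. It assumes that revision 0 corresponds to the original upload and that the `time` field always has the format shown above; the helper names `revisions_request` and `first_upload_date` are made up for this example and are not the loader code.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request


def revisions_request(package, version):
    """Build a GET request for the revisions of one release, asking for JSON."""
    url = f"https://hackage.haskell.org/package/{package}-{version}/revisions/"
    return Request(url, headers={"Accept": "application/json"})


def first_upload_date(payload):
    """Return the time of the lowest-numbered revision as an aware datetime."""
    revisions = json.loads(payload)
    first = min(revisions, key=lambda rev: rev["number"])
    parsed = datetime.strptime(first["time"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Response body copied from the curl call above.
sample = '[{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]'
request = revisions_request("colors", "0.1")
print(request.full_url)
print(first_upload_date(sample).isoformat())
```

The request object can be passed to `urllib.request.urlopen` to perform the actual call; the parsing is kept separate so it can be tested without network access.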
Fri, Sep 2, 11:51 AM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Testing Docker with that last commit

Fri, Sep 2, 11:13 AM
franckbret added inline comments to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Fri, Sep 2, 11:12 AM
franckbret added a comment to D8171: crates: Loader implements incremental mode.

From chatroom:
ardumont
val: ^ currently listers and loaders are not really separated (only the github lister is separated from the rest to avoid starvation around listing forges)
If we want accurate date for both the lister and the loader:
yes, we do so i guess only the option 2 is the way forward

Fri, Sep 2, 10:54 AM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Better implementation of pagination

Fri, Sep 2, 10:45 AM
franckbret updated subscribers of D8379: Hackage: Loads Hackage Listed origins.

@ardumont @vlorentz I did not find a way to get a release date per version with this one.

Fri, Sep 2, 9:43 AM
franckbret requested review of D8379: Hackage: Loads Hackage Listed origins.
Fri, Sep 2, 9:39 AM

Thu, Sep 1

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Change http api endpoint for search in order to retrieve a last_update

Switch from GET to POST to get results.
The lister is no longer a single page; each page lists 50 origins.
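
As a sketch of what the pagination implies (not the actual lister code): the POST body carries a `page` number, and the number of requests follows from the `numberOfResults` field of the response. The payload fields are the ones used with the search endpoint on this diff; the helper names `search_payload` and `page_count` are hypothetical.

```python
import json
import math

PAGE_SIZE = 50  # each page lists 50 origins


def search_payload(page):
    """JSON body for one page of the POST search request."""
    return json.dumps(
        {
            "page": page,
            "sortColumn": "default",
            "sortDirection": "ascending",
            "searchQuery": "(deprecated:any)",
        }
    )


def page_count(number_of_results, page_size=PAGE_SIZE):
    """Number of pages needed to cover all results."""
    return math.ceil(number_of_results / page_size)


print(search_payload(0))
print(page_count(16711))
```

For instance, a total of 16711 results means 335 requests, with pages numbered from 0 to 334.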

Thu, Sep 1, 6:37 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Change http api endpoint for search in order to retrieve a last_update

Thu, Sep 1, 5:50 PM

Wed, Aug 31

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

There is an API that provides access to the lastUpload:

$ curl "https://hackage.haskell.org/packages/search" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(deprecated:any)"}' -X POST | jq . | head -n 50 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23907    0 23806  100   101  40145    170 --:--:-- --:--:-- --:--:-- 40315
{
  "numberOfResults": 16711,
  "pageContents": [
    {
      "description": "Haskell package for easy integration with the 2captcha API.",
      "downloads": 1,
      "lastUpload": "2021-09-09T05:13:30.343509948Z",
      "maintainers": [
        {
          "display": "qwbarch",
          "uri": "/user/qwbarch"
        }
      ],
      "name": {
        "display": "2captcha",
        "uri": "/package/2captcha"
      },
      "tags": [
        {
          "display": "deprecated",
          "uri": "/packages/tag/deprecated"
        },
        {
          "display": "library",
          "uri": "/packages/tag/library"
        },
        {
          "display": "mit",
          "uri": "/packages/tag/mit"
        },
        {
          "display": "network",
          "uri": "/packages/tag/network"
        }
      ],
      "votes": 1.5
    },
    {
      "description": "Examples of 3D graphics programming with OpenGL",
      "downloads": 8,
      "lastUpload": "2016-07-22T14:26:23.038905Z",
      "maintainers": [
        {
          "display": "WolfgangJeltsch",
          "uri": "/user/WolfgangJeltsch"
        }
      ],
      "name": {
        "display": "3d-graphics-examples",
        "uri": "/package/3d-graphics-examples"

You can also use the same API for incremental listing by filtering on lastUpload in the search query.

Thanks, I missed this one
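
A small sketch of how `lastUpload` could be used. The server-side filter syntax is not shown here, so this example filters client-side instead, which is a different (and less efficient) technique. It also handles the fact that some timestamps carry nanoseconds, which `datetime` cannot store. The helper names `parse_last_upload` and `updated_since` are hypothetical.

```python
import re
from datetime import datetime, timezone

TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_last_upload(value):
    """Parse a UTC timestamp, truncating the fraction to microseconds."""
    match = TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected timestamp: {value!r}")
    base, fraction = match.groups()
    microsecond = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microsecond, tzinfo=timezone.utc)


def updated_since(page_contents, since):
    """Names of packages whose lastUpload is strictly after `since`."""
    return [
        entry["name"]["display"]
        for entry in page_contents
        if parse_last_upload(entry["lastUpload"]) > since
    ]


# Entries reduced to the two fields used here, values taken from the output above.
page = [
    {"name": {"display": "2captcha"}, "lastUpload": "2021-09-09T05:13:30.343509948Z"},
    {"name": {"display": "3d-graphics-examples"}, "lastUpload": "2016-07-22T14:26:23.038905Z"},
]
print(updated_since(page, datetime(2020, 1, 1, tzinfo=timezone.utc)))
```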

Wed, Aug 31, 5:28 PM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

There is an API that provides access to the lastUpload:

$ curl "https://hackage.haskell.org/packages/search" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(deprecated:any)"}' -X POST | jq . | head -n 50 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23907    0 23806  100   101  40145    170 --:--:-- --:--:-- --:--:-- 40315
{
  "numberOfResults": 16711,
  "pageContents": [
    {
      "description": "Haskell package for easy integration with the 2captcha API.",
      "downloads": 1,
      "lastUpload": "2021-09-09T05:13:30.343509948Z",
      "maintainers": [
        {
          "display": "qwbarch",
          "uri": "/user/qwbarch"
        }
      ],
      "name": {
        "display": "2captcha",
        "uri": "/package/2captcha"
      },
      "tags": [
        {
          "display": "deprecated",
          "uri": "/packages/tag/deprecated"
        },
        {
          "display": "library",
          "uri": "/packages/tag/library"
        },
        {
          "display": "mit",
          "uri": "/packages/tag/mit"
        },
        {
          "display": "network",
          "uri": "/packages/tag/network"
        }
      ],
      "votes": 1.5
    },
    {
      "description": "Examples of 3D graphics programming with OpenGL",
      "downloads": 8,
      "lastUpload": "2016-07-22T14:26:23.038905Z",
      "maintainers": [
        {
          "display": "WolfgangJeltsch",
          "uri": "/user/WolfgangJeltsch"
        }
      ],
      "name": {
        "display": "3d-graphics-examples",
        "uri": "/package/3d-graphics-examples"

You can also use the same API for incremental listing by filtering on lastUpload in the search query.

Wed, Aug 31, 10:44 AM

Tue, Aug 30

franckbret updated subscribers of D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@ardumont @vlorentz This one is quite simple but like pubdev we do not have access to coherent data to set a last_update. See https://hackage.haskell.org/packages/

Tue, Aug 30, 4:26 PM
franckbret retitled D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository from [WIP] Hackage: List origins from hackage.haskell.org, The Haskell Package Repository to Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Tue, Aug 30, 4:04 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Remove forgotten and now useless test file, remove WIP prefix

Tue, Aug 30, 4:03 PM
franckbret closed D8339: Arch: Add throttling retry for scrapping and resources download.
Tue, Aug 30, 1:52 PM
franckbret committed rDLS0acf5b0f4f50: Arch: Add throttling retry for scrapping and resources download (authored by franckbret).
Arch: Add throttling retry for scrapping and resources download
Tue, Aug 30, 1:52 PM
franckbret added a comment to T4466: Ingest Aur repository (Arch User Repository).

I've made a complete run on docker

Tue, Aug 30, 1:50 PM · AUR loader, AUR lister, Archive coverage
franckbret updated subscribers of D8339: Arch: Add throttling retry for scrapping and resources download.

@vlorentz @ardumont Can we merge this one? I've tested it in the Docker environment and no longer get failures. See https://forge.softwareheritage.org/T4233#90006

Tue, Aug 30, 1:12 PM
franckbret added a comment to T4233: Ingest Arch Linux.

Arch Linux Lister Docker Report

The lister takes a lot of time and fails on max retries when scraping the repository directory (it ran fine a few weeks ago). I'm not sure at this point, but I suspect a random problem related to the network or the HTTP server. I will run it multiple times to see if it fails on the same resource.

By the way, I guess that we need to define a strategy for those exceptions.
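
One possible strategy, sketched here under assumptions, is to retry transient failures with an exponential backoff and only give up after a fixed number of attempts. The exception class `TransientError`, the helper `retry`, and the default values are all hypothetical; a real lister would catch its HTTP library's exceptions instead.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


def retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: let the caller decide what to do
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a fake download that fails twice before succeeding.
calls = []
delays = []


def flaky_download():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("simulated network error")
    return "ok"


# sleep is replaced by delays.append so the example runs instantly.
print(retry(flaky_download, sleep=delays.append), delays)
```

Passing `sleep` as a parameter keeps the backoff testable without waiting; whether a resource that keeps failing should abort the whole listing or just be skipped remains a separate decision.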

Tue, Aug 30, 10:28 AM · Arch loader, Arch Lister, Archive coverage
franckbret updated the task description for T4233: Ingest Arch Linux.
Tue, Aug 30, 10:14 AM · Arch loader, Arch Lister, Archive coverage
franckbret updated the task description for T4233: Ingest Arch Linux.
Tue, Aug 30, 10:06 AM · Arch loader, Arch Lister, Archive coverage