Software Heritage

Sep 26 2022

franckbret retitled D8529: RubyGems: List origins from https://rubygems.org from [WIP] RubyGems: List origins from https://rubygems.org to RubyGems: List origins from https://rubygems.org.
Sep 26 2022, 12:10 PM
franckbret updated the diff for D8529: RubyGems: List origins from https://rubygems.org.

Use the http_retry decorator instead of throttling_retry, following D8519

Sep 26 2022, 12:10 PM
franckbret requested review of D8531: Puppet: Lister for Puppet modules.
Sep 26 2022, 12:06 PM

Sep 23 2022

franckbret requested review of D8529: RubyGems: List origins from https://rubygems.org.
Sep 23 2022, 8:10 PM
franckbret requested review of D8528: Nuget: Lister for NuGet the package manager for .NET.
Sep 23 2022, 5:56 PM
franckbret added a comment to D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Some test results from running the lister (lister only, no loader yet) with the following command:

Sep 23 2022, 9:53 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Doc: Add missing quote to docker command example to add a conda task

Sep 23 2022, 9:40 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Add missing documentation link

Sep 23 2022, 9:26 AM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Add documentation for lister usage

Sep 23 2022, 9:11 AM

Sep 22 2022

franckbret retitled D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language from [WIP] Conda: List origins from anaconda.com, the Package, dependency and environment management for any language to Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.
Sep 22 2022, 7:51 PM
franckbret updated the diff for D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.

Lister improvement

Sep 22 2022, 7:44 PM

Sep 21 2022

franckbret requested review of D8517: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language.
Sep 21 2022, 2:59 PM

Sep 19 2022

franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Update the patch to work with the new lister patch, which provides a last_update value for each version

Sep 19 2022, 10:51 AM

Sep 16 2022

franckbret updated the diff for D8454: Crates.io: Add last_update for each version of a crate.

Update documentation

Sep 16 2022, 8:41 AM

Sep 15 2022

franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

Also, I think it would be feasible to remove the Git part of the lister entirely. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive contains a date and a commit hash that represent the date of the DB dump.
In the incremental case, the lister can compare that date to the last update of the crate.
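A minimal sketch of that incremental comparison. It assumes metadata.json stores the dump date under a `timestamp` key and that the CSV `updated_at` values (format taken from the debug log further down) are in UTC; `parse_timestamp`, `dump_date` and `crates_updated_since` are hypothetical names, not the lister's code:

```python
import json
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Optional

# 'YYYY-MM-DD[T ]HH:MM:SS[.fraction][Z|+HH:MM]'
_TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})?$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse dump timestamps into aware UTC datetimes.

    The fraction is padded/truncated to 6 digits because
    datetime.fromisoformat() rejects values like '.17775' before Python 3.11.
    Values without an offset (as in the CSV files) are assumed to be UTC.
    """
    match = _TS_RE.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {value!r}")
    date, clock, frac, tz = match.groups()
    frac = (frac or "").ljust(6, "0")[:6]
    offset = "+00:00" if tz in (None, "Z") else tz
    parsed = datetime.fromisoformat(f"{date}T{clock}.{frac}{offset}")
    return parsed.astimezone(timezone.utc)


def dump_date(metadata_json: str) -> datetime:
    """Date of the DB dump, read from metadata.json (assumed 'timestamp' key)."""
    return parse_timestamp(json.loads(metadata_json)["timestamp"])


def crates_updated_since(
    crates: Iterable[Dict[str, str]], previous_dump: Optional[datetime]
) -> Iterator[str]:
    """Yield names of crates updated after the previously listed dump.

    previous_dump is None on the first (full) run, so every crate is yielded.
    """
    for crate in crates:
        if previous_dump is None or parse_timestamp(crate["updated_at"]) > previous_dump:
            yield crate["name"]
```

On each run, the lister would store the new dump date as its state and pass it as `previous_dump` on the next run.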

I also think getting rid of the Git part would be a good idea.

When testing that diff in Docker, I quickly got an error because the Git repository contains more recent crate versions
than those extracted from the DB dump; see below (FTR, I added some debug logs):

Working only with the CSV files should guarantee that crate data is consistent.
Sep 15 2022, 6:53 PM
franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

I haven't tested this one in Docker yet, but I anticipated this situation. The backup is generated every day or so, and the Git repository changes every day.

Do they document what time of day it is generated? It would be nice to run the lister right after, to minimize lag.

Sep 15 2022, 6:49 PM
franckbret added inline comments to D8454: Crates.io: Add last_update for each version of a crate.
Sep 15 2022, 6:43 PM
franckbret updated the diff for D8454: Crates.io: Add last_update for each version of a crate.

Use CSV listing only

Sep 15 2022, 6:20 PM

Sep 14 2022

franckbret closed D8455: Pubdev: Ensure we correctly parse and sort release versions.
Sep 14 2022, 6:23 AM
franckbret committed rDLDBASE134087342b42: Pubdev: Ensure we correctly parse and sort release versions (authored by franckbret).
Pubdev: Ensure we correctly parse and sort release versions
Sep 14 2022, 6:23 AM

Sep 13 2022

franckbret added a comment to D8454: Crates.io: Add last_update for each version of a crate.

Also, I think it would be feasible to remove the Git part of the lister entirely. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive contains a date and a commit hash that represent the date of the DB dump.
In the incremental case, the lister can compare that date to the last update of the crate.

I also think getting rid of the Git part would be a good idea.

When testing that diff in Docker, I quickly got an error because the Git repository contains more recent crate versions
than those extracted from the DB dump; see below (FTR, I added some debug logs):

docker-swh-lister-1  | [2022-09-13 14:48:59,849: INFO/MainProcess] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] received
docker-swh-lister-1  | [2022-09-13 14:48:59,851: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml
docker-swh-lister-1  | Enumerating objects: 158660, done.
Counting objects: 100% (1216/1216), done.
Compressing objects: 100% (601/601), done.
docker-swh-lister-1  | Total 158660 (delta 715), reused 1090 (delta 589), pack-reused 157444
docker-swh-lister-1  | [2022-09-13 14:51:14,659: DEBUG/ForkPoolWorker-1] Found 25 crates in crates_index
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.0', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.1', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': 'c25097f191e32ad6550e402f6c5e6fbae7115a60bfedea2a4f5351c16a286229', 'features': {}, 'yanked': False, 'links': None}
docker-swh-lister-1  | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}}
docker-swh-lister-1  | [2022-09-13 14:51:14,662: DEBUG/ForkPoolWorker-1] Listing crates origin completed with last commit id 81cd3beb5d62f3b898607ab5b266a856b0e9fab8
docker-swh-lister-1  | [2022-09-13 14:51:17,965: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-index directory
docker-swh-lister-1  | [2022-09-13 14:51:18,058: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-db_dump directory
docker-swh-lister-1  | [2022-09-13 14:51:18,066: ERROR/ForkPoolWorker-1] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] raised unexpected: KeyError('0.0.2')
docker-swh-lister-1  | Traceback (most recent call last):
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
docker-swh-lister-1  |     R = retval = fun(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
docker-swh-lister-1  |     result = super().__call__(*args, **kwargs)
docker-swh-lister-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
docker-swh-lister-1  |     return self.run(*args, **kwargs)
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/tasks.py", line 14, in list_crates
docker-swh-lister-1  |     return CratesLister.from_configfile(**lister_args).run().dict()
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
docker-swh-lister-1  |     for page in self.get_pages():
docker-swh-lister-1  |   File "/src/swh-lister/swh/lister/crates/lister.py", line 245, in get_pages
docker-swh-lister-1  |     entry["version"]
docker-swh-lister-1  | KeyError: '0.0.2'

Working only with the CSV files should guarantee that crate data is consistent.
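A minimal sketch of a CSV-only listing: versions are grouped per crate straight from the dump, so a version can never be looked up that the dump does not contain (the KeyError('0.0.2') above came from a version present in the Git index but absent from the dump). The versions.csv columns (`crate_id`, `num`, `checksum`, `updated_at`) are taken from the debug log; the crates.csv columns `id` and `name` and the function name are assumptions:

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def list_crates_from_dump(
    crates_csv: Iterable[str], versions_csv: Iterable[str]
) -> List[Dict]:
    """Group dump versions per crate, using only the CSV files.

    Accepts open files or any iterable of CSV lines (csv.DictReader
    handles both). The crates.csv column names 'id' and 'name' are assumed.
    """
    # crate id -> crate name
    names = {row["id"]: row["name"] for row in csv.DictReader(crates_csv)}

    versions: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)
    for row in csv.DictReader(versions_csv):
        name = names.get(row["crate_id"])
        if name is None:
            continue  # skip versions whose crate is absent from crates.csv
        versions[name][row["num"]] = {
            "checksum": row["checksum"],
            "last_update": row["updated_at"],
        }

    return [
        {
            "name": name,
            "versions": crate_versions,
            # String max is chronological for the
            # 'YYYY-MM-DD HH:MM:SS[.fraction]' values seen in the dump.
            "last_update": max(v["last_update"] for v in crate_versions.values()),
        }
        for name, crate_versions in sorted(versions.items())
    ]
```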

Sep 13 2022, 5:13 PM
franckbret updated the diff for D8455: Pubdev: Ensure we correctly parse and sort release versions.

Add packaging to requirements + some cleanup

Sep 13 2022, 5:07 PM
franckbret added a comment to D8453: user/loaders: Add new loaders in development to loaders info page.

Looks good to me, thanks.

Sep 13 2022, 3:15 PM
franckbret requested review of D8455: Pubdev: Ensure we correctly parse and sort release versions.
Sep 13 2022, 10:07 AM
franckbret updated subscribers of D8454: Crates.io: Add last_update for each version of a crate.

@vlorentz @ardumont Here is a new version of the crates.io lister, which loads CSV files from the crates.io database dump.

Sep 13 2022, 8:46 AM
franckbret requested review of D8454: Crates.io: Add last_update for each version of a crate.
Sep 13 2022, 8:34 AM

Sep 8 2022

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@franckbret FYI, you updated the wrong diff (pubdev instead of haskell).

Sep 8 2022, 10:16 AM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Rebase

Sep 8 2022, 10:12 AM
franckbret added a comment to D8407: docker/conf: Declare celery tasks for listers/loaders in development.

Thanks.

FWIW, I think this should be added in the diffs by the contributors of those new listers/loaders.

@bchauvet ^ I expected this to be done already.

@franckbret for your information, I must have forgotten to be explicit about this before ;)

Sep 8 2022, 7:29 AM

Sep 7 2022

franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Rebase

Sep 7 2022, 3:28 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Rebase

Sep 7 2022, 3:28 PM

Sep 6 2022

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

You didn't add the test I asked for in D8338#inline-59471

Sep 6 2022, 12:45 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Add pagination tests

Sep 6 2022, 9:40 AM

Sep 5 2022

franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

@vlorentz @ardumont Can we merge this one?

Sep 5 2022, 5:16 PM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@ardumont @vlorentz If you don't have other comments or suggestions, I think we can merge this one.

Sep 5 2022, 5:15 PM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

I've run the loader tasks for a few hours: no errors, some not_found.

Sep 5 2022, 2:15 PM
franckbret added a comment to D8354: pubdev: Modify origin URL and retrieve package last update.

Hi, looks good to me too. Will test in Docker once it's merged.

Sep 5 2022, 11:05 AM
franckbret updated the task description for T4494: ingest hackage.haskell.org, the Haskell package repository .
Sep 5 2022, 10:00 AM · Hackage loader, Hackage lister, Archive coverage
franckbret created T4494: ingest hackage.haskell.org, the Haskell package repository .
Sep 5 2022, 9:55 AM · Hackage loader, Hackage lister, Archive coverage
franckbret updated the task description for T4466: Ingest Aur repository (Arch User Repository).
Sep 5 2022, 9:50 AM · AUR loader, AUR lister, Archive coverage
franckbret updated the task description for T4466: Ingest Aur repository (Arch User Repository).
Sep 5 2022, 9:50 AM · AUR loader, AUR lister, Archive coverage

Sep 2 2022

franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Fill release date

Sep 2 2022, 5:01 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Cabal keys may be capitalized

Sep 2 2022, 12:40 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Ensure we make a JSON GET request with the correct headers

Sep 2 2022, 11:56 AM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

Found a way:

$ curl https://hackage.haskell.org/package/colors-0.1/revisions/ -H "Accept: application/json"   
[{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]
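The same request sketched with Python's standard library. Taking the time of revision 0 as the release date is an assumption, as are the helper names `fetch_revisions` and `release_date`:

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Dict, List


def fetch_revisions(package_version: str) -> List[Dict]:
    """GET /package/<name>-<version>/revisions/ as JSON (needs network)."""
    url = f"https://hackage.haskell.org/package/{package_version}/revisions/"
    # Without this Accept header the endpoint serves HTML.
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def release_date(revisions: List[Dict]) -> datetime:
    """Upload time of the original (lowest-numbered) revision, in UTC."""
    first = min(revisions, key=lambda rev: rev["number"])
    parsed = datetime.strptime(first["time"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

For example, `release_date(fetch_revisions("colors-0.1"))` would return 2013-06-01 13:59:19 UTC for the response shown above.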
Sep 2 2022, 11:51 AM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Testing in Docker with that last commit

Sep 2 2022, 11:13 AM
franckbret added inline comments to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Sep 2 2022, 11:12 AM
franckbret added a comment to D8171: crates: Loader implements incremental mode.

From chatroom:
ardumont
val: ^ currently listers and loaders are not really separated (only the github lister is separated from the rest to avoid starvation around listing forges)
If we want accurate dates for both the lister and the loader:
yes, we do, so I guess only option 2 is the way forward

Sep 2 2022, 10:54 AM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Better implementation of pagination

Sep 2 2022, 10:45 AM
franckbret updated subscribers of D8379: Hackage: Loads Hackage Listed origins.

@ardumont @vlorentz I did not find a way to get a release date per version with this one.

Sep 2 2022, 9:43 AM
franckbret requested review of D8379: Hackage: Loads Hackage Listed origins.
Sep 2 2022, 9:39 AM

Sep 1 2022

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Change the HTTP API endpoint for search in order to retrieve a last_update

Switch from GET to POST to get results.
The lister is no longer a single page; each page lists 50 origins.
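A minimal sketch of that pagination, reusing the POST body from the `curl` example in the Aug 31 comments (`page`, `sortColumn`, `sortDirection`, `searchQuery`) and the `numberOfResults` / `pageContents` response fields. Stopping once `numberOfResults` entries have been seen is an assumption, and the fetch function is injectable so the loop can be tested offline; the helper names are hypothetical:

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List

SEARCH_URL = "https://hackage.haskell.org/packages/search"


def fetch_search_page(page: int, query: str = "(deprecated:any)") -> Dict:
    """POST one page of the search API and return the decoded JSON."""
    body = json.dumps({
        "page": page,
        "sortColumn": "default",
        "sortDirection": "ascending",
        "searchQuery": query,
    }).encode()
    request = urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_package_pages(
    fetch: Callable[[int], Dict] = fetch_search_page,
) -> Iterator[List[Dict]]:
    """Yield pageContents page by page until numberOfResults is reached."""
    page, seen = 0, 0
    while True:
        data = fetch(page)
        contents = data.get("pageContents", [])
        if not contents:
            break  # defensive stop on an empty page
        yield contents
        seen += len(contents)
        if seen >= data["numberOfResults"]:
            break
        page += 1
```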

Sep 1 2022, 6:37 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Change the HTTP API endpoint for search in order to retrieve a last_update

Sep 1 2022, 5:50 PM

Aug 31 2022

franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

There is an API that provides access to the lastUpload:

$ curl "https://hackage.haskell.org/packages/search" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(deprecated:any)"}' -X POST | jq . | head -n 50 
{
  "numberOfResults": 16711,
  "pageContents": [
    {
      "description": "Haskell package for easy integration with the 2captcha API.",
      "downloads": 1,
      "lastUpload": "2021-09-09T05:13:30.343509948Z",
      "maintainers": [
        {
          "display": "qwbarch",
          "uri": "/user/qwbarch"
        }
      ],
      "name": {
        "display": "2captcha",
        "uri": "/package/2captcha"
      },
      "tags": [
        {
          "display": "deprecated",
          "uri": "/packages/tag/deprecated"
        },
        {
          "display": "library",
          "uri": "/packages/tag/library"
        },
        {
          "display": "mit",
          "uri": "/packages/tag/mit"
        },
        {
          "display": "network",
          "uri": "/packages/tag/network"
        }
      ],
      "votes": 1.5
    },
    {
      "description": "Examples of 3D graphics programming with OpenGL",
      "downloads": 8,
      "lastUpload": "2016-07-22T14:26:23.038905Z",
      "maintainers": [
        {
          "display": "WolfgangJeltsch",
          "uri": "/user/WolfgangJeltsch"
        }
      ],
      "name": {
        "display": "3d-graphics-examples",
        "uri": "/package/3d-graphics-examples"

You can also use the same API for incremental listing by filtering on lastUpload in the search query.

Thanks, I missed this one

Aug 31 2022, 5:28 PM
franckbret added a comment to D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

There is an API that provides access to the lastUpload:

$ curl "https://hackage.haskell.org/packages/search" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(deprecated:any)"}' -X POST | jq . | head -n 50 
{
  "numberOfResults": 16711,
  "pageContents": [
    {
      "description": "Haskell package for easy integration with the 2captcha API.",
      "downloads": 1,
      "lastUpload": "2021-09-09T05:13:30.343509948Z",
      "maintainers": [
        {
          "display": "qwbarch",
          "uri": "/user/qwbarch"
        }
      ],
      "name": {
        "display": "2captcha",
        "uri": "/package/2captcha"
      },
      "tags": [
        {
          "display": "deprecated",
          "uri": "/packages/tag/deprecated"
        },
        {
          "display": "library",
          "uri": "/packages/tag/library"
        },
        {
          "display": "mit",
          "uri": "/packages/tag/mit"
        },
        {
          "display": "network",
          "uri": "/packages/tag/network"
        }
      ],
      "votes": 1.5
    },
    {
      "description": "Examples of 3D graphics programming with OpenGL",
      "downloads": 8,
      "lastUpload": "2016-07-22T14:26:23.038905Z",
      "maintainers": [
        {
          "display": "WolfgangJeltsch",
          "uri": "/user/WolfgangJeltsch"
        }
      ],
      "name": {
        "display": "3d-graphics-examples",
        "uri": "/package/3d-graphics-examples"

You can also use the same API for incremental listing by filtering on lastUpload in the search query.
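A minimal sketch of the incremental side, filtering on `lastUpload` client-side. Values such as `2021-09-09T05:13:30.343509948Z` carry nanosecond precision, which `datetime.fromisoformat()` rejects before Python 3.11, so the fraction is cut to microseconds. Server-side filtering through `searchQuery` would avoid fetching unchanged pages, but its filter syntax is not shown here, so this sketch does not rely on it; the function names are hypothetical:

```python
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Optional, Tuple

# lastUpload values look like '2021-09-09T05:13:30.343509948Z'.
_LAST_UPLOAD_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_last_upload(value: str) -> datetime:
    """Parse a lastUpload value into an aware UTC datetime (microsecond precision)."""
    match = _LAST_UPLOAD_RE.match(value)
    if match is None:
        raise ValueError(f"Unexpected lastUpload format: {value!r}")
    base, frac = match.groups()
    # datetime only stores microseconds: pad short fractions, drop extra digits.
    micro = int((frac or "").ljust(6, "0")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)


def updated_since(
    packages: Iterable[Dict], since: Optional[datetime]
) -> Iterator[Tuple[str, datetime]]:
    """Yield (package name, lastUpload) for packages uploaded after `since`.

    `packages` are pageContents entries; since=None means a full listing.
    """
    for package in packages:
        last_upload = parse_last_upload(package["lastUpload"])
        if since is None or last_upload > since:
            yield package["name"]["display"], last_upload
```

Storing the highest `lastUpload` seen as lister state would provide the `since` value for the next run.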

Aug 31 2022, 10:44 AM

Aug 30 2022

franckbret updated subscribers of D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

@ardumont @vlorentz This one is quite simple, but as with pubdev, we do not have access to consistent data to set a last_update. See https://hackage.haskell.org/packages/

Aug 30 2022, 4:26 PM
franckbret retitled D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository from [WIP] Hackage: List origins from hackage.haskell.org, The Haskell Package Repository to Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Aug 30 2022, 4:04 PM
franckbret updated the diff for D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.

Remove forgotten and now useless test file, remove WIP prefix

Aug 30 2022, 4:03 PM
franckbret closed D8339: Arch: Add throttling retry for scrapping and resources download.
Aug 30 2022, 1:52 PM
franckbret committed rDLS0acf5b0f4f50: Arch: Add throttling retry for scrapping and resources download (authored by franckbret).
Arch: Add throttling retry for scrapping and resources download
Aug 30 2022, 1:52 PM
franckbret added a comment to T4466: Ingest Aur repository (Arch User Repository).

I've done a complete run in Docker.

Aug 30 2022, 1:50 PM · AUR loader, AUR lister, Archive coverage
franckbret updated subscribers of D8339: Arch: Add throttling retry for scrapping and resources download.

@vlorentz @ardumont Can we merge this one? I've tested it in the Docker environment and no longer get failures. See https://forge.softwareheritage.org/T4233#90006

Aug 30 2022, 1:12 PM
franckbret added a comment to T4233: Ingest Arch Linux.

Arch Linux Lister Docker Report

The lister takes a long time and fails on max retries when scraping the repository directory (it ran fine a few weeks ago). I'm not sure at this point, but I suspect a random problem related to the network or the HTTP server. I will run it multiple times to see whether it fails on the same resource.

By the way, I think we need to define a strategy for those exceptions.
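One possible strategy, sketched with the standard library only: retry transient failures with exponential backoff, and once retries are exhausted, record the failing resource and move on instead of aborting the whole listing. This is not the swh-lister retry decorator; `retry_with_backoff`, `list_all` and the choice of `OSError` as the "transient" error class are illustrative assumptions.

```python
import logging
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_backoff(func: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on OSError with delays base_delay * 2**n."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return func()
        except OSError:
            # Timeouts, connection resets and urllib's URLError are OSErrors.
            attempt += 1
            if attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def list_all(urls: Iterable[str], fetch: Callable[[str], T],
             **retry_kwargs) -> Tuple[List[T], List[str]]:
    """Fetch every URL; skip and remember those that keep failing."""
    results: List[T] = []
    failed: List[str] = []
    for url in urls:
        try:
            # The lambda is called immediately, so it sees the current url.
            results.append(retry_with_backoff(lambda: fetch(url), **retry_kwargs))
        except OSError:
            logger.warning("giving up on %s after retries", url)
            failed.append(url)
    return results, failed
```

Returning the failed URLs lets a later run retry only those resources, which also helps check whether failures always hit the same resource.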

Aug 30 2022, 10:28 AM · Arch loader, Arch Lister, Archive coverage
franckbret updated the task description for T4233: Ingest Arch Linux.
Aug 30 2022, 10:14 AM · Arch loader, Arch Lister, Archive coverage
franckbret updated the task description for T4233: Ingest Arch Linux.
Aug 30 2022, 10:06 AM · Arch loader, Arch Lister, Archive coverage
franckbret requested review of D8339: Arch: Add throttling retry for scrapping and resources download.
Aug 30 2022, 10:00 AM
franckbret updated the task description for T4475: Ingest bower.io (Javascript package manager).
Aug 30 2022, 9:07 AM · Bower lister, Archive coverage
franckbret updated the task description for T4475: Ingest bower.io (Javascript package manager).
Aug 30 2022, 8:54 AM · Bower lister, Archive coverage
franckbret added a comment to T4475: Ingest bower.io (Javascript package manager).

I've launched 100 git tasks after running the lister on Docker.
Inspecting the logs, the not_found visits are either 404s for repositories that no longer exist or "No valid credentials provided" errors.
Those with status 'failed' are mainly timeouts.

Aug 30 2022, 8:53 AM · Bower lister, Archive coverage
franckbret updated the task description for T4475: Ingest bower.io (Javascript package manager).
Aug 30 2022, 8:43 AM · Bower lister, Archive coverage
franckbret added a comment to T4475: Ingest bower.io (Javascript package manager).

The lister runs fine on Docker and takes 30 seconds to list 68864 origins.
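The listing step itself is simple enough to sketch. Assuming the registry returns a JSON array of objects with `name` and `url` keys (an assumption about the registry.bower.io response, not checked here), each package's repository URL becomes an origin visited with type `git`, matching D8336. `bower_origins` is a hypothetical helper; the real lister also handles HTTP and scheduling.

```python
import json
from typing import Dict, Iterator


def bower_origins(registry_json: str) -> Iterator[Dict[str, str]]:
    """Yield one origin dict per registry entry (hypothetical shape)."""
    seen = set()
    for entry in json.loads(registry_json):
        url = entry.get("url")
        # Skip entries without a URL and repository URLs already listed.
        if not url or url in seen:
            continue
        seen.add(url)
        yield {"url": url, "visit_type": "git"}
```

For example, `list(bower_origins('[{"name": "foo", "url": "https://example.org/foo.git"}]'))` yields a single `git` origin for that URL.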

Aug 30 2022, 8:36 AM · Bower lister, Archive coverage
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 30 2022, 8:14 AM · PubDev loader, PubDev lister, Archive coverage
franckbret added a comment to T4465: Ingest pub.dev (Dart, Flutter).

Loader runs on Docker report

The loader runs fine on Docker.
I first launched 100 and then 1900 loader tasks; they completed in less than an hour.

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='pubdev' and last_visit_status='successful';
 count 
-------
  1450

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='pubdev' and last_visit_status='failed';
 count 
-------
   550
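The two counts above can be fetched in a single query by grouping on `last_visit_status`. A minimal sketch, using an in-memory SQLite table as a stand-in for the PostgreSQL `origin_visit_stats` table (only the two columns used above are modelled; `status_breakdown` is a hypothetical helper). With the reported numbers, the success rate is 1450 / (1450 + 550) = 72.5%.

```python
import sqlite3
from typing import Dict, Tuple

QUERY = """
    SELECT last_visit_status, count(*)
    FROM origin_visit_stats
    WHERE visit_type = ?
    GROUP BY last_visit_status
"""


def status_breakdown(conn: sqlite3.Connection,
                     visit_type: str) -> Tuple[Dict[str, int], float]:
    """Return per-status visit counts and the share of successful visits."""
    counts = dict(conn.execute(QUERY, (visit_type,)).fetchall())
    total = sum(counts.values())
    rate = counts.get("successful", 0) / total if total else 0.0
    return counts, rate


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE origin_visit_stats "
                 "(visit_type TEXT, last_visit_status TEXT)")
    # Rows mirroring the two counts reported above.
    rows = [("pubdev", "successful")] * 1450 + [("pubdev", "failed")] * 550
    conn.executemany("INSERT INTO origin_visit_stats VALUES (?, ?)", rows)
    print(status_breakdown(conn, "pubdev"))
```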
Aug 30 2022, 8:14 AM · PubDev loader, PubDev lister, Archive coverage

Aug 29 2022

franckbret requested review of D8338: Hackage: List origins from hackage.haskell.org, The Haskell Package Repository.
Aug 29 2022, 7:02 PM
franckbret closed D8336: Bower: Set VISIT_TYPE as 'git'.
Aug 29 2022, 5:36 PM
franckbret committed rDLSb7b11887a0c4: Bower: Set VISIT_TYPE as 'git' (authored by franckbret).
Bower: Set VISIT_TYPE as 'git'
Aug 29 2022, 5:36 PM
franckbret created T4475: Ingest bower.io (Javascript package manager).
Aug 29 2022, 5:26 PM · Bower lister, Archive coverage
franckbret requested review of D8336: Bower: Set VISIT_TYPE as 'git'.
Aug 29 2022, 5:24 PM
franckbret closed D8333: Bower: List origins from registry.bower.io.
Aug 29 2022, 4:45 PM
franckbret committed rDLSceae8c42b52c: Bower: List origins from registry.bower.io (authored by franckbret).
Bower: List origins from registry.bower.io
Aug 29 2022, 4:45 PM
franckbret requested review of D8333: Bower: List origins from registry.bower.io.
Aug 29 2022, 4:04 PM
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 29 2022, 12:19 PM · PubDev loader, PubDev lister, Archive coverage
franckbret closed D8330: Pubdev: Fix get_versions() failed with loose version names.
Aug 29 2022, 12:18 PM
franckbret committed rDLDBASE3ff24fb3b80e: Pubdev: Fix get_versions() failed with loose version names (authored by franckbret).
Pubdev: Fix get_versions() failed with loose version names
Aug 29 2022, 12:18 PM
franckbret requested review of D8330: Pubdev: Fix get_versions() failed with loose version names.
Aug 29 2022, 12:08 PM

Aug 26 2022

franckbret updated the task description for T4233: Ingest Arch Linux.
Aug 26 2022, 5:21 PM · Arch loader, Arch Lister, Archive coverage
franckbret updated the task description for T4233: Ingest Arch Linux.
Aug 26 2022, 5:20 PM · Arch loader, Arch Lister, Archive coverage
franckbret added a comment to T4233: Ingest Arch Linux.

Arch Linux Lister Docker Report

Aug 26 2022, 5:11 PM · Arch loader, Arch Lister, Archive coverage
franckbret added a comment to T4466: Ingest Aur repository (Arch User Repository).

AUR loader runs in Docker report

Aug 26 2022, 4:57 PM · AUR loader, AUR lister, Archive coverage
franckbret updated the task description for T4466: Ingest Aur repository (Arch User Repository).
Aug 26 2022, 4:34 PM · AUR loader, AUR lister, Archive coverage
franckbret added a comment to T4466: Ingest Aur repository (Arch User Repository).

AUR Lister runs in Docker report

Aug 26 2022, 4:33 PM · AUR loader, AUR lister, Archive coverage
franckbret created T4466: Ingest Aur repository (Arch User Repository).
Aug 26 2022, 4:30 PM · AUR loader, AUR lister, Archive coverage
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 26 2022, 4:10 PM · PubDev loader, PubDev lister, Archive coverage
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 26 2022, 4:06 PM · PubDev loader, PubDev lister, Archive coverage
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 26 2022, 4:05 PM · PubDev loader, PubDev lister, Archive coverage
franckbret updated the task description for T4465: Ingest pub.dev (Dart, Flutter).
Aug 26 2022, 4:04 PM · PubDev loader, PubDev lister, Archive coverage
franckbret added a comment to T4465: Ingest pub.dev (Dart, Flutter).

Loader runs on Docker report

Aug 26 2022, 4:03 PM · PubDev loader, PubDev lister, Archive coverage
franckbret added a comment to T4465: Ingest pub.dev (Dart, Flutter).

Lister runs on Docker report

Aug 26 2022, 3:35 PM · PubDev loader, PubDev lister, Archive coverage
franckbret created T4465: Ingest pub.dev (Dart, Flutter).
Aug 26 2022, 3:31 PM · PubDev loader, PubDev lister, Archive coverage
franckbret closed D8321: Pubdev, Change name of task method.
Aug 26 2022, 3:16 PM