Make use of http_retry instead of throttling_retry decorator after D8519
Sep 26 2022
Sep 23 2022
Test report after running the lister (lister only, no loader yet) with the following command:
Doc: Add missing quote to docker command example to add a conda task
Add missing documentation link
Add documentation for lister usage
Sep 22 2022
Lister improvement
Sep 21 2022
Sep 19 2022
Update the patch to work with the new lister patch, which gives a last_update value for each version
Sep 16 2022
Update documentation
Sep 15 2022
In D8454#219825, @anlambert wrote:
Also I think it should be doable to totally remove the Git part of the lister. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive provides a date and a commit hash that represent the date of the db dump.
In the incremental case, the lister can compare that date to the last update of the crate.
I also think getting rid of the Git part would be a good idea.
By testing that diff in Docker, I quickly got an error, as the Git repository contains more recent crate versions
than those extracted from the db dump, see below (ftr, I added some debug logs):
Working only with the CSV files should guarantee crates data are consistent.
In D8454#220070, @vlorentz wrote:
In D8454#219842, @franckbret wrote:
Didn't test this one yet on Docker but guessed this situation. The backup is generated every day or so; the Git repo changes every day.
Do they document at what time of day it is generated? It would be nice to run the lister right after, to minimize lag.
Use csv listing only
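The CSV-only incremental idea discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the lister's actual code: the function names and the `crates` mapping shape are assumptions, and only the `updated_at` field and its timestamp layout are taken from the debug logs quoted in this feed. The crate named `old-crate` in the usage note is hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterator, Optional

# Timestamp layout as it appears in the db dump rows shown in the debug logs,
# e.g. '2022-09-11 12:08:17.012383' (the fraction may have fewer than 6 digits).
DUMP_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_dump_ts(value: str) -> datetime:
    """Parse a timestamp string taken from the db dump CSV files."""
    return datetime.strptime(value, DUMP_TS_FORMAT)


def crates_to_list(
    crates: Dict[str, dict], last_listing: Optional[datetime] = None
) -> Iterator[str]:
    """Yield names of crates updated after the previous listing.

    `crates` maps a crate name to a row-like dict with an 'updated_at' key
    (hypothetical shape). With no previous listing, every crate is yielded,
    which corresponds to a full listing.
    """
    for name, crate in crates.items():
        if last_listing is None or parse_dump_ts(crate["updated_at"]) > last_listing:
            yield name
```

For example, with `{"colorid": {"updated_at": "2022-09-11 12:08:17.012383"}, "old-crate": {"updated_at": "2022-08-31 16:02:07.17775"}}` and a previous listing dated 2022-09-01, only `colorid` would be yielded. Because both the crate data and the reference date come from the same dump, the version mismatch seen with the Git index cannot occur.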
Sep 14 2022
Sep 13 2022
In D8454#219825, @anlambert wrote:
Also I think it should be doable to totally remove the Git part of the lister. The CSV files have everything we need. For the incremental part, a metadata.json file at the root of the archive provides a date and a commit hash that represent the date of the db dump.
In the incremental case, the lister can compare that date to the last update of the crate.
I also think getting rid of the Git part would be a good idea.
By testing that diff in Docker, I quickly got an error, as the Git repository contains more recent crate versions
as those extracted from the db dump, see below (ftr, I added some debug logs):docker-swh-lister-1 | [2022-09-13 14:48:59,849: INFO/MainProcess] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] received docker-swh-lister-1 | [2022-09-13 14:48:59,851: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml docker-swh-lister-1 | Enumerating objects: 158660, done. Counting objects: 100% (1216/1216), done. 0% (1/1216) Compressing objects: 100% (601/601), done.: 0% (1/601) docker-swh-lister-1 | Total 158660 (delta 715), reused 1090 (delta 589), pack-reused 157444 docker-swh-lister-1 | [2022-09-13 14:51:14,659: DEBUG/ForkPoolWorker-1] Found 25 crates in crates_index docker-swh-lister-1 | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.0', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd', 'features': {}, 'yanked': False, 'links': None} docker-swh-lister-1 | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), 
('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}} docker-swh-lister-1 | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.1', 'deps': [{'name': 'hex', 'req': '^0.4.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'rand', 'req': '^0.8.5', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}], 'cksum': '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e', 'features': {}, 'yanked': False, 'links': None} docker-swh-lister-1 | [2022-09-13 14:51:14,660: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), 
('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.1-dev.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.3', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': 
'215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce', 'features': {}, 'yanked': False, 'links': None} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 
'colorid', 'vers': '0.0.1', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce', 'features': {}, 'yanked': False, 'links': None} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', '2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', 
'163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'vers': '0.0.2', 'deps': [{'name': 'getrandom', 'req': '^0.2', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'normal'}, {'name': 'criterion', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'nanoid', 'req': '^0.4.0', 'features': [], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}, {'name': 'uuid', 'req': '^1.1.2', 'features': ['v4', 'rng'], 'optional': False, 'default_features': True, 'target': None, 'kind': 'dev'}], 'cksum': 'c25097f191e32ad6550e402f6c5e6fbae7115a60bfedea2a4f5351c16a286229', 'features': {}, 'yanked': False, 'links': None} docker-swh-lister-1 | [2022-09-13 14:51:14,661: DEBUG/ForkPoolWorker-1] {'name': 'colorid', 'updated_at': '2022-09-11 12:08:17.012383', 'versions': {'0.0.1-dev.0': OrderedDict([('checksum', '878b6701a5ab722ef3c30f2af1a25539c50d83c97da98941998c684b0f5c52cd'), ('crate_id', '653834'), ('crate_size', '2821'), ('created_at', '2022-08-28 11:43:09.333693'), ('downloads', '22'), ('features', '{}'), ('id', '610177'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.0'), ('published_by', '163342'), ('updated_at', '2022-08-28 11:43:09.333693'), ('yanked', 'f')]), '0.0.1': OrderedDict([('checksum', '1cb7dccc5e4128b4ebe8c46ca29e440e52bbca4daad5dcea864a74f25dcee0ce'), ('crate_id', '653834'), ('crate_size', '7999'), ('created_at', 
'2022-09-11 12:08:17.012383'), ('downloads', '12'), ('features', '{}'), ('id', '618675'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1'), ('published_by', '163342'), ('updated_at', '2022-09-11 12:08:17.012383'), ('yanked', 'f')]), '0.0.1-dev.2': OrderedDict([('checksum', '215f42225dffe2a135d1480662d379620445628b4bfe17aee56a20cf0d4590ce'), ('crate_id', '653834'), ('crate_size', '7547'), ('created_at', '2022-08-31 16:02:07.17775'), ('downloads', '23'), ('features', '{}'), ('id', '611995'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.2'), ('published_by', '163342'), ('updated_at', '2022-08-31 16:02:07.17775'), ('yanked', 'f')]), '0.0.1-dev.1': OrderedDict([('checksum', '2d5fb208766898bb8dcf9c2e270143b5b71b6271698d45ac86dc4d3e97ef178e'), ('crate_id', '653834'), ('crate_size', '2940'), ('created_at', '2022-08-29 15:32:43.066959'), ('downloads', '23'), ('features', '{}'), ('id', '610858'), ('license', 'MIT'), ('links', ''), ('num', '0.0.1-dev.1'), ('published_by', '163342'), ('updated_at', '2022-08-29 15:32:43.066959'), ('yanked', 'f')])}} docker-swh-lister-1 | [2022-09-13 14:51:14,662: DEBUG/ForkPoolWorker-1] Listing crates origin completed with last commit id 81cd3beb5d62f3b898607ab5b266a856b0e9fab8 docker-swh-lister-1 | [2022-09-13 14:51:17,965: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-index directory docker-swh-lister-1 | [2022-09-13 14:51:18,058: DEBUG/ForkPoolWorker-1] Successfully removed /tmp/crates.io-db_dump directory docker-swh-lister-1 | [2022-09-13 14:51:18,066: ERROR/ForkPoolWorker-1] Task swh.lister.crates.tasks.CratesListerTask[d88bb21b-1613-4230-b4ec-5bdd5092982c] raised unexpected: KeyError('0.0.2') docker-swh-lister-1 | Traceback (most recent call last): docker-swh-lister-1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task docker-swh-lister-1 | R = retval = fun(*args, **kwargs) docker-swh-lister-1 | File 
"/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__ docker-swh-lister-1 | result = super().__call__(*args, **kwargs) docker-swh-lister-1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__ docker-swh-lister-1 | return self.run(*args, **kwargs) docker-swh-lister-1 | File "/src/swh-lister/swh/lister/crates/tasks.py", line 14, in list_crates docker-swh-lister-1 | return CratesLister.from_configfile(**lister_args).run().dict() docker-swh-lister-1 | File "/src/swh-lister/swh/lister/pattern.py", line 127, in run docker-swh-lister-1 | for page in self.get_pages(): docker-swh-lister-1 | File "/src/swh-lister/swh/lister/crates/lister.py", line 245, in get_pages docker-swh-lister-1 | entry["version"] docker-swh-lister-1 | KeyError: '0.0.2'Working only with the CSV files should guarantee crates data are consistent.
Add packaging to requirements + some cleanup
Looks good to me thanks
Sep 8 2022
In D8338#218632, @anlambert wrote:
@franckbret fyi you have updated the wrong diff (pubdev instead of haskell)
Rebase
In D8407#218601, @ardumont wrote:
Thanks.
fwiw, I think this should be added with diffs by the contributors of those new listers/loaders.
@bchauvet ^ I expected this to be done already.
@franckbret for your information, I must have forgotten to be explicit about this before ;)
Sep 7 2022
Rebase
Rebase
Sep 6 2022
In D8338#218211, @vlorentz wrote:
You didn't add the test I asked for in D8338#inline-59471
Add pagination tests
Sep 5 2022
I've run the loader tasks for a few hours. No errors, some not_found.
Hi, looks good to me too. Will test in Docker once it's merged.
Sep 2 2022
Fill release date
Cabal keys may be capitalized
Ensure we make a JSON GET request with the correct headers
In D8379#217877, @vlorentz wrote:
Found a way:
$ curl https://hackage.haskell.org/package/colors-0.1/revisions/ -H "Accept: application/json"
[{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]
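The same request can be sketched in Python with the standard library. This is only an illustration of sending the `Accept: application/json` header and reading the response shape shown above; the function names are hypothetical and this is not necessarily how the loader does it.

```python
import json
from typing import Dict
from urllib.request import Request


def revisions_request(package: str, version: str) -> Request:
    """Build a GET request for a package's revisions, asking for JSON.

    The URL layout is copied from the curl example above.
    """
    url = f"https://hackage.haskell.org/package/{package}-{version}/revisions/"
    return Request(url, headers={"Accept": "application/json"})


def revision_times(body: str) -> Dict[int, str]:
    """Map each revision number to its timestamp, given the JSON response body."""
    return {rev["number"]: rev["time"] for rev in json.loads(body)}
```

The request object can then be passed to `urllib.request.urlopen` and the decoded body handed to `revision_times`. Which revision's time should be used as the release date is left open here.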
Testing Docker with that last commit
From chatroom:
ardumont
val: ^ currently listers and loaders are not really separated (only the github lister is separated from the rest to avoid starvation around listing forges)
If we want accurate date for both the lister and the loader:
yes, we do so i guess only the option 2 is the way forward
Better implementation of pagination
Sep 1 2022
In D8338#217780, @franckbret wrote:
Change http api endpoint for search in order to retrieve a last_update
Switch from GET to POST to get results.
The lister no longer returns a single page; each page lists 50 origins.
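A rough sketch of the paginated POST search described here. The request body fields are those of the curl example quoted elsewhere in this thread; the page size of 50 comes from the comment above; the function names are hypothetical and the real lister may be organized differently.

```python
import json
import math

PAGE_SIZE = 50  # "each page lists 50 origins", per the comment above


def search_payload(page: int) -> bytes:
    """Encode the JSON body for one page of the packages/search endpoint."""
    return json.dumps(
        {
            "page": page,
            "sortColumn": "default",
            "sortDirection": "ascending",
            "searchQuery": "(deprecated:any)",
        }
    ).encode("utf-8")


def page_count(number_of_results: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to cover `number_of_results` entries."""
    return math.ceil(number_of_results / page_size)
```

With the `numberOfResults` value of 16711 reported in the quoted response, `page_count(16711)` gives 335 pages, so a lister following this scheme would POST pages 0 through 334.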
Change http api endpoint for search in order to retrieve a last_update
Aug 31 2022
In D8338#217214, @franckbret wrote:
In D8338#217185, @vlorentz wrote:
There is an API that provides access to the lastUpload:
$ curl "https://hackage.haskell.org/packages/search" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(deprecated:any)"}' -X POST | jq . | head -n 50
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23907    0 23806  100   101  40145    170 --:--:-- --:--:-- --:--:-- 40315
{
  "numberOfResults": 16711,
  "pageContents": [
    {
      "description": "Haskell package for easy integration with the 2captcha API.",
      "downloads": 1,
      "lastUpload": "2021-09-09T05:13:30.343509948Z",
      "maintainers": [
        {
          "display": "qwbarch",
          "uri": "/user/qwbarch"
        }
      ],
      "name": {
        "display": "2captcha",
        "uri": "/package/2captcha"
      },
      "tags": [
        {
          "display": "deprecated",
          "uri": "/packages/tag/deprecated"
        },
        {
          "display": "library",
          "uri": "/packages/tag/library"
        },
        {
          "display": "mit",
          "uri": "/packages/tag/mit"
        },
        {
          "display": "network",
          "uri": "/packages/tag/network"
        }
      ],
      "votes": 1.5
    },
    {
      "description": "Examples of 3D graphics programming with OpenGL",
      "downloads": 8,
      "lastUpload": "2016-07-22T14:26:23.038905Z",
      "maintainers": [
        {
          "display": "WolfgangJeltsch",
          "uri": "/user/WolfgangJeltsch"
        }
      ],
      "name": {
        "display": "3d-graphics-examples",
        "uri": "/package/3d-graphics-examples"
You can also use the same API for incremental listing by filtering on lastUpload in the search query.
Thanks, I missed this one
Aug 30 2022
@ardumont @vlorentz This one is quite simple but like pubdev we do not have access to coherent data to set a last_update. See https://hackage.haskell.org/packages/
Remove forgotten and now useless test file, remove WIP prefix
I've made a complete run on docker
@vlorentz @ardumont Can we merge this one? I've tested it in the Docker environment and no longer get failures. See https://forge.softwareheritage.org/T4233#90006
In T4233#89838, @franckbret wrote:
Arch Linux Lister Docker Report
The lister takes a lot of time and fails on max retries when scraping the repository directory (it ran fine a few weeks ago). Not sure at this point, but I suspect it is a random problem related to the network or HTTP server. Will run it multiple times to see whether it fails on the same resource.
By the way, I guess we need to define a strategy for those exceptions.
I've launched 100 git tasks after running the lister on Docker.
Inspecting the logs, the not_found statuses are 404s for repositories that no longer exist, or "No valid credentials provided" errors.
Those with status 'failed' are mainly timeouts.
Lister runs fine on Docker and takes 30 seconds to list 68864 origins
In T4465#89812, @franckbret wrote:
Loader runs on Docker report
Loader runs fine on Docker.
I first launched 100 and then 1900 loader tasks; they completed in less than an hour.
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='pubdev' and last_visit_status='successful';
 count
-------
  1450
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='pubdev' and last_visit_status='failed';
 count
-------
   550
Aug 29 2022
Aug 26 2022
Arch Linux Lister Docker Report
AUR Loader runs in Docker report
AUR Lister runs in Docker report
Loader runs on Docker report
Lister runs on Docker report