Abandoning this as we must use that format for artifacts metadata apparently.
Sep 28 2022
Sep 27 2022
In D8553#222288, @vlorentz wrote: Although, as using extra_loader_arguments as an escape hatch to pass lists of artifacts is starting to spread across lister/loader pairs (GNU, Maven, Arch, AUR, CRAN, Crates, and maybe OPAM), I would be fine with defining a new way to pass these objects (as well as arch_metadata/aur_metadata/crates_metadata?), but it should be done for all these loaders and using the same format.
(This will also make it easier to change the storage of these from large JSON objects to rows in the scheduler DB in the future, if their size becomes an issue for postgresql or celery; but not now)
My point is that we should not create dozens of undocumented ad-hoc formats in the scheduler database just to remove two lines of code from each loader
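To make the "same format for all loaders" idea concrete, here is a minimal sketch of a shared artifact entry plus a check that every lister emits the same keys. The field names (url, filename, length, checksums) and the helper names are assumptions for illustration, loosely modelled on the original-artifacts-json layout linked in this thread; they are not an agreed format.

```python
from typing import Any, Dict, List

# Hypothetical field names: an assumption for this sketch, not a spec.
REQUIRED_KEYS = {"url", "filename", "length", "checksums"}


def make_artifact(
    url: str, filename: str, length: int, checksums: Dict[str, str]
) -> Dict[str, Any]:
    """Build one artifact entry as a plain, JSON-serializable dict."""
    return {
        "url": url,
        "filename": filename,
        "length": length,
        "checksums": dict(checksums),
    }


def validate_artifacts(artifacts: List[Dict[str, Any]]) -> None:
    """Raise ValueError if any entry lacks one of the shared keys."""
    for index, artifact in enumerate(artifacts):
        missing = REQUIRED_KEYS - artifact.keys()
        if missing:
            raise ValueError(f"artifact #{index} is missing keys: {sorted(missing)}")


def build_extra_loader_arguments(artifacts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a validated artifact list for use as extra_loader_arguments."""
    validate_artifacts(artifacts)
    return {"artifacts": artifacts}
```

A lister would then pass the result of build_extra_loader_arguments(...) as its extra_loader_arguments, and a malformed entry would fail at listing time rather than in the loader.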
In D8553#222284, @vlorentz wrote: It avoids the proliferation of ad-hoc formats
In D8553#222275, @anlambert wrote: In D8553#222234, @vlorentz wrote: They intentionally use the same format as https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html#original-artifacts-json
And so? This makes the loader simpler and has no effect on what we archive, so it is better to use dicts here.
In D8553#222234, @vlorentz wrote: They intentionally use the same format as https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html#original-artifacts-json
Add cpan lister queue
In D8542#222044, @franckbret wrote: In D8542#221864, @anlambert wrote: @franckbret, have you considered exploiting the https://fastapi.metacpan.org/v1/release/_search endpoint of the CPAN Elasticsearch?
It seems to list all CPAN releases with dates, links to tarballs and checksums. You could build a list of artifacts for each package, as in the crates loader, and pass them as loader arguments.
Thanks for the review.
Yes, I checked this one too. When exploring CPAN I found a lot of inconsistencies when parsing the data, so I went for a simpler way.
The main drawback is that /release/ returns a lot of useless data and a large number of records; the idea is that the loader will load the related versions via https://fastapi.metacpan.org/v1/release/versions/{pkgname}
Let's talk about this next week.
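For reference, a minimal sketch of how the loader side could build the per-package URL mentioned above. Only the URL template comes from this thread; the function name and the percent-encoding of the package name are assumptions, and no request is made here.

```python
from urllib.parse import quote

# Base URL taken from the endpoint quoted in the discussion.
METACPAN_API = "https://fastapi.metacpan.org/v1"


def release_versions_url(pkgname: str) -> str:
    """Return the release/versions URL for one CPAN package name.

    The name is percent-encoded so that unexpected characters cannot
    alter the path (hypothetical precaution, not taken from the lister).
    """
    return f"{METACPAN_API}/release/versions/{quote(pkgname, safe='')}"
```

For example, release_versions_url("Test-Simple") yields https://fastapi.metacpan.org/v1/release/versions/Test-Simple.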
The loader requires some adaptation to align with lister output.
In D8535#221927, @ardumont wrote: Are those new listers landed yet?
Not yet, but they will be soon, and this allows testing them in the docker environment.
OK. If we land this now and someone starts from scratch or rebuilds their local dev environment, that will break though.
@franckbret, I added the improvements for the crates lister we discussed last week as inline comments.
@franckbret, have you considered exploiting the https://fastapi.metacpan.org/v1/release/_search endpoint of the CPAN Elasticsearch?
In D8535#221690, @ardumont wrote: Are those new listers landed yet?
Looks good to me !
Looks good to me, thanks !
Sep 26 2022
Overall looks good to me, just a couple of inline comments to handle.
There are still some changes to make to that diff now that the code for HTTP requests and user-agent setting has been deduplicated.
That diff requires some changes as the api_info function got renamed (see inline comments).
I think it is better to reuse the "Save again" button from code tab.
Rebase
Rebase
Rebase
Sep 22 2022
Sep 21 2022
Overall, looks good to me, but there is still room for a couple of improvements (see inline comments).
Update docstring
Nevertheless, there are fewer origins in the scheduler db; maybe some get listed more than once, need to check.
LGTM
Just tested the lister in docker and I got a lot more origins listed:
docker-swh-lister-1 | [2022-09-21 12:24:57,583: INFO/MainProcess] Task swh.lister.gogs.tasks.FullGogsRelister[2b0ba017-4917-44dc-9511-e860bb431322] received
docker-swh-lister-1 | [2022-09-21 12:24:57,584: DEBUG/ForkPoolWorker-1] Loading config file /lister.yml
docker-swh-lister-1 | [2022-09-21 12:24:57,594: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search with params {'limit': 50, 'page': 1}
docker-swh-lister-1 | [2022-09-21 12:24:58,556: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=2 with params {}
docker-swh-lister-1 | [2022-09-21 12:24:58,791: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=3 with params {}
docker-swh-lister-1 | [2022-09-21 12:24:59,018: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=4 with params {}
docker-swh-lister-1 | [2022-09-21 12:24:59,263: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=5 with params {}
docker-swh-lister-1 | [2022-09-21 12:24:59,580: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=6 with params {}
docker-swh-lister-1 | [2022-09-21 12:24:59,845: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=7 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:00,094: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=8 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:00,323: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=9 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:00,560: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=10 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:00,787: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=11 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:01,020: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=12 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:01,324: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=13 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:01,794: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=14 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:02,191: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=15 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:02,549: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=16 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:02,967: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=17 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:03,250: WARNING/ForkPoolWorker-1] Unexpected HTTP status code 500 on https://try.gogs.io/api/v1/repos/search?page=17: b''
docker-swh-lister-1 | [2022-09-21 12:25:03,250: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=18 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:03,515: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=19 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:03,765: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=20 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:04,078: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=21 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:04,444: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=22 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:04,695: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=23 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:04,948: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=24 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:05,262: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=25 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:05,814: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=26 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:06,184: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=27 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:06,454: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=28 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:06,688: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=29 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:06,935: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=30 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:07,247: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=31 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:07,500: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=32 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:07,734: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=33 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:08,064: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=34 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:08,337: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=35 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:08,581: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=36 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:08,818: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=37 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:09,057: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=38 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:09,289: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=39 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:09,531: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=40 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:09,756: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=41 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:09,999: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=42 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:10,213: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=43 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:10,444: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=44 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:10,689: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=45 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:10,914: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=46 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:11,163: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=47 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:11,390: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=48 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:11,710: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=49 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:12,068: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=50 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:12,342: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=51 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:12,619: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=52 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:12,862: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=53 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:13,169: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=54 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:13,549: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=55 with params {}
docker-swh-lister-1 | [2022-09-21 12:25:13,827: DEBUG/ForkPoolWorker-1] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@0f4428ad1388', 'copyright': 'Copyright (C) 2007-2018 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP 19.2.1', 'product': 'RabbitMQ', 'version': '3.6.16'}, mechanisms: [b'AMQPLAIN', b'PLAIN'], locales: ['en_US']
docker-swh-lister-1 | [2022-09-21 12:25:13,828: DEBUG/ForkPoolWorker-1] using channel_id: 1
docker-swh-lister-1 | [2022-09-21 12:25:13,831: DEBUG/ForkPoolWorker-1] Channel open
docker-swh-lister-1 | [2022-09-21 12:25:13,833: INFO/ForkPoolWorker-1] Task swh.lister.gogs.tasks.FullGogsRelister[2b0ba017-4917-44dc-9511-e860bb431322] succeeded in 16.23296177299926s: {'pages': 55, 'origins': 575}
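The behaviour visible in the log above (page 17 answers with HTTP 500, a warning is logged, and listing carries on with page 18) can be sketched roughly as follows. This is not the lister's actual code: the injected fetch callable, the known last_page and the stats keys are all assumptions made to keep the example self-contained and free of network access.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical signature: fetch(page) -> (HTTP status code, repos on that page).
Fetch = Callable[[int], Tuple[int, List[dict]]]


def list_all_pages(fetch: Fetch, last_page: int) -> Dict[str, Any]:
    """Walk pages 1..last_page, skipping pages that fail instead of aborting."""
    stats: Dict[str, Any] = {"pages": 0, "origins": 0, "skipped": []}
    for page in range(1, last_page + 1):
        status, repos = fetch(page)
        if status != 200:
            # Mirror the log: warn about the failure, then move to the next page.
            logger.warning("Unexpected HTTP status code %s on page %s", status, page)
            stats["skipped"].append(page)
            continue
        stats["pages"] += 1
        stats["origins"] += len(repos)
    return stats
```

Recording the skipped pages makes it possible to retry them later, which matters here because a silently skipped page means missing origins.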
I was wondering why the number of listed repos was so low, turns out gogs has an option to not list a repository through the API when creating it:
In D8511#220866, @jayeshv wrote: In D8511#220865, @vlorentz wrote: Using .lower() on byte strings of unknown encoding is going to have surprising results.
What is the intended use of making this case-insensitive?
The intention is to have some kind of a search for contents. Do you think a case-sensitive filter is good enough, or should we explicitly handle the error here?
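To illustrate the "surprising results": Python's bytes.lower() only touches ASCII letters, so any non-ASCII character keeps its case whatever the encoding. The helper name below is hypothetical; it stands in for a naive case-insensitive filter and is only there to show the failure mode.

```python
def naive_icontains(haystack: bytes, needle: bytes) -> bool:
    """Case-insensitive substring test using bytes.lower() (ASCII-only folding)."""
    return needle.lower() in haystack.lower()


# "\u00c9" is an uppercase E with acute accent, "\u00e9" its lowercase form.
upper_utf8 = "\u00c9COLE".encode("utf-8")
lower_utf8 = "\u00e9cole".encode("utf-8")

# ASCII letters are folded as expected.
ascii_match = naive_icontains(b"README.md", b"readme")

# The accented letter is left untouched, so the two spellings never match.
accent_match = naive_icontains(upper_utf8, lower_utf8)

# Decoding first (only possible when the encoding is known) behaves differently.
text_match = "\u00e9cole" in "\u00c9COLE".lower()

print(ascii_match, accent_match, text_match)
```

So a bytes-level filter is case-insensitive for ASCII names only; for anything else it quietly behaves like a case-sensitive one.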
In D8514#220858, @anlambert wrote: Argh, why did the Cypress tests not catch this? Thanks!
Argh, why did the Cypress tests not catch this? Thanks!
Sep 20 2022
Looks good to me, thanks !
Rebase
Rebase
Rebase
Rebase and improve test