Build is green
Sep 10 2020
Sep 9 2020
The concurrency issue was reproduced locally on the docker environment with a concurrency of 5.
I tried creating a list-gitea-incremental task, but it fails too, this time with a different exception related to an unexpected "sort" parameter: https://sentry.softwareheritage.org/share/issue/b0119b56f24347bcb58ac28c68685c62/
The configuration is deployed and the listers were restarted.
For info, on my desktop with the docker environment and a limit of 100, the lister takes about 3.7s to list the complete codeberg forge:
swh-lister_1 | [2020-09-08 18:33:19,259: INFO/ForkPoolWorker-1] Task swh.lister.gitea.tasks.RangeGiteaLister[363e0b30-b13a-4f62-bd31-9847dfe62450] succeeded in 3.7196799100056523s: {'status': 'eventful'}
The task ran in about 31 minutes (1887s):
Sep 08 13:45:34 worker1 python3[237586]: [2020-09-08 13:45:34,851: INFO/ForkPoolWorker-4] Task swh.lister.launchpad.tasks.FullLaunchpadLister[73e298be-aeda-4882-b52d-dfe5a2ec316c] succeeded in 1887.75128286588s: {'status': 'eventful'}
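As a quick check of the duration arithmetic, here is a minimal sketch that pulls the "succeeded in ...s" value out of a Celery log line like the one above and converts it to minutes. The helper name task_duration_seconds and the regular expression are illustrative assumptions, not part of swh.lister or Celery.

```python
import re

# Matches the "succeeded in <seconds>s" part of a Celery task log line.
# The pattern is an assumption based on the log lines quoted in this thread.
DURATION_RE = re.compile(r"succeeded in (\d+(?:\.\d+)?)s")


def task_duration_seconds(log_line: str) -> float:
    """Return the task duration in seconds found in a Celery log line."""
    match = DURATION_RE.search(log_line)
    if match is None:
        raise ValueError("no task duration found in log line")
    return float(match.group(1))


line = (
    "[2020-09-08 13:45:34,851: INFO/ForkPoolWorker-4] Task "
    "swh.lister.launchpad.tasks.FullLaunchpadLister"
    "[73e298be-aeda-4882-b52d-dfe5a2ec316c] succeeded in "
    "1887.75128286588s: {'status': 'eventful'}"
)
seconds = task_duration_seconds(line)
minutes = seconds / 60
print(f"{seconds:.0f}s is about {minutes:.1f} minutes")
```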
- The data model doesn't need to be created because it was already done in T2358
- The task is created:
swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add --policy oneshot list-gitea-full url=https://codeberg.org/api/v1/ limit=100
WARNING:swh.core.cli:Could not load subcommand storage: No module named 'swh.journal'
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks
Sep 8 2020
- task-type registered:
swhscheduler@scheduler0:/etc/softwareheritage/backend$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
WARNING:swh.core.cli:Could not load subcommand storage: No module named 'swh.journal'
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-incremental in scheduler
The launchpad lister (v0.1.2) is deployed and running on staging
Sep 4 2020
Thanks for the heads up.
FTR, I've run the launchpad lister in a docker container and it executed fine, with fine being "it created 19340 load-git tasks".
Aug 27 2020
I guess this also depends on a packagist loader, which we do not have at all for now...
Aug 26 2020
Also beware that the default pagination value in the gitea lister is 3 (https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/lister.py$23) so it is very slow.
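To give a rough sense of why a page size of 3 is slow, here is a minimal sketch that counts the API requests needed to page through a forge. The repository count of 10,000 is a made-up figure for illustration (not the real size of any instance), and requests_needed is a hypothetical helper, not part of swh.lister.

```python
import math


def requests_needed(total_repos: int, page_size: int) -> int:
    """Number of paginated API calls needed to list total_repos items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_repos / page_size)


# Hypothetical forge size, chosen only to illustrate the ratio.
TOTAL_REPOS = 10_000

for page_size in (3, 100):
    calls = requests_needed(TOTAL_REPOS, page_size)
    print(f"limit={page_size}: {calls} requests")
```

Under that assumption, going from a limit of 3 to a limit of 100 cuts the number of round trips by a factor of about 33.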
Ok I was expecting something a bit smart in explore.sapk.fr, but not really:
Now that we have the gitea lister, we should (upgrade swh.lister on prod and) add a few listing tasks, like this fsfe instance, as well as other instances like https://codeberg.org.
Aug 24 2020
Aug 19 2020
Jun 16 2020
May 27 2020
I've had multiple looks at the proposed gitea lister.
This looks fine to me; I've accepted it, but not completely.
If some other team member could do a second pass, that'd be neat.
May 18 2020
May 14 2020
What's the status of this patch series? Would be great to deploy it. :-)
May 12 2020
In D2025#67709, @lewo wrote:
Apr 20 2020
Apr 13 2020
Apr 2 2020
Mar 24 2020
Mar 19 2020
Mar 18 2020
First runs triggered on staging, errors will be reported in sentry [1]
Mar 13 2020
Unfortunately, try.gogs.io's API is hidden behind auth so I can't confirm that the responses actually have the same shape between gogs and gitea.
Thanks for submitting this request. There's a good chance that this can be the same lister as gogs: T1721.
Mar 12 2020
Yep, I'm not annoyed, just being emphatic about what we want to see. :-)
Hi Colin (@cjwatson), nice to meet you here!
Mar 11 2020
I'm one of the developers on the Launchpad team. A user identified as "leni" spoke to us about this on IRC last week; it so happened that the Launchpad team were in the middle of an in-person sprint at the time, so we were able to discuss the problem fairly quickly and put together a plan to improve our API. I implemented those improvements shortly afterwards. They aren't quite deployed on production yet, but they should be very soon. Unfortunately I don't have any contact details for leni unless they happen to join IRC, so I'm posting a summary of the discussion and my improvements here, which is probably a useful thing to do anyway.
Mar 10 2020
Mar 9 2020
Thanks to @zimoun, https://guix.gnu.org/sources.json is now generated periodically (every hour). Each url is now a list.
In D2025#65063, @lewo wrote: While looking into this with @zimoun, we realized it would be nicer if url were an array of URLs (as is the case at https://guix.gnu.org/packages.json) rather than a single URL.
without changing the crawler for now, i.e., the crawler can ingest only the first element of the array and be modified later.
Yeah, I was thinking of introducing this later. But as you said, we could still modify the format without supporting it in the lister.
So, that's fine with me if we generate a list of urls instead of a single url. I could easily update the file NixOS is generating.
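The transition discussed here (accept a list of URLs, but only ingest the first one for now) could be sketched as follows. This is a minimal illustration, not the actual lister or loader code; the helper names normalize_urls and primary_url and the example URLs are assumptions.

```python
from typing import List, Union


def normalize_urls(value: Union[str, List[str]]) -> List[str]:
    """Accept either the old single-string form or the new list form."""
    if isinstance(value, str):
        return [value]
    return list(value)


def primary_url(value: Union[str, List[str]]) -> str:
    """Return the only URL a not-yet-updated crawler would ingest."""
    urls = normalize_urls(value)
    if not urls:
        raise ValueError("source entry has no URL")
    return urls[0]


# Hypothetical values, for illustration only.
old_style = "https://example.org/foo-1.0.tar.gz"
new_style = [
    "https://example.org/foo-1.0.tar.gz",
    "https://mirror.example.net/foo-1.0.tar.gz",
]
print(primary_url(old_style))
print(primary_url(new_style))
```

Both forms yield the same primary URL, so the file format can change before the crawler learns to use the additional entries.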
Mar 6 2020
@lewo : Let me know the new diff number. :-)
Mar 5 2020
Implementation of a lister started at D2025.
After some discussions with the SWH team, this is actually no longer the right way to fill the archive with our sources. Instead, I'm starting to write a loader which will be in charge of reading our sources.json and filling the archive. So, I'm closing this diff and will create a new diff with a loader in the next few days ;)
There are also some advantages of implementing a loader: for instance, we could query the SWH API to know which sources of a specific sources.json file have been archived!
Mar 2 2020
@lewo: Should the version of the format be bumped to 2 with this string-to-array modification?
No, I don't think so since it is not used yet.
Feb 27 2020
Feb 18 2020
While looking into this with @zimoun, we realized it would be nicer if url were an array of URLs (as is the case at https://guix.gnu.org/packages.json) rather than a single URL.
The reason is that in many cases, both Guix and Nixpkgs provide a list of URLs rather than a single URL, which is useful when one of them breaks.
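The benefit of carrying several URLs per source can be sketched as a simple fallback loop: try each URL in order and keep the first one that works. This is only an illustration under stated assumptions. The function fetch_first_available and the injected fetch callable are hypothetical, and a stand-in fetcher is used so the example runs without network access.

```python
from typing import Callable, Iterable


def fetch_first_available(urls: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each URL in order and return the first successful download."""
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except Exception as exc:  # a broken mirror: remember why, try the next
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all URLs failed: " + "; ".join(errors))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP download; fails for 'broken' hosts."""
    if "broken" in url:
        raise IOError("unreachable")
    return b"tarball bytes"


# Hypothetical mirrors, for illustration only.
mirrors = [
    "https://broken.example.org/foo-1.0.tar.gz",
    "https://mirror.example.net/foo-1.0.tar.gz",
]
data = fetch_first_available(mirrors, fake_fetch)
print(len(data), "bytes fetched")
```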
In D2025#61932, @civodul wrote: In D2025#61931, @lewo wrote: A CI job is building a sources.json every day! The file is available at https://nix-community.github.io/nixpkgs-swh/sources.json ;)
Awesome!
Feb 14 2020
Feb 13 2020
What is the current status of this task ?
Jan 30 2020
> If you are going to the FOSDEM, would be nice to meet you there to talk about next steps!
It would, but I'm not going.
> If you are going to the FOSDEM, would be nice to meet you there to talk about next steps!
I'm already in Brussels and would be happy to meet!
A CI job is building a sources.json every day! The file is available at https://nix-community.github.io/nixpkgs-swh/sources.json ;)
This is a community CI (not hosted on main NixOS infrastructure) which will allow me to iterate quickly on this file.
Jan 29 2020
Jan 27 2020
Jan 22 2020
Jan 21 2020
This should take care of [1]
Jan 17 2020
Could you please update the title and the description according to the current state?
(If you don't have time, please tell me and I will ;)
Jan 16 2020
Build is green
See https://jenkins.softwareheritage.org/job/DLS/job/tox/534/ for more details.