Also [1] (period 20/01/2020 up to 01/04/2020) might come in handy...
Dec 10 2020
Dec 9 2020
what happened on 2020-03-01, explaining why archive started growing much faster (related to github listing/loading)
Dec 1 2020
Nov 25 2020
Nov 18 2020
Nov 3 2020
Oct 23 2020
Oct 21 2020
Sep 30 2020
Listed (oneshot full + recurring incremental) and loaded (as far as I can tell).
Sep 29 2020
I've sent an email to the fsfe.
Sep 28 2020
The lister is deployed, this forge is not listed though (codeberg.org is).
Can this be closed now? What's missing? Adding a listing task?
Sep 22 2020
Sep 21 2020
Sep 18 2020
Sep 17 2020
Sep 15 2020
Sep 14 2020
An email was sent on the swh-devel mailing list to ask for reviews.
The deployment in production will be performed in the middle of week 38 if no problems are raised.
Sep 10 2020
Sep 9 2020
The task ran in about 31 minutes (1887s):
Sep 08 13:45:34 worker1 python3[237586]: [2020-09-08 13:45:34,851: INFO/ForkPoolWorker-4] Task swh.lister.launchpad.tasks.FullLaunchpadLister[73e298be-aeda-4882-b52d-dfe5a2ec316c] succeeded in 1887.75128286588s: {'status': 'eventful'}
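For comparing run times across log lines like the one above, a small parser can help. This is only a sketch: the regular expression is written against the format of that single line, and the helper name `parse_succeeded` is made up for illustration.

```python
import ast
import re

# Pattern written against the "succeeded" log line quoted above (an assumption:
# other Celery log formats may differ).
SUCCEEDED = re.compile(
    r"Task (?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>[0-9.]+)s: (?P<result>\{.*\})"
)


def parse_succeeded(line):
    """Return task name, id, duration and result dict, or None if no match."""
    match = SUCCEEDED.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "task_id": match["task_id"],
        "seconds": float(match["seconds"]),
        # The result is printed as a Python dict literal, so literal_eval suffices.
        "result": ast.literal_eval(match["result"]),
    }


line = (
    "Sep 08 13:45:34 worker1 python3[237586]: [2020-09-08 13:45:34,851: "
    "INFO/ForkPoolWorker-4] Task swh.lister.launchpad.tasks.FullLaunchpadLister"
    "[73e298be-aeda-4882-b52d-dfe5a2ec316c] succeeded in 1887.75128286588s: "
    "{'status': 'eventful'}"
)
info = parse_succeeded(line)
print(info["name"], round(info["seconds"] / 60, 1), "min", info["result"]["status"])
```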
Sep 8 2020
The launchpad lister (v0.1.2) is deployed and running on staging
Sep 4 2020
Thanks for the heads up.
FTR, I've run the launchpad lister in Docker and it ran fine, "fine" meaning "it created 19340 load-git tasks".
Aug 27 2020
I guess this also depends on a packagist loader, which we do not have at all for now...
Aug 26 2020
Also beware that the default pagination value in the gitea lister is 3 (https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/lister.py$23) so it is very slow.
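To get a feel for why a page size of 3 is slow, here is a rough count of the API calls needed to walk a full listing. The instance size (10,000 repositories) and the alternative page size of 50 are made-up numbers for illustration, not measurements of any real forge or limits confirmed for the Gitea API.

```python
import math


def requests_needed(total_repos, page_size):
    """Number of paginated API calls needed to list total_repos entries."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_repos / page_size)


# Hypothetical instance size, for illustration only.
total = 10_000
for page_size in (3, 50):
    print(f"page size {page_size}: {requests_needed(total, page_size)} requests")
```

With these assumed numbers, the smaller page size needs well over ten times as many round trips for the same listing.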
Ok I was expecting something a bit smart in explore.sapk.fr, but not really:
Now that we have the gitea lister, we should (upgrade swh.lister on prod and) add a few listing tasks, for this fsfe instance as well as for other instances like https://codeberg.org.
Aug 24 2020
Aug 19 2020
Aug 8 2020
fwiw, the nix sources benefit from this as well
Seems to have reduced the cost (from ~4500s to ~1500s) but there might still be room for improvement [1]
For one, the extensions to skip were not finely analyzed (off the top of my head, we could add ".el" to the extensions to filter out, for example).
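As an illustration of the kind of extension filter discussed here, a minimal sketch follows. The skip list and the helper `is_skipped` are hypothetical and do not reproduce the loader's actual code; only ".el" comes from the comment above.

```python
from urllib.parse import urlparse

# Hypothetical skip list: ".el" is the extension suggested above,
# the other entries are placeholders.
SKIPPED_EXTENSIONS = (".el", ".patch", ".txt")


def is_skipped(url, skipped=SKIPPED_EXTENSIONS):
    """True if the URL path ends with one of the skipped extensions."""
    path = urlparse(url).path.lower()
    return path.endswith(tuple(skipped))


urls = [
    "https://example.org/pkg/foo-1.0.tar.gz",
    "https://example.org/lisp/mode.el",
]
# Only URLs that survive the filter would be fetched and loaded.
to_load = [u for u in urls if not is_skipped(u)]
print(to_load)
```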
Aug 7 2020
loader-core 0.9.0, which includes the T2510 improvement, got deployed on staging to see if that improves time/performance.
(both run for guix and nix sources)
Jul 9 2020
Note: status uneventful with a different snapshot is kinda unexpected for me. Not something drastically problematic though. I'll dig in at some point.
@ardumont: did you load the same sources.json? http://guix.gnu.org/sources.json is refreshed every X hours, and stats on the commits after 2018-12-05 (v0.16.0) give a mean of 21 and a median of 13 commits per day. Since loading requires ~1h15min, you need some luck to read the same JSON file twice.
@ardumont, https://archive.softwareheritage.org/api/1/snapshot/869153d018394df0b75789134d87992eb2353bd4/ says this particular snapshot could not be found. Am I missing something?
Second run btw (forgot to hit enter a while back):
Jul 07 12:10:49 worker2 python3[475116]: [2020-07-07 12:10:49,714: INFO/ForkPoolWorker-1] Task swh.loader.package.nixguix.tasks.LoadNixguix[082dd536-6294-421a-881e-e0bf28e94e0b] succeeded in 4497.450984489056s: {'status': 'uneventful', 'snapshot_id': 'ae96e93d0e24fb4ec484d56109c669da0b267908'}
This is great news, thank you! :-)
Jul 7 2020
Run completed.
Jul 6 2020
Patched the staging nixguix loader worker with the diffs above and triggered another run.
It seems to no longer complain.
Next issue [2]
First issue, missing a top-level "sources" entry [1]
Jun 17 2020
In T1352#45587, @lewo wrote: are you suggesting that sources.json itself be an "origin"?
The sources.json URL is an "origin". Each snapshot associated to this origin has several branches. Each branch corresponds to a source of the sources.json file.
There is also a special branch named evaluation, which points to the commit specified by the revision attribute of your sources.json file: this links a snapshot to a nixpkgs/guix commit.
@lewo it's used in our DB but also exposed in the swh-web UI in search results (and in the future it is also going to be a field for user searches, so that you can search, e.g., "emacs" only in the list of packages archived from a given origin type).
We need a name for this origin type, one of the hardest problems in CS :-)
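The mapping described above can be sketched roughly as follows. This is an illustration only: the branch naming (first URL of each source), the `target_type` labels, the skipping of non-"url" sources and the example data are all assumptions, not the loader's actual behavior.

```python
def snapshot_branches(sources_json):
    """Map a sources.json-like dict to {branch name: target description}."""
    branches = {}
    for source in sources_json["sources"]:
        # Assumption: only "url" sources become branches in this sketch.
        if source.get("type") != "url":
            continue
        url = source["urls"][0]  # assumption: the first URL names the branch
        branches[url] = {"target_type": "source", "target": url}
    # Special branch linking the snapshot to the nixpkgs/guix commit.
    branches["evaluation"] = {
        "target_type": "commit",
        "target": sources_json["revision"],
    }
    return branches


example = {
    "revision": "cc4e04c26672dd74e5fd0fecb78b435fb55368f7",  # made-up commit id
    "sources": [
        {"type": "url", "urls": ["https://example.org/foo-1.0.tar.gz"]},
        {"type": "git", "urls": ["https://example.org/bar.git"]},
    ],
}
branches = snapshot_branches(example)
for name, target in branches.items():
    print(name, "->", target["target_type"])
```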
Where is it used? Is it a new attribute?
We actually had to choose a name for the visit type, and with a lot of inspiration, we chose nixguix :-/
Do you mean filter the unsupported urls for the field "urls" in the "type": "url"?
Or do you mean only export "type": "url" and remove all the other types from 'sources.json', for instance "git"?
In T1352#45536, @zack wrote: In T1352#45459, @lewo wrote: So, we can now consider the sources.json file format as stable and you could make the required changes on your sources.json file. A new SWH origin should then be added.
We need a name for this origin type, one of the hardest problems in CS :-)
Can you suggest something that makes sense for Nix, Guix, and other players in the field? As an outsider I'm a bit at a loss to propose something…
Thank you for the notification. I have tried to answer by email but I could have failed. Anyway.
Jun 16 2020
Repology.org went with "Gnu Guix".
In T1352#45459, @lewo wrote: So, we can now consider the sources.json file format as stable and you could make the required changes on your sources.json file. A new SWH origin should then be added.
Jun 15 2020
What do you think @ardumont ?
The nixguix loader has been working well for 2 weeks on the nixpkgs sources.json file!
So, we can now consider the sources.json file format as stable and you could make the required changes on your sources.json file. A new SWH origin should then be added.
Jun 9 2020
This task describes in detail what kind of scheduling policy we should implement, but it doesn't help much figure out what the next steps should be.
May 27 2020
I've had multiple looks at the proposed gitea lister.
This looks fine to me; I've accepted it, but not completely.
If some other team member could do a second pass, that'd be neat.
May 26 2020
As a quick follow-up, here is the current structure of the sources.json that the
nixguix loader is able to ingest. It's not that different from what @lewo
initially proposed in the lister diff.