remove useless properties
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Sep 21 2020
Remove unnecessary comments
I don't know what the current state of the diff is (I see quite a lot of commented code; I don't know if you want to merge it like this or not, e.g. the Vagrantfile).
I just discovered there is still an issue with the network interfaces not being configured after a restart of a VM. IMO we can wait for this problem to be solved before landing the diff.
Fix plenty of " :" ;)
The recurring visits look good.
Update the diff according to the previous feedback:
- The logstash hosts are declared in a single property
- No more yaml templates ;)
- The profile::filebeat doesn't use parameters anymore
- Ensure the permissions are correctly set
- Add the purge option on the inputs.d directory
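As an aside, the purge option on the inputs.d directory means puppet removes any file it does not manage there. A minimal shell analogy of that behavior (illustrative only; the file names are invented and this is not the actual puppet code):

```shell
#!/bin/sh
# Analogy for puppet's `purge => true` on a directory: files absent
# from the managed set are deleted from the target directory.
set -e
managed=$(mktemp -d); target=$(mktemp -d)
echo 'managed input' > "$managed/apache.yml"
cp "$managed/apache.yml" "$target/apache.yml"
echo 'leftover' > "$target/stale.yml"   # unmanaged file
for f in "$target"/*; do
  [ -e "$managed/$(basename "$f")" ] || rm -- "$f"
done
ls "$target"   # only apache.yml remains
```

This mirrors what happens to hand-dropped files under inputs.d once the profile manages the directory with purge enabled.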
Sep 18 2020
Add vagrant configuration and documentation
Sep 17 2020
add vagrant usage documentation
prefix the preseed file by the debian version name
rebase
- lister's cache truncated:
swh-lister=> truncate gitea_repo;
TRUNCATE TABLE
swh-lister=> truncate launchpad_repo;
TRUNCATE TABLE
- recurring task for full listing created:
- gitea
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add list-gitea-full url=https://codeberg.org/api/v1/ limit=100
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks
- Task types registered on the scheduler:
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.launchpad
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.launchpad
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
The iso is available on the public annex at this URL: https://annex.softwareheritage.org/public/isos/virtualbox/debian/
In D3967#98011, @olasd wrote:
> In D3967#97981, @ardumont wrote:
>> I think this should be moved to the sysadm-provisioning repository [1] instead.
[1] https://forge.softwareheritage.org/source/swh-sysadmin-provisioning/
No, I think a puppet-specific thing like this should definitely be in the puppet repository.
- initial gitea lister launched:
swhworker@worker02:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister gitea --priority high url=https://codeberg.org/api/v1/ limit=100
...
INFO:root:listing repos starting at 1198
INFO:root:listing repos starting at 1199
INFO:root:listing repos starting at 1200
INFO:root:stopping after page 1200, no next link found
Everything seems to work well; the production deployment will be done in T2608.
- initial manual launchpad listing launched:
swhworker@worker02:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister launchpad --priority high
INFO:swh.core.config:Loading config file /etc/softwareheritage/lister.yml
INFO:swh.core.config:Loading config file /etc/softwareheritage/global.ini
INFO:swh.core.config:Loading config file /etc/softwareheritage/lister.yml
- scheduler task-types created:
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-incremental in scheduler
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.launchpad
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.launchpad
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-incremental in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-new in scheduler
- user guest granted access to the new tables:
swh-lister=> grant select
swh-lister-> on all tables in schema public
swh-lister-> to guest;
GRANT
- lister model updated from worker01:
swhworker@worker01:/etc/softwareheritage$ swh lister --db-url postgresql://*****@db.internal.softwareheritage.org:5432/swh-lister db-init
INFO:swh.lister.cli:Loading lister bitbucket
INFO:swh.lister.cli:Loading lister cgit
INFO:swh.lister.cli:Loading lister cran
INFO:swh.lister.cli:Loading lister debian
INFO:swh.lister.cli:Loading lister gitea
INFO:swh.lister.cli:Loading lister github
INFO:swh.lister.cli:Loading lister gitlab
INFO:swh.lister.cli:Loading lister gnu
INFO:swh.lister.cli:Loading lister launchpad
INFO:swh.lister.cli:Loading lister npm
INFO:swh.lister.cli:Loading lister packagist
INFO:swh.lister.cli:Loading lister phabricator
INFO:swh.lister.cli:Loading lister pypi
INFO:swh.lister.cli:Initializing database
INFO:swh.lister.core.models:Creating tables
INFO:swh.lister.cli:Calling init hook for debian
New version of the lister package deployed:
- on workers
root@pergamon:~# clush -b -w @swh-workers 'apt-get update; apt install -y python3-swh.lister'
...
root@pergamon:~# clush -b -w @swh-workers "dpkg -l python3-swh.lister"
---------------
worker[01-16] (16)
---------------
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name               Version              Architecture Description
+++-==================-====================-============-=================================================================
ii  python3-swh.lister 0.1.4-1~swh1~bpo10+1 all          Software Heritage Listers (bitbucket, git(lab|hub), pypi, etc...)
- on the scheduler:
root@saatchi:~# apt update && apt install python3-swh.lister
...
Restarting services...
 systemctl restart gunicorn-swh-scheduler.service icinga2.service journalbeat.service postfix@-.service rabbitmq-server.service rpcbind.service ssh.service swh-scheduler-runner.service unbound.service
Actions:
- deploy the new version of the lister on each worker
- update the lister data model
- create the new task-type on the scheduler
- manually launch a listing to create high-priority loading tasks for the launchpad and gitea repositories, so they are ingested soon rather than at the end of the current git queue
- truncate lister cache to allow the recurring loading tasks to be created
- schedule the recurring listing tasks for both repositories
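The action list above corresponds to commands of this shape, collected from the logs in this task (treat it as a runbook sketch, not a script to run blindly; the db-init user is elided in the original log):

```shell
# Sketch of the deployment runbook; verify each step's output before
# moving on to the next one.

# 1. deploy the new lister package on the workers (from pergamon)
clush -b -w @swh-workers 'apt-get update; apt install -y python3-swh.lister'

# 2. update the lister data model (from a worker)
swh lister --db-url postgresql://<user>@db.internal.softwareheritage.org:5432/swh-lister db-init

# 3. register the new task-types on the scheduler
swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.launchpad

# 4. manually launch a high-priority listing
SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml \
  swh lister run --lister gitea --priority high url=https://codeberg.org/api/v1/ limit=100

# 5. truncate the lister cache (in psql, on the swh-lister database):
#    truncate gitea_repo; truncate launchpad_repo;

# 6. schedule the recurring listing tasks
swh scheduler --config-file /etc/softwareheritage/scheduler.yml \
  task add list-gitea-full url=https://codeberg.org/api/v1/ limit=100
```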
Sep 16 2020
Use filebeat's capability to use one file per input.
Sep 15 2020
The loading is in progress and seems ok
The loading task for guix is scheduled in production:
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add load-nixguix url=https://guix.gnu.org/sources.json
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks
The configuration was fixed on moma:
root@moma:/etc/filebeat# diff -U3 /tmp/filebeat.yml /etc/filebeat/filebeat.yml
--- /tmp/filebeat.yml	2020-09-15 08:10:20.512838905 +0000
+++ /etc/filebeat/filebeat.yml	2020-09-15 08:16:13.096135043 +0000
@@ -1,4 +1,4 @@
-filebeat.prospectors:
+filebeat.inputs:
 - type: log
   paths:
     - /var/log/apache2/archive.softwareheritage.org_non-ssl_access.log
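After a manual edit like this, filebeat can validate the configuration before the service is restarted. A quick check could look like this (standard filebeat/systemd commands, not taken from the logs above):

```shell
# Validate the edited configuration, then restart and verify the service.
filebeat test config -c /etc/filebeat/filebeat.yml
systemctl restart filebeat
systemctl status filebeat --no-pager
```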
and the logs since the last rotation were correctly ingested.
Sep 14 2020
An email was sent on the swh-devel mailing list to ask for reviews.
The deployment in production will be performed in the middle of week 38 if no problems are raised.
Sep 11 2020
Sep 10 2020
Reopened to validate the complete process, from the listing to the loading of some repositories.
The test of the new version v0.1.4, including the fix on the range split, the uid change and the incremental task fix, is ok.
LGTM
Tested in the docker-environment, the problem is not reproduced anymore with 5 concurrent listers.
Sep 9 2020
The concurrency issue was reproduced locally on the docker environment with a concurrency of 5.
I have tried to create a list-gitea-incremental task but it fails too, this time with another exception related to an unexpected "sort" parameter: https://sentry.softwareheritage.org/share/issue/b0119b56f24347bcb58ac28c68685c62/
The configuration is deployed and the listers were restarted.
For info, on my desktop with the docker environment, with a limit of 100, the lister takes 3s to list the complete codeberg forge:
swh-lister_1 | [2020-09-08 18:33:19,259: INFO/ForkPoolWorker-1] Task swh.lister.gitea.tasks.RangeGiteaLister[363e0b30-b13a-4f62-bd31-9847dfe62450] succeeded in 3.7196799100056523s: {'status': 'eventful'}
The task ran in about 30 minutes (1887s):
Sep 08 13:45:34 worker1 python3[237586]: [2020-09-08 13:45:34,851: INFO/ForkPoolWorker-4] Task swh.lister.launchpad.tasks.FullLaunchpadLister[73e298be-aeda-4882-b52d-dfe5a2ec316c] succeeded in 1887.75128286588s: {'status': 'eventful'}
- The data model doesn't need to be created because it was already done in T2358
- The task is created:
swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add --policy oneshot list-gitea-full url=https://codeberg.org/api/v1/ limit=100
WARNING:swh.core.cli:Could not load subcommand storage: No module named 'swh.journal'
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks
Sep 8 2020
- task-type registered:
swhscheduler@scheduler0:/etc/softwareheritage/backend$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
WARNING:swh.core.cli:Could not load subcommand storage: No module named 'swh.journal'
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-incremental in scheduler
fix mix-up with launchpad tasks
The launchpad lister (v0.1.2) is deployed and running on staging
A parameter was missing in the call:
swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type list
Octocatalog-diff test:
➜ puppet-environment git:(master) ✗ bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to arcpatch_D3884 worker0.internal.staging.swh.network
Found host worker0.internal.staging.swh.network
WARN -> Environment "arcpatch-D3884" contained non-word characters, correcting name to arcpatch_D3884
Cloning into '/tmp/swh-ocd.e8lQMoZ0/environments/production/data/private'...
done.
Cloning into '/tmp/swh-ocd.e8lQMoZ0/environments/arcpatch_D3884/data/private'...
done.
*** Running octocatalog-diff on host worker0.internal.staging.swh.network
I, [2020-09-08T10:45:12.720125 #4652]  INFO -- : Catalogs compiled for worker0.internal.staging.swh.network
I, [2020-09-08T10:45:13.707209 #4652]  INFO -- : Diffs computed for worker0.internal.staging.swh.network
diff origin/production/worker0.internal.staging.swh.network current/worker0.internal.staging.swh.network
*******************************************
  File[/etc/softwareheritage/lister.yml] =>
   parameters =>
     content =>
      @@ -24,4 +24,6 @@
         - swh.lister.gitlab.tasks.FullGitLabRelister
         - swh.lister.gnu.tasks.GNUListerTask
      +  - swh.lister.launchpad.tasks.FullLaunchpadLister
      +  - swh.lister.launchpad.tasks.IncrementalLaunchpadLister
         - swh.lister.npm.tasks.NpmListerTask
         - swh.lister.phabricator.tasks.FullPhabricatorLister
*******************************************
*** End octocatalog-diff on worker0.internal.staging.swh.network
Sep 4 2020
Wikimedia is using netbox as the source of truth for their infrastructure, and puppet configures the facts from it. It's not exactly the use case we want, as we would like to have netbox automatically provisioned.
And their documentation: https://wikitech.wikimedia.org/wiki/Netbox