v0.21.0 : To fix the conflict issue when duplicated origins are inserted in the same batch
v0.22.0: To fix the datetime overflow relative to the next_visit_queue_position field
Description
Description
| Status | Assigned | Task | ||
|---|---|---|---|---|
| Migrated | gitlab-migration | T3769 NPM lister is failing with a database update conflict | ||
| Migrated | gitlab-migration | T3773 Deploy swh-scheduler v0.22.0 |
Event Timeline
Comment Actions
The problem is reproduced in staging before the deployment
swhworker@worker1:~$ swh lister -C /etc/softwareheritage/lister.yml run -l npm
Traceback (most recent call last):
File "/usr/bin/swh", line 11, in <module>
load_entry_point('swh.core==0.15.0', 'console_scripts', 'swh')()
File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main
return swh(auto_envvar_prefix="SWH")
File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/python3/dist-packages/swh/lister/cli.py", line 65, in run
get_lister(lister, **config).run()
File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 130, in run
full_stats.origins += self.send_origins(origins)
File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 234, in send_origins
ret = self.scheduler.record_listed_origins(batch_origins)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
return self.post(meth._endpoint_path, post_data)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
return self._decode_response(response)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
self.raise_for_status(response)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
raise exception from None
swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>Comment Actions
Version v0.21.0 deployed in staging:
root@scheduler0:~# apt list --upgradable 2>/dev/null | grep swh | cut -f1 -d'/' | xargs -t apt install apt install python3-swh.core python3-swh.counters python3-swh.journal python3-swh.lister python3-swh.loader.core python3-swh.model python3-swh.objstorage python3-swh.scheduler python3-swh.storage ... root@scheduler0:~# systemctl reload gunicorn-swh-scheduler.service
Loading test is ok, the process was oom killed but it run far longer than previously:
swhworker@worker1:~$ /usr/bin/time swh lister -C /etc/softwareheritage/lister.yml run -l npm Command terminated by signal 9 407.67user 81.62system 3:13:08elapsed 4%CPU (0avgtext+0avgdata 8394852maxresident)k 36216inputs+0outputs (240major+11017567minor)pagefaults 0swaps
Comment Actions
Deployment of the version v0.22.0 in staging
Current database version:
swh-scheduler=> \conninfo
You are connected to database "swh-scheduler" as user "guest" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
swh-scheduler=> select * from dbversion order by version desc limit 1;
version | release | description
---------+-------------------------------+------------------
30 | 2021-08-06 09:24:17.902284+00 | Work In Progress
(1 row)- Preparing the migration (stop puppet, swh-scheduler-journal-client.service, swh-scheduler-schedule-recurrent.service)
From irc:
20:12:27 +olasd | vsellier: you'll want swh-scheduler-journal-client.service and swh-scheduler-schedule-recurrent.service stopped for the migration
root@scheduler0:~# puppet agent --disable "T3773"
root@scheduler0:~# systemctl stop swh-scheduler-journal-client
root@scheduler0:~# systemctl stop swh-scheduler-schedule-recurrent
root@scheduler0:~# systemctl status swh-scheduler-journal-client
● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client
Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since Wed 2021-12-08 10:19:16 UTC; 15s ago
Process: 3982600 ExecStart=/usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client (code=exited, status=1/FAILU
Main PID: 3982600 (code=exited, status=1/FAILURE)
CPU: 805ms
Dec 08 10:18:45 scheduler0 systemd[1]: swh-scheduler-journal-client.service: Failed with result 'exit-code'.
Dec 08 10:19:16 scheduler0 systemd[1]: Stopped Software Heritage Scheduler Journal Client.
● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client
Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since Wed 2021-12-08 10:19:16 UTC; 15s ago
Process: 3982600 ExecStart=/usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client (code=exited, status=1/FAILU
Main PID: 3982600 (code=exited, status=1/FAILURE)
CPU: 805ms
Dec 08 10:18:45 scheduler0 systemd[1]: swh-scheduler-journal-client.service: Failed with result 'exit-code'.
Dec 08 10:19:16 scheduler0 systemd[1]: Stopped Software Heritage Scheduler Journal Client.
root@scheduler0:~# systemctl status swh-scheduler-schedule-recurrent.service
● swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits
Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 2021-12-08 10:19:24 UTC; 17s ago
Process: 3826024 ExecStart=/usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurrent (code=
Main PID: 3826024 (code=exited, status=0/SUCCESS)
CPU: 2min 4.694s
Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy already_visited_order_by_l
Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy never_visited_oldest_updat
Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy origins_without_last_updat
Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:git: 125 visits scheduled in queue swh.loader.git.tasks.UpdateGitRepos
Dec 08 10:19:24 scheduler0 systemd[1]: Stopping Software Heritage scheduler for recurrent visits...
Dec 08 10:19:24 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Termination requested...
Dec 08 10:19:24 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Terminating visit scheduling threads: archive-files, cran, deb, deb-pa
Dec 08 10:19:24 scheduler0 systemd[1]: swh-scheduler-schedule-recurrent.service: Succeeded.
Dec 08 10:19:24 scheduler0 systemd[1]: Stopped Software Heritage scheduler for recurrent visits.
Dec 08 10:19:24 scheduler0 systemd[1]: swh-scheduler-schedule-recurrent.service: Consumed 2min 4.694s CPU time.- upgrade the packages
root@scheduler0:~# apt list --upgradable 2>/dev/null | grep python3-swh python3-swh.loader.core/unknown 2.0.0-1~swh1~bpo10+1 all [upgradable from: 1.2.1-1~swh1~bpo10+1] python3-swh.scheduler/unknown 0.22.0-1~swh1~bpo10+1 all [upgradable from: 0.21.0-1~swh1~bpo10+1] root@scheduler0:~# apt install python3-swh.scheduler python3-swh.loader.core Reading package lists... Done Building dependency tree Reading state information... Done The following packages will be upgraded: python3-swh.loader.core python3-swh.scheduler 2 upgraded, 0 newly installed, 0 to remove and 7 not upgraded. Need to get 4940 kB of archives. After this operation, 74.8 kB of additional disk space will be used. Get:1 https://debian.softwareheritage.org buster-swh/main amd64 python3-swh.scheduler all 0.22.0-1~swh1~bpo10+1 [85.1 kB] Get:2 https://debian.softwareheritage.org buster-swh/main amd64 python3-swh.loader.core all 2.0.0-1~swh1~bpo10+1 [4855 kB] Fetched 4940 kB in 0s (45.0 MB/s) Reading changelogs... Done (Reading database ... 78138 files and directories currently installed.) Preparing to unpack .../python3-swh.scheduler_0.22.0-1~swh1~bpo10+1_all.deb ... Unpacking python3-swh.scheduler (0.22.0-1~swh1~bpo10+1) over (0.21.0-1~swh1~bpo10+1) ... Preparing to unpack .../python3-swh.loader.core_2.0.0-1~swh1~bpo10+1_all.deb ... Unpacking python3-swh.loader.core (2.0.0-1~swh1~bpo10+1) over (1.2.1-1~swh1~bpo10+1) ... Setting up python3-swh.scheduler (0.22.0-1~swh1~bpo10+1) ... Setting up python3-swh.loader.core (2.0.0-1~swh1~bpo10+1) ... # restart the services not impacted by the database migration root@scheduler0:~# systemctl restart swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service root@scheduler0:~# systemctl status swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service <they are all running>
- Database migration to v31
root@db1:~/swh-scheduler/sql/updates# git pull Already up to date. root@db1:~/swh-scheduler/sql/updates# cp 31.sql /tmp root@db1:~/swh-scheduler/sql/updates# sudo -u postgres psql swh-scheduler could not change directory to "/root/swh-scheduler/sql/updates": Permission denied psql (13.4 (Debian 13.4-1.pgdg100+1), server 12.8 (Debian 12.8-1.pgdg100+1)) Type "help" for help. swh-scheduler=# \timing Timing is on. swh-scheduler=# \i /tmp/31.sql INSERT 0 1 Time: 19.725 ms ALTER TABLE Time: 38065.123 ms (00:38.065) COMMENT Time: 65.156 ms ALTER TABLE Time: 95.791 ms
swh-scheduler=# \d origin_visit_stats
Table "public.origin_visit_stats"
Column | Type | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
url | text | | not null |
visit_type | text | | not null |
last_snapshot | bytea | | |
last_scheduled | timestamp with time zone | | |
next_visit_queue_position | bigint | | |
next_position_offset | integer | | not null | 4
successive_visits | integer | | not null | 1
last_successful | timestamp with time zone | | |
last_visit | timestamp with time zone | | |
last_visit_status | last_visit_status | | |
Indexes:
"origin_visit_stats_pkey" PRIMARY KEY, btree (url, visit_type)- Restarting the services
root@scheduler0:~# systemctl start swh-scheduler-journal-client
root@scheduler0:~# systemctl status swh-scheduler-journal-client
● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client
Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-12-08 10:48:33 UTC; 34s ago
Main PID: 3986437 (swh)
Tasks: 6 (limit: 9537)
Memory: 46.3M
CPU: 4.867s
CGroup: /system.slice/swh-scheduler-journal-client.service
└─3986437 /usr/bin/python3 /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client
Dec 08 10:48:33 scheduler0 systemd[1]: Started Software Heritage Scheduler Journal Client.
root@scheduler0:~# systemctl start swh-scheduler-schedule-recurrent.service
root@scheduler0:~# systemctl status swh-scheduler-schedule-recurrent.service
● swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits
Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-12-08 10:50:13 UTC; 16s ago
Main PID: 3986611 (swh)
Tasks: 15 (limit: 9537)
Memory: 58.0M
CPU: 1.297s
CGroup: /system.slice/swh-scheduler-schedule-recurrent.service
└─3986611 /usr/bin/python3 /usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurre
Dec 08 10:50:23 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy origins_without_last_updat
Dec 08 10:50:23 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:svn: 27 visits scheduled in queue swh.loader.svn.tasks.DumpMountAndLoa
Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy already_visited_order_by_l
Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy never_visited_oldest_updat
Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy origins_without_last_updat
Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:git: 149 visits scheduled in queue swh.loader.git.tasks.UpdateGitRepos
Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy already_visited_order_by_l
Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy never_visited_oldest_updat
Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy origins_without_last_updat
Dec 08 10:50:30 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:svn: 26 visits scheduled in queue swh.loader.svn.tasks.DumpMountAndLoaComment Actions
deployment of version v0.22.0 in production
- database migration preparation
root@belvedere:~/swh-scheduler/sql/updates# git pull
Already up to date.
root@belvedere:~/swh-scheduler/sql/updates# cp 31.sql /tmp
root@belvedere:~/swh-scheduler/sql/updates# sudo -u postgres psql -p 5434 softwareheritage-scheduler
could not change directory to "/root/swh-scheduler/sql/updates": Permission denied
psql (12.8 (Debian 12.8-1.pgdg100+1))
Type "help" for help.
softwareheritage-scheduler=# \conninfo
You are connected to database "softwareheritage-scheduler" as user "postgres" via socket in "/var/run/postgresql" at port "5434".
softwareheritage-scheduler=# select * from dbversion order by version desc limit 1;
version | release | description
---------+-------------------------------+------------------
30 | 2021-08-12 06:25:11.755147+00 | Work In Progress
(1 row)- deployment on saatchi
- Stop the schedulers running on the tmux session
- stop puppet and the impacted services
root@saatchi:~# puppet agent --disable "T3773" root@saatchi:~# systemctl stop swh-scheduler-journal-client root@saatchi:~# systemctl stop swh-scheduler-schedule-recurrent <finished in timeout but effectively stopped>
- check the upgradable packages, there is lot of package to upgrade
root@saatchi:~# apt list --upgradable 2>/dev/null | grep python3-swh python3-swh.core/unknown 1.0.0-2~swh1~bpo10+1 all [upgradable from: 0.15.0-1~swh1~bpo10+1] python3-swh.counters/unknown 0.9.0-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1] python3-swh.deposit.client/unknown 0.16.0-1~swh1~bpo10+1 all [upgradable from: 0.14.3-1~swh1~bpo10+1] python3-swh.deposit.loader/unknown 0.16.0-1~swh1~bpo10+1 all [upgradable from: 0.14.3-1~swh1~bpo10+1] python3-swh.journal/unknown 0.9.1-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1] python3-swh.lister/unknown 2.6.1-1~swh1~bpo10+1 all [upgradable from: 2.5.0-1~swh1~bpo10+1] python3-swh.loader.core/unknown 2.0.0-1~swh1~bpo10+1 all [upgradable from: 1.2.0-1~swh1~bpo10+1] python3-swh.loader.git/unknown 1.2.0-1~swh1~bpo10+1 all [upgradable from: 1.0.1-1~swh1~bpo10+1] python3-swh.loader.mercurial/unknown 3.0.0-1~swh1~bpo10+1 all [upgradable from: 2.3.1-1~swh1~bpo10+1] python3-swh.loader.svn/unknown 0.10.2-1~swh1~bpo10+1 all [upgradable from: 0.7.1-1~swh1~bpo10+1] python3-swh.model/unknown 3.1.0-1~swh1~bpo10+1 all [upgradable from: 3.0.0-1~swh1~bpo10+1] python3-swh.objstorage/unknown 0.3.0-1~swh2~bpo10+1 all [upgradable from: 0.2.3-1~swh1~bpo10+1] python3-swh.scheduler/unknown 0.22.0-1~swh1~bpo10+1 all [upgradable from: 0.20.0-1~swh1~bpo10+1] python3-swh.storage/unknown 0.40.0-2~swh1~bpo10+1 all [upgradable from: 0.37.0-1~swh2~bpo10+1] python3-swh.vault/unknown 1.3.0-1~swh1~bpo10+1 all [upgradable from: 1.2.0-1~swh1~bpo10+1]
- upgrade the packages
root@saatchi:~# apt list --upgradable 2>/dev/null | grep python3-swh | cut -f1 -d"/" | xargs -t apt install apt install python3-swh.core python3-swh.counters python3-swh.deposit.client python3-swh.deposit.loader python3-swh.journal python3-swh.lister python3-swh.loader.core python3-swh.loader.git python3-swh.loader.mercurial python3-swh.loader.svn python3-swh.model python3-swh.objstorage python3-swh.scheduler python3-swh.storage python3-swh.vault
- restart the services
root@saatchi:~# systemctl restart swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service root@saatchi:~# systemctl status swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service <they are all running>
- database migration to v31
softwareheritage-scheduler=# \timing Timing is on. softwareheritage-scheduler=# \i /tmp/31.sql INSERT 0 1 Time: 4.983 ms ALTER TABLE Time: 2346962.104 ms (39:06.962) COMMENT Time: 7.763 ms ALTER TABLE Time: 49.457 ms
- restarting services and puppet
root@saatchi:~# systemctl start swh-scheduler-journal-client
root@saatchi:~# systemctl start swh-scheduler-schedule-recurrent
root@saatchi:~# systemctl status swh-scheduler-journal-client
* swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client
Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-12-08 13:14:44 UTC; 30s ago
Main PID: 997137 (swh)
Tasks: 8 (limit: 24037)
Memory: 46.7M
CGroup: /system.slice/swh-scheduler-journal-client.service
`-997137 /usr/bin/python3 /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client
Dec 08 13:14:44 saatchi systemd[1]: Started Software Heritage Scheduler Journal Client.
root@saatchi:~# systemctl status swh-scheduler-schedule-recurrent
* swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits
Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-12-08 13:15:04 UTC; 21s ago
Main PID: 997182 (swh)
Tasks: 18 (limit: 24037)
Memory: 71.9M
CGroup: /system.slice/swh-scheduler-schedule-recurrent.service
`-997182 /usr/bin/python3 /usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurren
Dec 08 13:15:04 saatchi systemd[1]: Started Software Heritage scheduler for recurrent visits.
root@saatchi:~# puppet agent --enable- schedulers running on a tmux restarted