v0.21.0 : To fix the conflict issue when duplicated origins are inserted in the same batch
v0.22.0: To fix the datetime overflow relative to the next_visit_queue_position field
Description
Description
Status | Assigned | Task | ||
---|---|---|---|---|
Migrated | gitlab-migration | T3769 NPM lister is failing with a database update conflict | ||
Migrated | gitlab-migration | T3773 Deploy swh-scheduler v0.22.0 |
Event Timeline
Comment Actions
The problem is reproduced in staging before the deployment
swhworker@worker1:~$ swh lister -C /etc/softwareheritage/lister.yml run -l npm Traceback (most recent call last): File "/usr/bin/swh", line 11, in <module> load_entry_point('swh.core==0.15.0', 'console_scripts', 'swh')() File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main return swh(auto_envvar_prefix="SWH") File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__ return self.main(*args, **kwargs) File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main rv = self.invoke(ctx) File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke return callback(*args, **kwargs) File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func return f(get_current_context(), *args, **kwargs) File "/usr/lib/python3/dist-packages/swh/lister/cli.py", line 65, in run get_lister(lister, **config).run() File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 130, in run full_stats.origins += self.send_origins(origins) File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 234, in send_origins ret = self.scheduler.record_listed_origins(batch_origins) File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_ return self.post(meth._endpoint_path, post_data) File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post return self._decode_response(response) File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response self.raise_for_status(response) File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status raise exception from None swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>
Comment Actions
Version v0.21.0 deployed in staging:
root@scheduler0:~# apt list --upgradable 2>/dev/null | grep swh | cut -f1 -d'/' | xargs -t apt install apt install python3-swh.core python3-swh.counters python3-swh.journal python3-swh.lister python3-swh.loader.core python3-swh.model python3-swh.objstorage python3-swh.scheduler python3-swh.storage ... root@scheduler0:~# systemctl reload gunicorn-swh-scheduler.service
Loading test is ok, the process was oom killed but it run far longer than previously:
swhworker@worker1:~$ /usr/bin/time swh lister -C /etc/softwareheritage/lister.yml run -l npm Command terminated by signal 9 407.67user 81.62system 3:13:08elapsed 4%CPU (0avgtext+0avgdata 8394852maxresident)k 36216inputs+0outputs (240major+11017567minor)pagefaults 0swaps
Comment Actions
Deployment of the version v0.22.0 in staging
Current database version:
swh-scheduler=> \conninfo You are connected to database "swh-scheduler" as user "guest" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432". SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) swh-scheduler=> select * from dbversion order by version desc limit 1; version | release | description ---------+-------------------------------+------------------ 30 | 2021-08-06 09:24:17.902284+00 | Work In Progress (1 row)
- Preparing the migration (stop puppet, swh-scheduler-journal-client.service, swh-scheduler-schedule-recurrent.service)
From irc:
20:12:27 +olasd | vsellier: you'll want swh-scheduler-journal-client.service and swh-scheduler-schedule-recurrent.service stopped for the migration
root@scheduler0:~# puppet agent --disable "T3773" root@scheduler0:~# systemctl stop swh-scheduler-journal-client root@scheduler0:~# systemctl stop swh-scheduler-schedule-recurrent root@scheduler0:~# systemctl status swh-scheduler-journal-client ● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled) Active: inactive (dead) (Result: exit-code) since Wed 2021-12-08 10:19:16 UTC; 15s ago Process: 3982600 ExecStart=/usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client (code=exited, status=1/FAILU Main PID: 3982600 (code=exited, status=1/FAILURE) CPU: 805ms Dec 08 10:18:45 scheduler0 systemd[1]: swh-scheduler-journal-client.service: Failed with result 'exit-code'. Dec 08 10:19:16 scheduler0 systemd[1]: Stopped Software Heritage Scheduler Journal Client. ● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled) Active: inactive (dead) (Result: exit-code) since Wed 2021-12-08 10:19:16 UTC; 15s ago Process: 3982600 ExecStart=/usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client (code=exited, status=1/FAILU Main PID: 3982600 (code=exited, status=1/FAILURE) CPU: 805ms Dec 08 10:18:45 scheduler0 systemd[1]: swh-scheduler-journal-client.service: Failed with result 'exit-code'. Dec 08 10:19:16 scheduler0 systemd[1]: Stopped Software Heritage Scheduler Journal Client. root@scheduler0:~# systemctl status swh-scheduler-schedule-recurrent.service ● swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled) Active: inactive (dead) since Wed 2021-12-08 10:19:24 UTC; 17s ago Process: 3826024 ExecStart=/usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurrent (code= Main PID: 3826024 (code=exited, status=0/SUCCESS) CPU: 2min 4.694s Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy already_visited_order_by_l Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy never_visited_oldest_updat Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy origins_without_last_updat Dec 08 10:07:29 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:git: 125 visits scheduled in queue swh.loader.git.tasks.UpdateGitRepos Dec 08 10:19:24 scheduler0 systemd[1]: Stopping Software Heritage scheduler for recurrent visits... Dec 08 10:19:24 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Termination requested... Dec 08 10:19:24 scheduler0 swh[3826024]: INFO:swh.scheduler.celery_backend.recurrent_visits:Terminating visit scheduling threads: archive-files, cran, deb, deb-pa Dec 08 10:19:24 scheduler0 systemd[1]: swh-scheduler-schedule-recurrent.service: Succeeded. Dec 08 10:19:24 scheduler0 systemd[1]: Stopped Software Heritage scheduler for recurrent visits. Dec 08 10:19:24 scheduler0 systemd[1]: swh-scheduler-schedule-recurrent.service: Consumed 2min 4.694s CPU time.
- upgrade the packages
root@scheduler0:~# apt list --upgradable 2>/dev/null | grep python3-swh python3-swh.loader.core/unknown 2.0.0-1~swh1~bpo10+1 all [upgradable from: 1.2.1-1~swh1~bpo10+1] python3-swh.scheduler/unknown 0.22.0-1~swh1~bpo10+1 all [upgradable from: 0.21.0-1~swh1~bpo10+1] root@scheduler0:~# apt install python3-swh.scheduler python3-swh.loader.core Reading package lists... Done Building dependency tree Reading state information... Done The following packages will be upgraded: python3-swh.loader.core python3-swh.scheduler 2 upgraded, 0 newly installed, 0 to remove and 7 not upgraded. Need to get 4940 kB of archives. After this operation, 74.8 kB of additional disk space will be used. Get:1 https://debian.softwareheritage.org buster-swh/main amd64 python3-swh.scheduler all 0.22.0-1~swh1~bpo10+1 [85.1 kB] Get:2 https://debian.softwareheritage.org buster-swh/main amd64 python3-swh.loader.core all 2.0.0-1~swh1~bpo10+1 [4855 kB] Fetched 4940 kB in 0s (45.0 MB/s) Reading changelogs... Done (Reading database ... 78138 files and directories currently installed.) Preparing to unpack .../python3-swh.scheduler_0.22.0-1~swh1~bpo10+1_all.deb ... Unpacking python3-swh.scheduler (0.22.0-1~swh1~bpo10+1) over (0.21.0-1~swh1~bpo10+1) ... Preparing to unpack .../python3-swh.loader.core_2.0.0-1~swh1~bpo10+1_all.deb ... Unpacking python3-swh.loader.core (2.0.0-1~swh1~bpo10+1) over (1.2.1-1~swh1~bpo10+1) ... Setting up python3-swh.scheduler (0.22.0-1~swh1~bpo10+1) ... Setting up python3-swh.loader.core (2.0.0-1~swh1~bpo10+1) ... # restart the services not impacted by the database migration root@scheduler0:~# systemctl restart swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service root@scheduler0:~# systemctl status swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service <they are all running>
- Database migration to v31
root@db1:~/swh-scheduler/sql/updates# git pull Already up to date. root@db1:~/swh-scheduler/sql/updates# cp 31.sql /tmp root@db1:~/swh-scheduler/sql/updates# sudo -u postgres psql swh-scheduler could not change directory to "/root/swh-scheduler/sql/updates": Permission denied psql (13.4 (Debian 13.4-1.pgdg100+1), server 12.8 (Debian 12.8-1.pgdg100+1)) Type "help" for help. swh-scheduler=# \timing Timing is on. swh-scheduler=# \i /tmp/31.sql INSERT 0 1 Time: 19.725 ms ALTER TABLE Time: 38065.123 ms (00:38.065) COMMENT Time: 65.156 ms ALTER TABLE Time: 95.791 ms
swh-scheduler=# \d origin_visit_stats Table "public.origin_visit_stats" Column | Type | Collation | Nullable | Default ---------------------------+--------------------------+-----------+----------+--------- url | text | | not null | visit_type | text | | not null | last_snapshot | bytea | | | last_scheduled | timestamp with time zone | | | next_visit_queue_position | bigint | | | next_position_offset | integer | | not null | 4 successive_visits | integer | | not null | 1 last_successful | timestamp with time zone | | | last_visit | timestamp with time zone | | | last_visit_status | last_visit_status | | | Indexes: "origin_visit_stats_pkey" PRIMARY KEY, btree (url, visit_type)
- Restarting the services
root@scheduler0:~# systemctl start swh-scheduler-journal-client root@scheduler0:~# systemctl status swh-scheduler-journal-client ● swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2021-12-08 10:48:33 UTC; 34s ago Main PID: 3986437 (swh) Tasks: 6 (limit: 9537) Memory: 46.3M CPU: 4.867s CGroup: /system.slice/swh-scheduler-journal-client.service └─3986437 /usr/bin/python3 /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client Dec 08 10:48:33 scheduler0 systemd[1]: Started Software Heritage Scheduler Journal Client. root@scheduler0:~# systemctl start swh-scheduler-schedule-recurrent.service root@scheduler0:~# systemctl status swh-scheduler-schedule-recurrent.service ● swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2021-12-08 10:50:13 UTC; 16s ago Main PID: 3986611 (swh) Tasks: 15 (limit: 9537) Memory: 58.0M CPU: 1.297s CGroup: /system.slice/swh-scheduler-schedule-recurrent.service └─3986611 /usr/bin/python3 /usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurre Dec 08 10:50:23 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy origins_without_last_updat Dec 08 10:50:23 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:svn: 27 visits scheduled in queue swh.loader.svn.tasks.DumpMountAndLoa Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy already_visited_order_by_l Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy never_visited_oldest_updat Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git with policy origins_without_last_updat Dec 08 10:50:27 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:git: 149 visits scheduled in queue swh.loader.git.tasks.UpdateGitRepos Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy already_visited_order_by_l Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy never_visited_oldest_updat Dec 08 10:50:29 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type svn with policy origins_without_last_updat Dec 08 10:50:30 scheduler0 swh[3986611]: INFO:swh.scheduler.celery_backend.recurrent_visits:svn: 26 visits scheduled in queue swh.loader.svn.tasks.DumpMountAndLoa
Comment Actions
deployment of version v0.22.0 in production
- database migration preparation
root@belvedere:~/swh-scheduler/sql/updates# git pull Already up to date. root@belvedere:~/swh-scheduler/sql/updates# cp 31.sql /tmp root@belvedere:~/swh-scheduler/sql/updates# sudo -u postgres psql -p 5434 softwareheritage-scheduler could not change directory to "/root/swh-scheduler/sql/updates": Permission denied psql (12.8 (Debian 12.8-1.pgdg100+1)) Type "help" for help. softwareheritage-scheduler=# \conninfo You are connected to database "softwareheritage-scheduler" as user "postgres" via socket in "/var/run/postgresql" at port "5434". softwareheritage-scheduler=# select * from dbversion order by version desc limit 1; version | release | description ---------+-------------------------------+------------------ 30 | 2021-08-12 06:25:11.755147+00 | Work In Progress (1 row)
- deployment on saatchi
- Stop the schedulers running on the tmux session
- stop puppet and the impacted services
root@saatchi:~# puppet agent --disable "T3773" root@saatchi:~# systemctl stop swh-scheduler-journal-client root@saatchi:~# systemctl stop swh-scheduler-schedule-recurrent <finished in timeout but effectively stopped>
- check the upgradable packages, there is lot of package to upgrade
root@saatchi:~# apt list --upgradable 2>/dev/null | grep python3-swh python3-swh.core/unknown 1.0.0-2~swh1~bpo10+1 all [upgradable from: 0.15.0-1~swh1~bpo10+1] python3-swh.counters/unknown 0.9.0-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1] python3-swh.deposit.client/unknown 0.16.0-1~swh1~bpo10+1 all [upgradable from: 0.14.3-1~swh1~bpo10+1] python3-swh.deposit.loader/unknown 0.16.0-1~swh1~bpo10+1 all [upgradable from: 0.14.3-1~swh1~bpo10+1] python3-swh.journal/unknown 0.9.1-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1] python3-swh.lister/unknown 2.6.1-1~swh1~bpo10+1 all [upgradable from: 2.5.0-1~swh1~bpo10+1] python3-swh.loader.core/unknown 2.0.0-1~swh1~bpo10+1 all [upgradable from: 1.2.0-1~swh1~bpo10+1] python3-swh.loader.git/unknown 1.2.0-1~swh1~bpo10+1 all [upgradable from: 1.0.1-1~swh1~bpo10+1] python3-swh.loader.mercurial/unknown 3.0.0-1~swh1~bpo10+1 all [upgradable from: 2.3.1-1~swh1~bpo10+1] python3-swh.loader.svn/unknown 0.10.2-1~swh1~bpo10+1 all [upgradable from: 0.7.1-1~swh1~bpo10+1] python3-swh.model/unknown 3.1.0-1~swh1~bpo10+1 all [upgradable from: 3.0.0-1~swh1~bpo10+1] python3-swh.objstorage/unknown 0.3.0-1~swh2~bpo10+1 all [upgradable from: 0.2.3-1~swh1~bpo10+1] python3-swh.scheduler/unknown 0.22.0-1~swh1~bpo10+1 all [upgradable from: 0.20.0-1~swh1~bpo10+1] python3-swh.storage/unknown 0.40.0-2~swh1~bpo10+1 all [upgradable from: 0.37.0-1~swh2~bpo10+1] python3-swh.vault/unknown 1.3.0-1~swh1~bpo10+1 all [upgradable from: 1.2.0-1~swh1~bpo10+1]
- upgrade the packages
root@saatchi:~# apt list --upgradable 2>/dev/null | grep python3-swh | cut -f1 -d"/" | xargs -t apt install apt install python3-swh.core python3-swh.counters python3-swh.deposit.client python3-swh.deposit.loader python3-swh.journal python3-swh.lister python3-swh.loader.core python3-swh.loader.git python3-swh.loader.mercurial python3-swh.loader.svn python3-swh.model python3-swh.objstorage python3-swh.scheduler python3-swh.storage python3-swh.vault
- restart the services
root@saatchi:~# systemctl restart swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service root@saatchi:~# systemctl status swh-scheduler-runner gunicorn-swh-scheduler.service swh-scheduler-listener.service swh-scheduler-runner-priority.service <they are all running>
- database migration to v31
softwareheritage-scheduler=# \timing Timing is on. softwareheritage-scheduler=# \i /tmp/31.sql INSERT 0 1 Time: 4.983 ms ALTER TABLE Time: 2346962.104 ms (39:06.962) COMMENT Time: 7.763 ms ALTER TABLE Time: 49.457 ms
- restarting services and puppet
root@saatchi:~# systemctl start swh-scheduler-journal-client root@saatchi:~# systemctl start swh-scheduler-schedule-recurrent root@saatchi:~# systemctl status swh-scheduler-journal-client * swh-scheduler-journal-client.service - Software Heritage Scheduler Journal Client Loaded: loaded (/etc/systemd/system/swh-scheduler-journal-client.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2021-12-08 13:14:44 UTC; 30s ago Main PID: 997137 (swh) Tasks: 8 (limit: 24037) Memory: 46.7M CGroup: /system.slice/swh-scheduler-journal-client.service `-997137 /usr/bin/python3 /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/journal-client.yml journal-client Dec 08 13:14:44 saatchi systemd[1]: Started Software Heritage Scheduler Journal Client. root@saatchi:~# systemctl status swh-scheduler-schedule-recurrent * swh-scheduler-schedule-recurrent.service - Software Heritage scheduler for recurrent visits Loaded: loaded (/etc/systemd/system/swh-scheduler-schedule-recurrent.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2021-12-08 13:15:04 UTC; 21s ago Main PID: 997182 (swh) Tasks: 18 (limit: 24037) Memory: 71.9M CGroup: /system.slice/swh-scheduler-schedule-recurrent.service `-997182 /usr/bin/python3 /usr/bin/swh --log-level INFO scheduler --config-file /etc/softwareheritage/scheduler/listener-runner.yml schedule-recurren Dec 08 13:15:04 saatchi systemd[1]: Started Software Heritage scheduler for recurrent visits. root@saatchi:~# puppet agent --enable
- schedulers running on a tmux restarted