sysadm/deployment/upgrade-swh-service.rst
.. _upgrade-swh-service: | .. _upgrade-swh-service: | |||||||||
Upgrade swh service | Upgrade swh service | |||||||||
=================== | =================== | |||||||||
.. admonition:: Intended audience | .. admonition:: Intended audience | |||||||||
:class: important | :class: important | |||||||||
sysadm staff members | sysadm staff members | |||||||||
Workers | ||||||||||
------- | ||||||||||
Dedicated workers [1] run our *swh-worker@loader_{git, hg, svn, npm, ...}* services. | The document describes the deployment for most of our swh services (rpc services, | |||||||||
When a new version is released, we need to upgrade their package(s). | loaders, listers, indexers, ...). | |||||||||
[1] Here are the following group name (in `clush | There are currently 2 ways (as we are transitioning from the first to the second): | |||||||||
<https://clustershell.readthedocs.io/en/latest/index.html>`_ terms): | ||||||||||
- *@swh-workers* for the production workers | - static: From git tag to deployment through debian packaging | |||||||||
- *@azure-workers* for the production ones running on azure | - elastic: From git tag to deployment through kubernetes. | |||||||||
- *@staging-loader-workers* for the staging ones | ||||||||||
The following will first describe the :ref:`common deployment part <code-and-publish>`. | ||||||||||
This involves some python packaging out of a git tag which will be built and pushed to | ||||||||||
`PyPI <https://pypi.org>`_ and our :ref:`swh debian repositories | ||||||||||
<howto-debian-packaging>`. | ||||||||||
Then follows the actual :ref:`deployment with debian packaging | ||||||||||
<deployment-with-debian-packaging>`. It concludes with the :ref:`deployment with | ||||||||||
kubernetes<deployment-with-kubernetes>` chapter. | ||||||||||
.. _distinct-services: | ||||||||||
Distinct Services | ||||||||||
----------------- | ||||||||||
3 kinds of services run on our nodes: | ||||||||||
- worker services (loaders, listers, cookers, ...) | ||||||||||
- rpc services (scheduler, objstorage, storage, web, ...) | ||||||||||
- journal client services (search, scheduler, indexer) | ||||||||||
.. _code-and-publish: | ||||||||||
See :ref:`deploy-new-lister` for a practical example. | ||||||||||
Code and publish | Code and publish | |||||||||
---------------- | ---------------- | |||||||||
.. _fix-or-evolve-code: | ||||||||||
Code an evolution of fix an issue in the python code within the git repository's master | Code an evolution of fix an issue in the python code within the git repository's master | |||||||||
branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and push | branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and | |||||||||
<tag-and-push>`. | push <tag-and-push>`. | |||||||||
lunar: I don’t understand the first sentence. Are some words missing or maybe a mistranslation from… | ||||||||||
Done Inline Actions
ardumont: typo in the first sentence "an evolution `or` fix an issue".
It's not that bad. I've tried to… | ||||||||||
.. _tag-and-push: | .. _tag-and-push: | |||||||||
Tag and push | Tag and push | |||||||||
~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
When ready, `git tag` and `git push` the new tag of the module. | When ready, `git tag` and `git push` the new tag of the module. And let jenkins | |||||||||
:ref:`publish the artifact <publish-artifacts>`. | ||||||||||
.. code:: | .. code:: | |||||||||
$ git tag vA.B.C | $ git tag -a vA.B.C # (optionally) `git tag -a -s` to sign the tag too | |||||||||
$ git push origin --follow-tags | $ git push origin --follow-tags | |||||||||
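Before tagging, it can help to check that the tag name follows the `vA.B.C` shape used above. The following is a minimal sketch, not part of the documented procedure: the function name `is_release_tag` is hypothetical, and it assumes that A, B and C are plain numbers.

```shell
#!/bin/sh
# Hypothetical helper (not part of the swh tooling): succeed only when the
# argument looks like vA.B.C, assuming A, B and C are decimal numbers.
is_release_tag() {
    printf '%s\n' "${1:-}" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}
```

It could be used as a guard, for example `is_release_tag v1.2.3 && git tag -a v1.2.3`.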
.. _publish-and-deploy: | .. _publish-artifacts: | |||||||||
Publish artifacts | ||||||||||
~~~~~~~~~~~~~~~~~ | ||||||||||
Publish and deploy | Jenkins is in charge to publishing to `PyPI <https://pypi.org>`_ the new release (out of | |||||||||
~~~~~~~~~~~~~~~~~~ | the tag). And then building the debian packaging and push it package to our :ref:`swh | |||||||||
debian repositories <howto-debian-packaging>`. | ||||||||||
Done Inline Actions
lunar: [in charge of something](https://en.wiktionary.org/wiki/in_charge) | ||||||||||
Done Inline Actions
ardumont: yes, i missed it.
Done Inline Actions
lunar: harmony
Let jenkins publish and deploy the debian package. | ||||||||||
.. _troubleshoot: | .. _troubleshoot: | |||||||||
Troubleshoot | Troubleshoot | |||||||||
~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
If jenkins fails for some reason, fix the module be it :ref:`python code | If jenkins fails for some reason, fix the module be it :ref:`python code | |||||||||
<fix-or-evolve-code>` or the :ref:`debian packaging <troubleshoot-debian-package>`. | <fix-or-evolve-code>` or the :ref:`debian packaging <troubleshoot-debian-package>`. | |||||||||
.. _deployment-with-debian-packaging: | ||||||||||
Deployment with debian packaging | ||||||||||
-------------------------------- | ||||||||||
This mostly involves deploying new versions of debian packages to static nodes. | ||||||||||
.. _upgrade-services: | ||||||||||
Upgrade services | ||||||||||
~~~~~~~~~~~~~~~~ | ||||||||||
When a new version is released, we need to upgrade the package(s) and restart services. | ||||||||||
worker services (production): | ||||||||||
- *swh-worker@loader_{git, hg, svn, npm, ...}* | ||||||||||
- *swh-worker@lister* | ||||||||||
- *swh-worker@vault_cooker* | ||||||||||
journal clients (production): | ||||||||||
- *swh-indexer-journal-client@{origin_intrinsic_metadata_,extrinsic_metadata_,...}* | ||||||||||
rpc services (both environments): | ||||||||||
- *gunicorn-swh-{scheduler, objstorage, storage, web, ...}* | ||||||||||
From the pergamon node, which is configured for `clush | ||||||||||
<https://clustershell.readthedocs.io/en/latest/index.html>`_, one can act on multiple | ||||||||||
nodes through the following group names: | ||||||||||
- *@swh-workers* for the production workers (listers, loaders, ...) | ||||||||||
- *@azure-workers* for the production ones running on azure (indexers, cookers) | ||||||||||
- ... | ||||||||||
See :ref:`deploy-new-lister` for a practical example. | ||||||||||
.. _troubleshoot-debian-package: | .. _troubleshoot-debian-package: | |||||||||
Debian package troubleshoot | Debian package troubleshoot | |||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||||||
In that case, upgrade and checkout the *debian/unstable-swh* branch, then fix whatever | Update and checkout the *debian/unstable-swh* branch (in the impacted git repository), | |||||||||
is not updated or broken due to a change. It's usually a missing new package dependency | then fix whatever is not updated or broken due to a change. | |||||||||
to fix in *debian/control*). Add a new entry in *debian/changelog*. Make sure gbp builds | ||||||||||
Done Inline Actions
lunar: Extra parenthesis.
fine. Then tag it. Jenkins will build the package anew. | It's usually a missing new package dependency to fix in *debian/control*). Add a new | |||||||||
entry in *debian/changelog*. Make sure gbp builds fine locally. Then tag it and push. | ||||||||||
Jenkins will build the package anew. | ||||||||||
.. code:: | .. code:: | |||||||||
$ gbp buildpackage --git-tag-only --git-sign-tag # tag it | $ gbp buildpackage --git-tag-only --git-sign-tag # tag it | |||||||||
$ git push origin --follow-tags # trigger the build | $ git push origin --follow-tags # trigger the build | |||||||||
Lather, rinse, repeat until it's all green! | ||||||||||
Deploy | Deploy | |||||||||
------ | ------ | |||||||||
.. _nominal_case: | .. _nominal-case: | |||||||||
Nominal case | Nominal case | |||||||||
~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
Update the machine dependencies and restart service. That usually means | Update the machine dependencies and restart service. That usually means as sudo user: | |||||||||
as sudo user: | ||||||||||
.. code:: | .. code:: | |||||||||
$ apt-get update | $ apt-get update | |||||||||
$ apt-get dist-upgrade -y | $ apt-get dist-upgrade -y | |||||||||
$ systemctl restart swh-worker@loader_${type} | $ systemctl restart $service | |||||||||
Note that this is for one machine you ssh into. | Note that this is for one machine you ssh into. | |||||||||
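As a minimal sketch (not part of the documented procedure), the three commands above can be wrapped in a small shell function. The names `run`, `upgrade_and_restart` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1`, the default here, the commands are only printed, so the sketch can be tried without touching a machine.

```shell
#!/bin/sh
# Hypothetical helper: upgrade the packages and restart one service.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_and_restart() {
    service="${1:-}"
    if [ -z "$service" ]; then
        echo "usage: upgrade_and_restart <service>" >&2
        return 1
    fi
    # Same three steps as in the nominal case; stop at the first failure.
    run apt-get update &&
    run apt-get dist-upgrade -y &&
    run systemctl restart "$service"
}
```

For instance, `upgrade_and_restart swh-worker@loader_git` prints the three commands for the git loader service. As in the nominal case, running it for real (`DRY_RUN=0`) requires sudo rights.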
We usually wrap those commands from the sysadmin machine pergamon [3] with the *clush* | We usually wrap those commands from the sysadmin machine pergamon [3] with the *clush* | |||||||||
command, something like: | command, something like: | |||||||||
.. code:: | .. code:: | |||||||||
[... 11 lines not shown in this view ...]
Either wait for puppet to actually deploy the changes first and then go back to the | Either wait for puppet to actually deploy the changes first and then go back to the | |||||||||
nominal case. | nominal case. | |||||||||
Or force a puppet run: | Or force a puppet run: | |||||||||
.. code:: | .. code:: | |||||||||
sudo clush -b -w @swh-workers puppet agent -t | sudo clush -b -w $nodes puppet agent -t | |||||||||
Note: *-t* is not optional | Note: *-t* is not optional | |||||||||
.. _long-standing-migration: | .. _long-standing-upgrade: | |||||||||
Long-standing migration | Long-standing upgrade | |||||||||
~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~ | |||||||||
In that case, you may need to stop all services for migration which could take some time | In that case, you may need to stop the impacted services. For example, for a long-standing | |||||||||
(because lots of data is migrated for example). | data model migration which could take some time. | |||||||||
You need to momentarily stop puppet (which runs every 30 min to apply manifest changes) | You need to momentarily stop puppet (which by default runs every 30 min to apply | |||||||||
and the cron service (which restarts down services) on the workers nodes. | manifest changes) and the cron service (which restarts down services) on the workers | |||||||||
nodes. | ||||||||||
Refer to the :ref:`storage database migration <storage-database-migration>` | Refer to the :ref:`storage database migration <storage-database-migration>` | |||||||||
for a concrete case of database migration. | for a concrete case of database migration. | |||||||||
.. code:: | .. code:: | |||||||||
$ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable' | $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable' | |||||||||
Then: | Then: | |||||||||
- Execute the database migration. | - Execute the long-standing upgrade. | |||||||||
- Go back to the nominal case. | - Go back to the :ref:`nominal case <nominal-case>`. | |||||||||
- Restart puppet and the cron on workers | - Restart puppet and the cron services on workers | |||||||||
.. code:: | .. code:: | |||||||||
$ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable' | $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable' | |||||||||
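The whole sequence (stop cron and puppet, upgrade, re-enable) can be sketched as one shell function. This is only an illustration under stated assumptions: `run`, `long_standing_upgrade`, `DRY_RUN` and `NODES` are hypothetical names, and the single argument is a placeholder for whatever command performs the long-standing upgrade and the nominal-case steps. Only the two `clush` invocations come from the text above.

```shell
#!/bin/sh
# Hypothetical wrapper around the long-standing upgrade sequence.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
# Node group to act on; @swh-workers is the group used in this section.
NODES="${NODES:-@swh-workers}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"   # note: quoting of the arguments is not shown in dry-run
    else
        "$@"
    fi
}

long_standing_upgrade() {
    upgrade_cmd="${1:-}"   # placeholder: the actual upgrade command
    if [ -z "$upgrade_cmd" ]; then
        echo "usage: long_standing_upgrade '<upgrade command>'" >&2
        return 1
    fi
    # The chain stops at the first failure, so cron and puppet stay
    # disabled until someone has looked at what went wrong.
    run sudo clush -b -w "$NODES" 'systemctl stop cron.service; puppet agent --disable' &&
    run sh -c "$upgrade_cmd" &&
    run sudo clush -b -w "$NODES" 'systemctl start cron.service; puppet agent --enable'
}
```

Stopping on failure rather than always re-enabling cron is a design choice of this sketch: it avoids cron restarting services in the middle of a broken migration.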
.. _deployment-with-kubernetes: | ||||||||||
Deployment with Kubernetes | ||||||||||
-------------------------- | ||||||||||
.. warning:: FIXME Enter into details + add a small summary graph | ||||||||||
- swh-apps: Add new apps (new Dockerfile) | ||||||||||
- swh-apps: Build frozen requirements for a new release of a swh service | ||||||||||
- swh-apps: Build impacted docker images with that frozen set of requirements | ||||||||||
- Commit and tag | ||||||||||
- Push built docker image into our gitlab registry | ||||||||||
- swh-charts: Add/Update the image versions | ||||||||||
- Commit and push | ||||||||||
Not Done Inline Actions
olasd: I guess if we're going that way (I just use a virtualenv on my host machine, fwiw) we'll (eventually) want to have a Dockerfile to run this script rather than poke at a container manually. That'll happen when we figure out automation for this process.
Done Inline Actions
ardumont: yes!
Done Inline Actions
ardumont: heads up:
- Adaptation to make that step automatized [1]
- And then once ^ landed, the…
[1] https://gitlab.softwareheritage.org/infra/swh-apps/-/merge_requests/10 [2] D8871
I don’t understand the first sentence. Are some words missing or maybe a mistranslation from French? (Happy to help.)