sysadm/deployment/upgrade-swh-service.rst
| .. _upgrade-swh-service: | .. _upgrade-swh-service: | |||||||||
| Upgrade swh service | Upgrade swh service | |||||||||
| =================== | =================== | |||||||||
| .. admonition:: Intended audience | .. admonition:: Intended audience | |||||||||
| :class: important | :class: important | |||||||||
| sysadm staff members | sysadm staff members | |||||||||
| Workers | ||||||||||
| ------- | ||||||||||
| Dedicated workers [1] run our *swh-worker@loader_{git, hg, svn, npm, ...}* services. | This document describes the deployment of most of our swh services (rpc services, | |||||||||
| When a new version is released, we need to upgrade their package(s). | loaders, listers, indexers, ...). | |||||||||
| [1] Here are the following group name (in `clush | There are currently two ways to deploy (we are transitioning from the first to the second): | |||||||||
| <https://clustershell.readthedocs.io/en/latest/index.html>`_ terms): | ||||||||||
| - *@swh-workers* for the production workers | - static: From git tag to deployment through debian packaging | |||||||||
| - *@azure-workers* for the production ones running on azure | - elastic: From git tag to deployment through kubernetes. | |||||||||
| - *@staging-loader-workers* for the staging ones | ||||||||||
| The following will first describe the :ref:`common deployment part <code-and-publish>`. | ||||||||||
| This involves Python packaging out of a git tag, which is built and pushed to | ||||||||||
| `PyPI <https://pypi.org>`_ and our :ref:`swh debian repositories | ||||||||||
| <howto-debian-packaging>`. | ||||||||||
| Then follows the actual :ref:`deployment with debian packaging | ||||||||||
| <deployment-with-debian-packaging>`. It concludes with the :ref:`deployment with | ||||||||||
| kubernetes<deployment-with-kubernetes>` chapter. | ||||||||||
| .. _distinct-services: | ||||||||||
| Distinct Services | ||||||||||
| ----------------- | ||||||||||
| Three kinds of services run on our nodes: | ||||||||||
| - worker services (loaders, listers, cookers, ...) | ||||||||||
| - rpc services (scheduler, objstorage, storage, web, ...) | ||||||||||
| - journal client services (search, scheduler, indexer) | ||||||||||
| .. _code-and-publish: | ||||||||||
| See :ref:`deploy-new-lister` for a practical example. | ||||||||||
| Code and publish | Code and publish | |||||||||
| ---------------- | ---------------- | |||||||||
| .. _fix-or-evolve-code: | ||||||||||
| Code an evolution of fix an issue in the python code within the git repository's master | Code an evolution or fix an issue in the python code within the git repository's master | |||||||||
| branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and push | branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and | |||||||||
| <tag-and-push>`. | push <tag-and-push>`. | |||||||||
lunar: I don’t understand the first sentence. Are some words missing or maybe a mistranslation from French? (Happy to help.)
ardumont: typo in the first sentence "an evolution `or` fix an issue".
It's not that bad. I've tried to…
| .. _tag-and-push: | .. _tag-and-push: | |||||||||
| Tag and push | Tag and push | |||||||||
| ~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
| When ready, `git tag` and `git push` the new tag of the module. | When ready, ``git tag`` and ``git push`` the new tag of the module, then let Jenkins | |||||||||
| :ref:`publish the artifact <publish-artifacts>`. | ||||||||||
| .. code:: | .. code:: | |||||||||
| $ git tag vA.B.C | $ git tag -a vA.B.C # (optionally) `git tag -a -s` to sign the tag too | |||||||||
| $ git push origin --follow-tags | $ git push origin --follow-tags | |||||||||
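The tagging step above can be guarded by a small check before anything is pushed. The following is a minimal sketch, not part of the documented procedure: the helper names ``is_release_tag`` and ``print_tag_commands`` are hypothetical, and it assumes that ``vA.B.C`` means three dot-separated numbers. It only prints the commands from the block above; it does not run them.

```shell
#!/bin/sh
# Sketch only. Both helper names are hypothetical, and the strict
# "v<number>.<number>.<number>" format is an assumption of this example.

is_release_tag() {
    # Succeeds only when the argument looks like v1.2.3.
    printf '%s\n' "${1:-}" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

print_tag_commands() {
    # Print (do not execute) the tag and push commands for the given tag.
    tag="${1:-}"
    if ! is_release_tag "$tag"; then
        echo "refusing to tag: '$tag' does not look like vA.B.C" >&2
        return 1
    fi
    echo "git tag -a $tag"
    echo "git push origin --follow-tags"
}
```

For example, ``print_tag_commands v1.2.3`` prints the two git commands, while ``print_tag_commands latest`` fails with a message on stderr. Printing instead of executing keeps the sketch safe to try in any directory.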
| .. _publish-and-deploy: | .. _publish-artifacts: | |||||||||
| Publish artifacts | ||||||||||
| ~~~~~~~~~~~~~~~~~ | ||||||||||
| Publish and deploy | Jenkins is in charge of publishing the new release to `PyPI <https://pypi.org>`_ (out of | |||||||||
| ~~~~~~~~~~~~~~~~~~ | the tag). It then builds the debian package and pushes it to our :ref:`swh | |||||||||
| debian repositories <howto-debian-packaging>`. | ||||||||||
lunar: [in charge of something](https://en.wiktionary.org/wiki/in_charge)
ardumont: yes, i missed it.
lunar: harmony
| Let jenkins publish and deploy the debian package. | ||||||||||
| .. _troubleshoot: | .. _troubleshoot: | |||||||||
| Troubleshoot | Troubleshoot | |||||||||
| ~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
| If jenkins fails for some reason, fix the module be it :ref:`python code | If jenkins fails for some reason, fix the module be it :ref:`python code | |||||||||
| <fix-or-evolve-code>` or the :ref:`debian packaging <troubleshoot-debian-package>`. | <fix-or-evolve-code>` or the :ref:`debian packaging <troubleshoot-debian-package>`. | |||||||||
| .. _deployment-with-debian-packaging: | ||||||||||
| Deployment with debian packaging | ||||||||||
| -------------------------------- | ||||||||||
| This mostly involves deploying new versions of debian packages to static nodes. | ||||||||||
| .. _upgrade-services: | ||||||||||
| Upgrade services | ||||||||||
| ~~~~~~~~~~~~~~~~ | ||||||||||
| When a new version is released, we need to upgrade the package(s) and restart services. | ||||||||||
| worker services (production): | ||||||||||
| - *swh-worker@loader_{git, hg, svn, npm, ...}* | ||||||||||
| - *swh-worker@lister* | ||||||||||
| - *swh-worker@vault_cooker* | ||||||||||
| journal clients (production): | ||||||||||
| - *swh-indexer-journal-client@{origin_intrinsic_metadata_,extrinsic_metadata_,...}* | ||||||||||
| rpc services (both environments): | ||||||||||
| - *gunicorn-swh-{scheduler, objstorage, storage, web, ...}* | ||||||||||
| From the pergamon node, which is configured for `clush | ||||||||||
| <https://clustershell.readthedocs.io/en/latest/index.html>`_, one can act on multiple | ||||||||||
| nodes through the following group names: | ||||||||||
| - *@swh-workers* for the production workers (listers, loaders, ...) | ||||||||||
| - *@azure-workers* for the production ones running on azure (indexers, cookers) | ||||||||||
| - ... | ||||||||||
| See :ref:`deploy-new-lister` for a practical example. | ||||||||||
| .. _troubleshoot-debian-package: | .. _troubleshoot-debian-package: | |||||||||
| Debian package troubleshoot | Debian package troubleshoot | |||||||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||||||
| In that case, upgrade and checkout the *debian/unstable-swh* branch, then fix whatever | Update and check out the *debian/unstable-swh* branch (in the impacted git repository), | |||||||||
| is not updated or broken due to a change. It's usually a missing new package dependency | then fix whatever is not updated or broken due to a change. | |||||||||
| to fix in *debian/control*). Add a new entry in *debian/changelog*. Make sure gbp builds | ||||||||||
lunar: Extra parenthesis.
| fine. Then tag it. Jenkins will build the package anew. | It's usually a missing new package dependency to fix in *debian/control*. Add a new | |||||||||
| entry in *debian/changelog*. Make sure gbp builds fine locally. Then tag it and push. | ||||||||||
| Jenkins will build the package anew. | ||||||||||
| .. code:: | .. code:: | |||||||||
| $ gbp buildpackage --git-tag-only --git-sign-tag # tag it | $ gbp buildpackage --git-tag-only --git-sign-tag # tag it | |||||||||
| $ git push origin --follow-tags # trigger the build | $ git push origin --follow-tags # trigger the build | |||||||||
| Lather, rinse, repeat until it's all green! | ||||||||||
| Deploy | Deploy | |||||||||
| ------ | ------ | |||||||||
| .. _nominal_case: | .. _nominal-case: | |||||||||
| Nominal case | Nominal case | |||||||||
| ~~~~~~~~~~~~ | ~~~~~~~~~~~~ | |||||||||
| Update the machine dependencies and restart service. That usually means | Update the machine dependencies and restart the service. As a sudo user, that usually means: | |||||||||
| as sudo user: | ||||||||||
| .. code:: | .. code:: | |||||||||
| $ apt-get update | $ apt-get update | |||||||||
| $ apt-get dist-upgrade -y | $ apt-get dist-upgrade -y | |||||||||
| $ systemctl restart swh-worker@loader_${type} | $ systemctl restart $service | |||||||||
| Note that this is for one machine you ssh into. | Note that this is for one machine you ssh into. | |||||||||
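The single-machine sequence above can be wrapped in a function so that the service is never restarted after a failed upgrade. This is a minimal sketch under stated assumptions: ``upgrade_and_restart`` is a hypothetical helper name and the ``DRY_RUN`` variable is a convention of this example only. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. `upgrade_and_restart` and the DRY_RUN convention are
# hypothetical; the three commands are the ones from the block above.

upgrade_and_restart() {
    service="${1:-}"
    if [ -z "$service" ]; then
        echo "usage: upgrade_and_restart <systemd unit>" >&2
        return 2
    fi
    for cmd in "apt-get update" \
               "apt-get dist-upgrade -y" \
               "systemctl restart $service"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default mode: only show what would be executed.
            echo "would run: $cmd"
        else
            # Stop at the first failing step, so the service is not
            # restarted on top of a half-applied upgrade.
            $cmd || return 1
        fi
    done
}
```

Calling ``upgrade_and_restart swh-worker@lister`` lists the three steps; setting ``DRY_RUN=0`` (as a sudo user) would actually execute them.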
| We usually wrap those commands from the sysadmin machine pergamon [3] with the *clush* | We usually wrap those commands from the sysadmin machine pergamon [3] with the *clush* | |||||||||
| command, something like: | command, something like: | |||||||||
| .. code:: | .. code:: | |||||||||
| Show All 11 Lines | ||||||||||
| Either wait for puppet to actually deploy the changes first and then go back to the | Either wait for puppet to actually deploy the changes first and then go back to the | |||||||||
| nominal case. | nominal case. | |||||||||
| Or force a puppet run: | Or force a puppet run: | |||||||||
| .. code:: | .. code:: | |||||||||
| sudo clush -b -w @swh-workers puppet agent -t | sudo clush -b -w $nodes puppet agent -t | |||||||||
| Note: *-t* is not optional | Note: *-t* is not optional | |||||||||
| .. _long-standing-migration: | .. _long-standing-upgrade: | |||||||||
| Long-standing migration | Long-standing upgrade | |||||||||
| ~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~ | |||||||||
| In that case, you may need to stop all services for migration which could take some time | In that case, you may need to stop the impacted services, for example during a long-standing | |||||||||
| (because lots of data is migrated for example). | data model migration that could take some time. | |||||||||
| You need to momentarily stop puppet (which runs every 30 min to apply manifest changes) | You need to momentarily stop puppet (which by default runs every 30 min to apply | |||||||||
| and the cron service (which restarts down services) on the workers nodes. | manifest changes) and the cron service (which restarts down services) on the worker | |||||||||
| nodes. | ||||||||||
| Refer to the :ref:`storage database migration <storage-database-migration>` | Refer to the :ref:`storage database migration <storage-database-migration>` | |||||||||
| for a concrete case of database migration. | for a concrete case of database migration. | |||||||||
| .. code:: | .. code:: | |||||||||
| $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable' | $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable' | |||||||||
| Then: | Then: | |||||||||
| - Execute the database migration. | - Execute the long-standing upgrade. | |||||||||
| - Go back to the nominal case. | - Go back to the :ref:`nominal case <nominal-case>`. | |||||||||
| - Restart puppet and the cron on workers | - Restart puppet and the cron services on workers | |||||||||
| .. code:: | .. code:: | |||||||||
| $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable' | $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable' | |||||||||
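The stop, upgrade, restart sequence above is easy to leave half done if the upgrade fails. The following is a minimal sketch of one way to tie the steps together: ``with_puppet_and_cron_stopped`` and ``_run_or_print`` are hypothetical helper names, and ``DRY_RUN`` is a convention of this example only. The two *clush* command lines are the ones shown above. Cron and puppet are restored even when the upgrade command fails, and its exit status is returned.

```shell
#!/bin/sh
# Sketch only. The helper names and the DRY_RUN convention are hypothetical.

_run_or_print() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

with_puppet_and_cron_stopped() {
    # $1: clush node group (e.g. @swh-workers); remaining args: upgrade command.
    nodes="$1"
    shift
    # Abort early if cron and puppet cannot be stopped.
    _run_or_print sudo clush -b -w "$nodes" \
        'systemctl stop cron.service; puppet agent --disable' || return 1
    # Run the upgrade and remember its status.
    _run_or_print "$@"
    status=$?
    # Always restore cron and puppet, whatever the upgrade returned.
    _run_or_print sudo clush -b -w "$nodes" \
        'systemctl start cron.service; puppet agent --enable'
    return "$status"
}
```

In dry-run mode, ``with_puppet_and_cron_stopped @swh-workers ./run-upgrade.sh`` (where ``./run-upgrade.sh`` is a placeholder for the actual upgrade step) prints the three commands in order without touching any node.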
| .. _deployment-with-kubernetes: | ||||||||||
| Deployment with Kubernetes | ||||||||||
| -------------------------- | ||||||||||
| .. warning:: FIXME Enter into details + add a small summary graph | ||||||||||
| - swh-apps: Add new apps (new Dockerfile) | ||||||||||
| - swh-apps: Build frozen requirements for a new release of a swh service | ||||||||||
| - swh-apps: Build impacted docker images with that frozen set of requirements | ||||||||||
| - Commit and tag | ||||||||||
| - Push built docker image into our gitlab registry | ||||||||||
| - swh-charts: Add/Update the image versions | ||||||||||
| - Commit and push | ||||||||||
olasd: I guess if we're going that way (I just use a virtualenv on my host machine, fwiw) we'll (eventually) want to have a Dockerfile to run this script rather than poke at a container manually. That'll happen when we figure out automation for this process.
ardumont: yes!
ardumont: heads up:
- Adaptation to make that step automated [1]
- And then once ^ landed, the…
[1] https://gitlab.softwareheritage.org/infra/swh-apps/-/merge_requests/10 [2] D8871