diff --git a/sysadm/deployment/upgrade-swh-service.rst b/sysadm/deployment/upgrade-swh-service.rst
index e677cb7..18f571a 100644
--- a/sysadm/deployment/upgrade-swh-service.rst
+++ b/sysadm/deployment/upgrade-swh-service.rst
@@ -1,151 +1,231 @@
 .. _upgrade-swh-service:

 Upgrade swh service
 ===================

 .. admonition:: Intended audience
    :class: important

    sysadm staff members

-Workers
--------

-Dedicated workers [1] run our *swh-worker@loader_{git, hg, svn, npm, ...}* services.
-When a new version is released, we need to upgrade their package(s).
+The document describes the deployment for most of our swh services (rpc services,
+loaders, listers, indexers, ...).

-[1] Here are the following group name (in `clush
-`_ terms):
+There exists currently 2 ways (as we are transitioning from the first to the second):

-- *@swh-workers* for the production workers
-- *@azure-workers* for the production ones running on azure
-- *@staging-loader-workers* for the staging ones
+- static: From git tag to deployment through debian packaging
+- elastic: From git tag to deployment through kubernetes.
+
+
+The following will first describe the :ref:`common deployment part `.
+This involves some python packaging out of a git tag which will be built and push to
+`PyPI `_ and our :ref:`swh debian repositories
+`.
+
+Then follows the actual :ref:`deployment with debian packaging
+`. It concludes with the :ref:`deployment with
+kubernetes` chapter.
+
+.. _distinct-services:
+
+Distinct Services
+-----------------
+
+3 kinds services runs on our nodes:
+
+- worker services (loaders, listers, cookers, ...)
+- rpc services (scheduler, objstorage, storage, web, ...)
+- journal client services (search, scheduler, indexer)
+
+.. _code-and-publish:

-See :ref:`deploy-new-lister` for a practical example.

 Code and publish
 ----------------

-.. _fix-or-evolve-code:
-
 Code an evolution of fix an issue in the python code within the git repository's master
-branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and push
-`.
+branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and
+push `.

 .. _tag-and-push:

 Tag and push
 ~~~~~~~~~~~~

-When ready, `git tag` and `git push` the new tag of the module.
+When ready, `git tag` and `git push` the new tag of the module. And let jenkins
+:ref:`publish the artifact `.

 .. code::

-   $ git tag vA.B.C
+   $ git tag -a vA.B.C  # (optionally) `git tag -a -s` to sign the tag too
    $ git push origin --follow-tags

-.. _publish-and-deploy:
+.. _publish-artifacts:
+
+Publish artifacts
+~~~~~~~~~~~~~~~~~

-Publish and deploy
-~~~~~~~~~~~~~~~~~~
+Jenkins is in charge to publishing to `PyPI `_ the new release (out of
+the tag). And then building the debian packaging and push it package to our :ref:`swh
+debian repositories `.

-Let jenkins publish and deploy the debian package.

 .. _troubleshoot:

 Troubleshoot
 ~~~~~~~~~~~~

 If jenkins fails for some reason, fix the module be it :ref:`python code ` or the
 :ref:`debian packaging `.
+
+.. _deployment-with-debian-packaging:
+
+Deployment with debian packaging
+--------------------------------
+
+This mostly involves deploying new version of debian packages to static nodes.
+
+.. _upgrade-services:
+
+Upgrade services
+~~~~~~~~~~~~~~~~
+
+When a new version is released, we need to upgrade the package(s) and restart services.
+
+worker services (production):
+
+- *swh-worker@loader_{git, hg, svn, npm, ...}*
+- *swh-worker@lister*
+- *swh-worker@vault_cooker*
+
+journal clients (production):
+
+- *swh-indexer-journal-client@{origin_intrinsic_metadata_,extrinsic_metadata_,...}*
+
+rpc services (both environment):
+
+- *gunicorn-swh-{scheduler, objstorage, storage, web, ...}*
+
+
+From the pergamon node, which is configured for `clush
+`_, one can act on multiple
+nodes through the following group names:
+
+- *@swh-workers* for the production workers (listers, loaders, ...)
+- *@azure-workers* for the production ones running on azure (indexers, cookers)
+- ...
+
+See :ref:`deploy-new-lister` for a practical example.
+
 .. _troubleshoot-debian-package:

 Debian package troubleshoot
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In that case, upgrade and checkout the *debian/unstable-swh* branch, then fix whatever
-is not updated or broken due to a change. It's usually a missing new package dependency
-to fix in *debian/control*). Add a new entry in *debian/changelog*. Make sure gbp builds
-fine. Then tag it. Jenkins will build the package anew.
+Update and checkout the *debian/unstable-swh* branch (in the impacted git repository),
+then fix whatever is not updated or broken due to a change.
+
+It's usually a missing new package dependency to fix in *debian/control*). Add a new
+entry in *debian/changelog*. Make sure gbp builds fine locally. Then tag it and push.
+Jenkins will build the package anew.

 .. code::

    $ gbp buildpackage --git-tag-only --git-sign-tag  # tag it
    $ git push origin --follow-tags                   # trigger the build

+Lather, rinse, repeat until it's all green!
+
 Deploy
 ------

-.. _nominal_case:
+.. _nominal-case:

 Nominal case
 ~~~~~~~~~~~~

-Update the machine dependencies and restart service. That usually means
-as sudo user:
+Update the machine dependencies and restart service. That usually means as sudo user:

 .. code::

    $ apt-get update
    $ apt-get dist-upgrade -y
-   $ systemctl restart swh-worker@loader_${type}
+   $ systemctl restart $service

 Note that this is for one machine you ssh into. We usually wrap those commands from the
 sysadmin machine pergamon [3] with the *clush* command, something like:

 .. code::

    $ sudo clush -b -w @swh-workers 'apt-get update; env DEBIAN_FRONTEND=noninteractive \
        apt-get -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" -y dist-upgrade'

 [3] pergamon is already *clush* configured to allow multiple ssh connections in parallel
 on our managed infrastructure nodes.

 .. _configuration-change-required:

 Configuration change required
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Either wait for puppet to actually deploy the changes first and then go back to the
 nominal case. Or force a puppet run:

 .. code::

-   sudo clush -b -w @swh-workers puppet agent -t
+   sudo clush -b -w $nodes puppet agent -t

 Note: *-t* is not optional

-.. _long-standing-migration:
+.. _long-standing-upgrade:

-Long-standing migration
-~~~~~~~~~~~~~~~~~~~~~~~
+Long-standing upgrade
+~~~~~~~~~~~~~~~~~~~~~

-In that case, you may need to stop all services for migration which could take some time
-(because lots of data is migrated for example).
+In that case, you may need to stop the impacted services. For example, for long standing
+data model migration which could take some time.

-You need to momentarily stop puppet (which runs every 30 min to apply manifest changes)
-and the cron service (which restarts down services) on the workers nodes.
+You need to momentarily stop puppet (which by default runs every 30 min to apply
+manifest changes) and the cron service (which restarts down services) on the workers
+nodes.

 Report yourself to the :ref:`storage database migration ` for a concrete case of
 database migration.

 .. code::

    $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable'

+
 Then:

-- Execute the database migration.
-- Go back to the nominal case.
-- Restart puppet and the cron on workers
+- Execute the long-standing upgrade.
+- Go back to the :ref:`nominal case `.
+- Restart puppet and the cron services on workers

 .. code::

    $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable'
+
+.. _deployment-with-kubernetes:
+
+Deployment with Kubernetes
+--------------------------
+
+.. warning:: FIXME Enter into details + add a small summary graph
+
+- swh-apps: Add new apps (new Dockerfile)
+- swh-apps: Build frozen requirements for a new release of a swh service
+- swh-apps: Build impacted docker images with that frozen set of requirements
+- Commit and tag
+- Push built docker image into our gitlab registry
+- swh-charts: Add/Update the image versions
+- Commit and push