diff --git a/sysadm/deployment/upgrade-swh-service.rst b/sysadm/deployment/upgrade-swh-service.rst
index 18f571a..0d4d654 100644
--- a/sysadm/deployment/upgrade-swh-service.rst
+++ b/sysadm/deployment/upgrade-swh-service.rst
@@ -1,231 +1,344 @@
 .. _upgrade-swh-service:

 Upgrade swh service
 ===================

 .. admonition:: Intended audience
    :class: important

    sysadm staff members

 The document describes the deployment for most of our swh services (rpc services,
 loaders, listers, indexers, ...).

 There exists currently 2 ways (as we are transitioning from the first to the second):

 - static: From git tag to deployment through debian packaging
 - elastic: From git tag to deployment through kubernetes.

-The following will first describe the :ref:`common deployment part `.
-This involves some python packaging out of a git tag which will be built and push to
-`PyPI `_ and our :ref:`swh debian repositories
-`.
+The following will first describe the :ref:`common deployment part
+`. This involves some python packaging out of a git tag
+which will be built and pushed to `PyPI `_ and our :ref:`swh debian
+repositories `.

 Then follows the actual :ref:`deployment with debian packaging `. It concludes with the
 :ref:`deployment with kubernetes` chapter.

 .. _distinct-services:

 Distinct Services
 -----------------

 3 kinds services runs on our nodes:

 - worker services (loaders, listers, cookers, ...)
 - rpc services (scheduler, objstorage, storage, web, ...)
 - journal client services (search, scheduler, indexer)

-.. _code-and-publish:
+.. _code-and-publish-a-release:

-Code and publish
-----------------
+Code and publish a release
+--------------------------
+
+It's usually up to the developers.

-Code an evolution of fix an issue in the python code within the git repository's master
-branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and
-push `.
+Code an evolution or a bugfix in the impacted git repository (usually the master
+branch).
Open a diff for review. Land it when accepted. Then release it following
+the :ref:`tag and push ` part.

 .. _tag-and-push:

 Tag and push
 ~~~~~~~~~~~~

-When ready, `git tag` and `git push` the new tag of the module. And let jenkins
+When ready, `git tag` and `git push` the new tag of the module. Then let jenkins
 :ref:`publish the artifact `.

 .. code::

    $ git tag -a vA.B.C  # (optionally) `git tag -a -s` to sign the tag too
    $ git push origin --follow-tags

 .. _publish-artifacts:

 Publish artifacts
 ~~~~~~~~~~~~~~~~~

-Jenkins is in charge to publishing to `PyPI `_ the new release (out of
-the tag). And then building the debian packaging and push it package to our :ref:`swh
+Jenkins is in charge of publishing the new release to `PyPI `_ (out of
+the tag just pushed). It then builds the debian package and pushes it to our :ref:`swh
 debian repositories `.

 .. _troubleshoot:

 Troubleshoot
 ~~~~~~~~~~~~

 If jenkins fails for some reason, fix the module be it :ref:`python code ` or the
 :ref:`debian packaging `.

 .. _deployment-with-debian-packaging:
+
 Deployment with debian packaging
 --------------------------------

 This mostly involves deploying new version of debian packages to static nodes.

 .. _upgrade-services:

 Upgrade services
 ~~~~~~~~~~~~~~~~

 When a new version is released, we need to upgrade the package(s) and restart services.

 worker services (production):

 - *swh-worker@loader_{git, hg, svn, npm, ...}*
 - *swh-worker@lister*
 - *swh-worker@vault_cooker*

 journal clients (production):

 - *swh-indexer-journal-client@{origin_intrinsic_metadata_,extrinsic_metadata_,...}*

 rpc services (both environment):

 - *gunicorn-swh-{scheduler, objstorage, storage, web, ...}*

 From the pergamon node, which is configured for `clush `_, one can act on multiple
 nodes through the following group names:

 - *@swh-workers* for the production workers (listers, loaders, ...)
 - *@azure-workers* for the production ones running on azure (indexers, cookers)
 - ...
 See :ref:`deploy-new-lister` for a practical example.

 .. _troubleshoot-debian-package:

 Debian package troubleshoot
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Update and checkout the *debian/unstable-swh* branch (in the impacted git repository),
 then fix whatever is not updated or broken due to a change.
-It's usually a missing new package dependency to fix in *debian/control*). Add a new
+It's usually a missing new package dependency to fix in *debian/control*. Add a new
 entry in *debian/changelog*. Make sure gbp builds fine locally. Then tag it and push.
 Jenkins will build the package anew.

 .. code::

    $ gbp buildpackage --git-tag-only --git-sign-tag  # tag it
    $ git push origin --follow-tags                   # trigger the build

 Lather, rinse, repeat until it's all green!

 Deploy
 ------

 .. _nominal-case:

 Nominal case
 ~~~~~~~~~~~~

 Update the machine dependencies and restart service. That usually means as sudo user:

 .. code::

    $ apt-get update
    $ apt-get dist-upgrade -y
    $ systemctl restart $service

 Note that this is for one machine you ssh into. We usually wrap those commands from
 the sysadmin machine pergamon [3] with the *clush* command, something like:

 .. code::

    $ sudo clush -b -w @swh-workers 'apt-get update; env DEBIAN_FRONTEND=noninteractive \
        apt-get -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" -y dist-upgrade'

 [3] pergamon is already *clush* configured to allow multiple ssh connections in parallel
 on our managed infrastructure nodes.

 .. _configuration-change-required:

 Configuration change required
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Either wait for puppet to actually deploy the changes first and then go back to the
 nominal case. Or force a puppet run:

 .. code::

    sudo clush -b -w $nodes puppet agent -t

 Note: *-t* is not optional

 .. _long-standing-upgrade:

 Long-standing upgrade
 ~~~~~~~~~~~~~~~~~~~~~

 In that case, you may need to stop the impacted services. For example, for long
 standing data model migration which could take some time.
 You need to momentarily stop puppet (which by default runs every 30 min to apply
 manifest changes) and the cron service (which restarts down services) on the workers
 nodes. Report yourself to the :ref:`storage database migration ` for a concrete case of
 database migration.

 .. code::

    $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable'

 Then:

 - Execute the long-standing upgrade.
 - Go back to the :ref:`nominal case `.
 - Restart puppet and the cron services on workers

 .. code::

    $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable'

 .. _deployment-with-kubernetes:

 Deployment with Kubernetes
 --------------------------

-.. warning:: FIXME Enter into details + add a small summary graph
+This new deployment involves docker images which expose scripts/services running in a
+frozen python virtual environment. Those versioned images are then referenced in a
+specific helm chart which is deployed in a kubernetes rancher cluster.
+
+Those docker images are built out of a Dockerfile declared in the `swh-apps`_
+repository.
+
+Add a new app
+~~~~~~~~~~~~~
+
+From the repository `swh-apps`_, create a new Dockerfile.
+
+Depending on the :ref:`services ` to package, other existing
+applications can serve as a template:
+
+- loader: use `git loader `_
+- rpc service: use `graphql `_
+- journal client: use `storage replayer `_
+
+.. _update-app-frozen-requirements:
+
+Update app's frozen requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the application is registered, we need to build the frozen environment.
+
+We'll first need a "build-deps" container with some dependencies set (due to some
+limitations in our stack):
+
+..
code::
+
+   $ cd swh-apps
+   $ docker run -ti --rm -v $PWD:/src --user root --name build-deps python:3.9 bash
+   # inside the container 'build-deps'
+   root@834faba6202b:/# apt update; apt upgrade -y; apt install -y libcmph-dev
+
+From this container, we can generate the frozen requirements for the
+$APP_NAME (e.g. *loader_{git, svn, cvs, ...}*, *lister*, *indexer* ...):
+
+.. code::
+
+   $ cd swh-apps
+   $ docker exec --user 1000 build-deps \
+       /src/scripts/generate-frozen-requirements $APP_NAME
+
+The frozen requirements are now built and can be committed. Next, we will
+:ref:`generate the image updated with that frozen environment `.
+
+.. _generate-image:
+
+Generate image
+~~~~~~~~~~~~~~
+
+Build the docker image with the frozen environment and then :ref:`publish it
+`:
+
+.. code::
+
+   $ IMAGE_NAME= # e.g. loader_git, loader_svn, ...
+   $ IMAGE_VERSION=YYYYMMDD.1  # Template of the day, e.g. `$(date '+%Y%m%d')`
+   $ REGISTRY=container-registry.softwareheritage.org/infra/swh-apps
+   $ FULL_IMAGE_VERSION=$REGISTRY/$IMAGE_NAME:$IMAGE_VERSION
+   $ FULL_IMAGE_LATEST=$REGISTRY/$IMAGE_NAME:latest
+   $ cd swh-apps/apps//
+   # This will create the versioned image locally
+   $ docker build -t $FULL_IMAGE_VERSION .
+   # Tag with the latest version
+   $ docker tag $FULL_IMAGE_VERSION $FULL_IMAGE_LATEST
+
+.. _gitlab-registry:
+
+Gitlab registry
+~~~~~~~~~~~~~~~
+
+You must have a gitlab account and generate a personal access token with at least
+`write` access to the `gitlab registry
+`_.
+
+.. _publish-image:
+
+Publish image
+~~~~~~~~~~~~~
+
+You must first log docker in to the swh :ref:`gitlab registry ` and
+then push the image:
+
+.. code::
+
+   $ docker login  # login to the gitlab registry (prompted for personal access token)
+   passwd: **********
+   $ docker push $FULL_IMAGE_VERSION
+   $ docker push $FULL_IMAGE_LATEST
+
+Do not forget to :ref:`commit the changes and tag `.
+
+Finally, let's :ref:`update the impacted chart ` with the new
+docker image version.
+
+..
_commit-changes-and-tag:
+
+Commit and tag
+~~~~~~~~~~~~~~
+
+Commit and tag the changes.
+
+.. _update-impacted-chart:
+
+Update impacted chart
+~~~~~~~~~~~~~~~~~~~~~
+
+In the `swh-charts`_ repository, update the `values file
+`_
+with the new docker image version.
+
+:ref:`ArgoCD ` will be in charge of deploying the changes in a rolling
+upgrade fashion.

-- swh-apps: Add new apps (new Dockerfile)
-- swh-apps: Build frozen requirements for a new release of a swh service
-- swh-apps: Build impacted docker images with that frozen set of requirements
-- Commit and tag
-- Push built docker image into our gitlab registry
-- swh-charts: Add/Update the image versions
-- Commit and push
+.. _swh-apps: https://gitlab.softwareheritage.org/infra/swh-apps/
+.. _swh-charts: https://gitlab.softwareheritage.org/infra/ci-cd/swh-charts