diff --git a/sysadm/deployment/upgrade-swh-service.rst b/sysadm/deployment/upgrade-swh-service.rst
--- a/sysadm/deployment/upgrade-swh-service.rst
+++ b/sysadm/deployment/upgrade-swh-service.rst
@@ -8,48 +8,68 @@
 sysadm staff members

-Workers
--------
-Dedicated workers [1] run our *swh-worker@loader_{git, hg, svn, npm, ...}* services.
-When a new version is released, we need to upgrade their package(s).
+The document describes the deployment for most of our swh services (rpc services,
+loaders, listers, indexers, ...).

-[1] Here are the following group name (in `clush
-`_ terms):
+There are currently 2 ways (as we are transitioning from the first to the second):

-- *@swh-workers* for the production workers
-- *@azure-workers* for the production ones running on azure
-- *@staging-loader-workers* for the staging ones
+- static: From git tag to deployment through debian packaging
+- elastic: From git tag to deployment through kubernetes.
+
+
+The following will first describe the :ref:`common deployment part `.
+This involves some python packaging out of a git tag which will be built and pushed to
+`PyPI `_ and our :ref:`swh debian repositories
+`.
+
+Then follows the actual :ref:`deployment with debian packaging
+`. It concludes with the :ref:`deployment with
+kubernetes` chapter.
+
+.. _distinct-services:
+
+Distinct Services
+-----------------
+
+3 kinds of services run on our nodes:
+
+- worker services (loaders, listers, cookers, ...)
+- rpc services (scheduler, objstorage, storage, web, ...)
+- journal client services (search, scheduler, indexer)
+
+.. _code-and-publish:

-See :ref:`deploy-new-lister` for a practical example.

 Code and publish
 ----------------

-.. _fix-or-evolve-code:
-
 Code an evolution of fix an issue in the python code within the git repository's master
-branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and push
-`.
+branch. Open a diff for review, land it when accepted, and start back at :ref:`tag and
+push `.

 .. _tag-and-push:

 Tag and push
 ~~~~~~~~~~~~

-When ready, `git tag` and `git push` the new tag of the module.
+When ready, `git tag` and `git push` the new tag of the module. Then let jenkins
+:ref:`publish the artifact `.

 .. code::

-   $ git tag vA.B.C
+   $ git tag -a vA.B.C  # (optionally) `git tag -a -s` to sign the tag too
    $ git push origin --follow-tags

-.. _publish-and-deploy:
+.. _publish-artifacts:

-Publish and deploy
-~~~~~~~~~~~~~~~~~~
+Publish artifacts
+~~~~~~~~~~~~~~~~~
+
+Jenkins is in charge of publishing the new release (out of the tag) to `PyPI `_,
+then of building the debian package and pushing it to our :ref:`swh debian
+repositories `.

-Let jenkins publish and deploy the debian package.

 .. _troubleshoot:

@@ -59,37 +79,81 @@
 If jenkins fails for some reason, fix the module be it :ref:`python code
 ` or the :ref:`debian packaging `.

+
+.. _deployment-with-debian-packaging:
+
+
+Deployment with debian packaging
+--------------------------------
+
+This mostly involves deploying new versions of debian packages to static nodes.
+
+.. _upgrade-services:
+
+Upgrade services
+~~~~~~~~~~~~~~~~
+
+When a new version is released, we need to upgrade the package(s) and restart services.
+
+worker services (production):
+
+- *swh-worker@loader_{git, hg, svn, npm, ...}*
+- *swh-worker@lister*
+- *swh-worker@vault_cooker*
+
+journal clients (production):
+
+- *swh-indexer-journal-client@{origin_intrinsic_metadata_,extrinsic_metadata_,...}*
+
+rpc services (both environments):
+
+- *gunicorn-swh-{scheduler, objstorage, storage, web, ...}*
+
+
+From the pergamon node, which is configured for `clush
+`_, one can act on multiple
+nodes through the following group names:
+
+- *@swh-workers* for the production workers (listers, loaders, ...)
+- *@azure-workers* for the production ones running on azure (indexers, cookers)
+- ...
+
+See :ref:`deploy-new-lister` for a practical example.
+
 .. _troubleshoot-debian-package:

 Debian package troubleshoot
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In that case, upgrade and checkout the *debian/unstable-swh* branch, then fix whatever
-is not updated or broken due to a change. It's usually a missing new package dependency
-to fix in *debian/control*). Add a new entry in *debian/changelog*. Make sure gbp builds
-fine. Then tag it. Jenkins will build the package anew.
+Update and check out the *debian/unstable-swh* branch (in the impacted git repository),
+then fix whatever is not updated or broken due to a change.
+
+It's usually a missing new package dependency to fix in *debian/control*. Add a new
+entry in *debian/changelog*. Make sure gbp builds fine locally. Then tag it and push.
+Jenkins will build the package anew.

 .. code::

    $ gbp buildpackage --git-tag-only --git-sign-tag  # tag it
    $ git push origin --follow-tags                   # trigger the build

+Lather, rinse, repeat until it's all green!
+
 Deploy
 ------

-.. _nominal_case:
+.. _nominal-case:

 Nominal case
 ~~~~~~~~~~~~

-Update the machine dependencies and restart service. That usually means
-as sudo user:
+Update the machine dependencies and restart the service. That usually means, as a sudo user:

 .. code::

    $ apt-get update
    $ apt-get dist-upgrade -y
-   $ systemctl restart swh-worker@loader_${type}
+   $ systemctl restart $service

 Note that this is for one machine you ssh into.

@@ -117,20 +181,21 @@
 .. code::

-   sudo clush -b -w @swh-workers puppet agent -t
+   sudo clush -b -w $nodes puppet agent -t

 Note: *-t* is not optional

-.. _long-standing-migration:
+.. _long-standing-upgrade:

-Long-standing migration
-~~~~~~~~~~~~~~~~~~~~~~~
+Long-standing upgrade
+~~~~~~~~~~~~~~~~~~~~~

-In that case, you may need to stop all services for migration which could take some time
-(because lots of data is migrated for example).
+In that case, you may need to stop the impacted services, for example for a
+long-standing data model migration which could take some time.

-You need to momentarily stop puppet (which runs every 30 min to apply manifest changes)
-and the cron service (which restarts down services) on the workers nodes.
+You need to momentarily stop puppet (which by default runs every 30 min to apply
+manifest changes) and the cron service (which restarts down services) on the workers
+nodes.

 Report yourself to the :ref:`storage database migration
 ` for a concrete case of database migration.

@@ -139,13 +204,139 @@
    $ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable'

+
 Then:

-- Execute the database migration.
-- Go back to the nominal case.
-- Restart puppet and the cron on workers
+- Execute the long-standing upgrade.
+- Go back to the :ref:`nominal case `.
+- Restart puppet and the cron services on workers

 .. code::

    $ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable'
+
+.. _deployment-with-kubernetes:
+
+Deployment with Kubernetes
+--------------------------
+
+This new deployment involves docker images exposing scripts/services which run in a
+frozen python virtual environment. Those versioned images are then referenced in a
+specific helm chart which is deployed in a kubernetes rancher cluster.
+
+Those docker images are built out of a declared Dockerfile in the `swh-apps`_
+repository.
+
+Add a new app
+~~~~~~~~~~~~~
+
+From the repository `swh-apps`_, create a new Dockerfile.
+
+Depending on the :ref:`services ` to package, other existing
+applications can serve as a template:
+
+- loader: use `git loader `_.
+- rpc service: use `graphql `_
+- journal client: use `storage replayer `_
+
+.. _update-app-frozen-requirements:
+
+Update app's frozen requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the application is registered, we need to build the frozen environment.
+
+We'll first need a "build-deps" container with some dependencies set (due to some
+limitations in our stack):
+
+.. code::
+
+   $ cd swh-apps
+   $ docker run -ti --rm -v $PWD:/src --user root --name build-deps python:3.9 bash
+   # inside the container 'build-deps'
+   root@834faba6202b:/# apt update; apt upgrade -y; apt install -y libcmph-dev
+
+With this container running, we are able to generate the frozen requirements for the
+$APP_NAME (e.g. *loader_{git, svn, cvs, ...}*, *lister*, *indexer* ...):
+
+.. code::
+
+   $ cd swh-apps
+   $ docker exec --user 1000 build-deps \
+     /src/scripts/generate-frozen-requirements $APP_NAME
+
+You have built your frozen requirements that can be committed. Next, we will
+:ref:`generate the image updated with that frozen environment `.
+
+.. _generate-image:
+
+Generate image
+~~~~~~~~~~~~~~
+
+Build the docker image with the frozen environment and then :ref:`publish it
+`:
+
+.. code::
+
+   $ IMAGE_NAME=  # e.g. loader_git, loader_svn, ...
+   $ IMAGE_VERSION=YYYYMMDD.1  # Template of the day, e.g. `$(date '+%Y%m%d')`
+   $ REGISTRY=container-registry.softwareheritage.org/infra/swh-apps
+   $ FULL_IMAGE_VERSION=$REGISTRY/$IMAGE_NAME:$IMAGE_VERSION
+   $ FULL_IMAGE_LATEST=$REGISTRY/$IMAGE_NAME:latest
+   $ cd swh-apps/apps//
+   # This will create the versioned image locally
+   $ docker build -t $FULL_IMAGE_VERSION .
+   # Tag with the latest version
+   $ docker tag $FULL_IMAGE_VERSION $FULL_IMAGE_LATEST
+
+.. _gitlab-registry:
+
+Gitlab registry
+~~~~~~~~~~~~~~~
+
+You must have a gitlab account and generate a personal access token with at least
+`write` access to the `gitlab registry
+`_.
+
+.. _publish-image:
+
+Publish image
+~~~~~~~~~~~~~
+
+You must first log in with docker to the swh :ref:`gitlab registry ` and
+then push the image:
+
+.. code::
+
+   $ docker login container-registry.softwareheritage.org  # gitlab registry (prompts for the personal access token)
+   passwd: **********
+   $ docker push $FULL_IMAGE_VERSION
+   $ docker push $FULL_IMAGE_LATEST
+
+Do not forget to :ref:`commit the changes and tag `.
+
+Finally, let's :ref:`update the impacted chart ` with the new
+docker image version.
+
+.. _commit-changes-and-tag:
+
+Commit and tag
+~~~~~~~~~~~~~~
+
+Commit and tag the changes.
+
+.. _update-impacted-chart:
+
+Update impacted chart
+~~~~~~~~~~~~~~~~~~~~~
+
+In the `swh-chart`_ repository, update the `values file
+`_
+with the corresponding new version.
+
+:ref:`ArgoCD ` will be in charge of deploying the changes in a rolling
+upgrade fashion.
+
+.. _swh-apps: https://gitlab.softwareheritage.org/infra/swh-apps/
+.. _swh-chart: https://gitlab.softwareheritage.org/infra/ci-cd/swh-charts
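
Note on the "Generate image" and "Publish image" steps: the image references are easy to get wrong by hand, so here is a minimal dry-run sketch of how they could be composed. It assumes the registry path and the ``YYYYMMDD.N`` version template given in the patch. The helper names ``image_ref`` and ``print_publish_commands`` are hypothetical (they are not part of swh-apps), and the script only prints the docker commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the docker commands, it does not execute them.
# REGISTRY is taken from the "Generate image" section; the function names
# below are hypothetical helpers, not existing swh-apps tooling.
REGISTRY=container-registry.softwareheritage.org/infra/swh-apps

# Compose a full image reference from an image name ($1) and a tag ($2).
image_ref() {
    printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Print the build, tag and push commands for one image name ($1) and version ($2).
print_publish_commands() {
    versioned=$(image_ref "$1" "$2")
    latest=$(image_ref "$1" latest)
    echo "docker build -t $versioned ."
    echo "docker tag $versioned $latest"
    echo "docker push $versioned"
    echo "docker push $latest"
}

# Example: first image of the day for the git loader.
print_publish_commands loader_git "$(date '+%Y%m%d').1"
```

Printing the commands first makes it possible to review the versioned and ``latest`` references before anything is built or pushed.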