
Prepare the tests to run in Jenkins
ClosedPublic

Authored by lunar on Oct 5 2022, 6:10 PM.

Details

Summary
  • Add missing requirement needed by the tests
  • Add build system to pyproject
  • Match SERVICES with current stack description
  • Stop waiting for monitoring services
  • Restart graph replayer automatically
  • Control Docker more robustly
  • Move API URL to an environment variable
  • Allow connections to swh-web from anywhere

To focus on the most important set of changes:

Previously, Docker was controlled through custom shell commands issued
via pytest-testinfra. Multiple loops were used to wait for containers
to converge to the desired state. Given that the test requires
significant resources in terms of RAM, CPU, and bandwidth, its
execution can take between 20 minutes and a full hour. Therefore, it
is better to avoid having to edit the code to fiddle with timeouts and
loop counts.

One of the ways we used to wait for completion was watching the
Docker service logs for specific messages. Sadly, pytest-testinfra
does not really provide a way to monitor the output of a command as it
runs.

All in all, it felt easier to replace the custom shell commands with a
library offering a Python API to control Docker. While dockerpy is the
official library, its future maintenance by Docker has been in flux
(see: https://github.com/docker/docker-py/pull/2989) and, more
importantly, as it speaks directly to the Docker socket, it does not
support anything like docker stack.

Meanwhile, the python_on_whales library has been featured on the official
Docker blog (https://www.docker.com/blog/guest-post-calling-the-docker-cli-from-python-with-python-on-whales/)
and implements as much of the Docker command line interface as
possible (by shelling out).

With it, it was possible to:

  • Replace all command lines with Python calls.
  • Wait for containers to shut down on teardown before removing volumes.
  • Wait for log messages to appear by blocking on the service output.
  • Wait for Docker readiness when starting the stack and scaling services.
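To illustrate the log-waiting item, here is a hypothetical sketch, not the revision's actual code: the scanning core is factored out so it works on any iterator of (stream name, payload) tuples, which is the shape python_on_whales yields for followed logs. The service name and log message in the comment are invented for the example.

```python
from typing import Iterable, Tuple


def wait_for_log_message(log_stream: Iterable[Tuple[str, bytes]], needle: str) -> bool:
    """Consume (stream_name, payload) tuples until `needle` appears in
    the decoded output; return True on a match, False if the stream ends
    without one."""
    for _stream_name, payload in log_stream:
        if needle in payload.decode("utf-8", errors="replace"):
            return True
    return False


# With python_on_whales, a follow-mode log stream yields such tuples
# (hypothetical service name; requires a running Docker daemon):
#
#   from python_on_whales import docker
#   stream = docker.container.logs("mirror_graph-replayer.1",
#                                  follow=True, stream=True)
#   wait_for_log_message(stream, "replaying done")
```

Because the helper blocks until the stream ends or the message appears, an overall test timeout (see below in the summary: pytest-timeout) is still needed as a safety net.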

Some refactoring and naming improvements have been made along the way.

Because we still would like to timeout in case of a problem,
pytest-timeout has been added to the requirements with a default
timeout of 30 minutes.
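As a sketch of what that configuration could look like (hypothetical fragment; the actual file layout in the revision may differ), pytest-timeout reads its default from the standard pytest ini options:

```toml
# Hypothetical pyproject.toml fragment: fail any test still running
# after 30 minutes (1800 seconds).
[tool.pytest.ini_options]
timeout = 1800
```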

Diff Detail

Repository
rCDFP Deployment tools for hosting a mirror
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

lunar requested review of this revision.Oct 5 2022, 6:10 PM
vlorentz added inline comments.
pyproject.toml
23 ↗(On Diff #31174)

Where does the dependency on tree_sitter come from?

pyproject.toml
23 ↗(On Diff #31174)

Cargo-culted from swh-search. From your comment I guess it can be removed?

Adjust wait_services_status() for a race condition

pyproject.toml
23 ↗(On Diff #31174)

ah, yes. swh-search uses it because it needs to compile a parser while building.

douardda added inline comments.
mirror.yml
257–258

do you know why you had to remove this restart_policy here? (I don't really remember why I had to put it there, but I vaguely remember it was a bit painful to figure out this solution... anyway, I should have added a comment back then)

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

why get rid of these services (grafana and prometheus related ones) here?

349–350 ↗(On Diff #31185)

well this should not have been there in the first place, so I guess we can get rid of it :-)

  • Remove the unneeded tree_sitter from build system dependencies.
  • Remove some leftover breakpoints.
lunar added inline comments.
mirror.yml
257–258

I had explained the message of the commit modifying exactly this bit:

Restart graph replayer automatically

One idea of the test is to see how replayers behave when run
concurrently. In order to make sure this happens, we need them to keep
restarting once they are done. Otherwise, there is a possibility that
the first instance will be done before we can start another one.
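A minimal sketch of what such a stack configuration could look like (hypothetical fragment, not the actual mirror.yml): under Docker Swarm, a restart_policy with condition "any" restarts the service even after a clean exit, which keeps replayers running concurrently for the duration of the test.

```yaml
# Hypothetical mirror.yml fragment: restart the replayer even after
# a successful run, so several instances stay active at once.
services:
  graph-replayer:
    deploy:
      restart_policy:
        condition: any
```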

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

The monitoring services are not exercised by the tests. There is no need to wait for them (and thus to actually require a swarm node allowing their execution), hence the change. Have I overlooked something when forming that assumption?

Add two more commits required for predictable runs in Jenkins:

  • Make image build script optionally record build tag on disk
  • Add an assertion rather than an unused global for SWH_IMAGE_TAG

I missed updating tests/README.md, which still mentions testinfra. I will make the change if the rest of these changes are accepted.

In the same vein as the inline comment, it would be better to rename the network to 'swh-mirror' rather than 'swhtest-mirror'.

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

sorry for the late response.

The idea of this repo is mainly to have a complete and easy-to-use mirror deployment stack. So the monitoring part is included.

From there, we want this stack to be tested. But the fact that there is no validation (yet) for the monitoring part in the tests does not mean we do not want the monitoring part included.

Maybe add a TODO/task to keep track of the missing tests for the monitoring part, but I'd like it to stay in the mirror deployment stack for now.

This revision now requires changes to proceed.Oct 19 2022, 11:00 AM
lunar marked 2 inline comments as done.
  • Rename swhtest-mirror network to swh-mirror
  • Re-add monitoring nodes to the required services to run the tests
  • Add missing label to troubleshooting section

In the same vein as the inline comment, it would be better to rename the network to 'swh-mirror' rather than 'swhtest-mirror'.

Done.

This revision is now accepted and ready to land.Nov 2 2022, 5:37 PM