
Prepare the tests to run in Jenkins
ClosedPublic

Authored by lunar on Oct 5 2022, 6:10 PM.

Details

Summary
  • Add missing requirement needed by the tests
  • Add build system to pyproject
  • Match SERVICES with current stack description
  • Stop waiting for monitoring services
  • Restart graph replayer automatically
  • Control Docker more robustly
  • Move API URL to an environment variable
  • Allow connections to swh-web from anywhere

To focus on the most important set of changes:

Previously, Docker was controlled through custom shell commands issued
via pytest-testinfra. Multiple loops were used to wait for containers
to converge to the desired state. Given that the test requires
significant resources in terms of RAM, CPU, and bandwidth, its
execution can take between 20 minutes and a full hour. Therefore, it
is better to avoid having to edit the code to fiddle with timeouts and
loop counts.

One of the ways we used to wait for completion was watching the
Docker service logs for specific messages. Sadly, pytest-testinfra
does not really provide a way to monitor the output of a command as it
runs.

All in all, it felt easier to replace the custom shell commands with a
library offering a Python API to control Docker. While dockerpy is the
official library, its future maintenance by Docker has been in flux
(see: https://github.com/docker/docker-py/pull/2989) and, more
importantly, as it speaks directly to the Docker socket, it does not
support anything like docker stack.

Meanwhile, the python_on_whales library has been featured on the official
Docker blog (https://www.docker.com/blog/guest-post-calling-the-docker-cli-from-python-with-python-on-whales/)
and implements as much of the Docker command line interface as
possible (by shelling out).

With it, it was possible to:

  • Replace all command lines with Python calls.
  • Wait for containers to shut down on teardown before removing volumes.
  • Wait for log messages to appear by blocking on the service output.
  • Wait for Docker readiness when starting the stack and scaling services.
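To illustrate the log-waiting item, here is a hypothetical sketch, not the revision's actual code: the scanning core is factored out so it works on any iterator of (stream name, payload) tuples, which is the shape python_on_whales yields for followed logs. The service name and log message in the comment are invented for the example.

```python
from typing import Iterable, Tuple


def wait_for_log_message(log_stream: Iterable[Tuple[str, bytes]], needle: str) -> bool:
    """Consume (stream_name, payload) tuples until `needle` appears in
    the decoded output; return True on a match, False if the stream ends
    without one."""
    for _stream_name, payload in log_stream:
        if needle in payload.decode("utf-8", errors="replace"):
            return True
    return False


# With python_on_whales, a follow-mode log stream yields such tuples
# (hypothetical service name; requires a running Docker daemon):
#
#   from python_on_whales import docker
#   stream = docker.container.logs("mirror_graph-replayer.1",
#                                  follow=True, stream=True)
#   wait_for_log_message(stream, "replaying done")
```

Because the helper blocks until the stream ends or the message appears, an overall test timeout (see below in the summary: pytest-timeout) is still needed as a safety net.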

Some refactoring and naming improvements have been made along the way.

Because we still would like to timeout in case of a problem,
pytest-timeout has been added to the requirements with a default
timeout of 30 minutes.
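As a sketch of what that configuration could look like (hypothetical fragment; the actual file layout in the revision may differ), pytest-timeout reads its default from the standard pytest ini options:

```toml
# Hypothetical pyproject.toml fragment: fail any test still running
# after 30 minutes (1800 seconds).
[tool.pytest.ini_options]
timeout = 1800
```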

Diff Detail

Repository
rCDFP Deployment tools for hosting a mirror
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

lunar requested review of this revision.Oct 5 2022, 6:10 PM
vlorentz added inline comments.
pyproject.toml
23 ↗(On Diff #31174)

Where does the dependency on tree_sitter come from?

pyproject.toml
23 ↗(On Diff #31174)

Cargo-culted from swh-search. From your comment I guess it can be removed?

Adjust wait_services_status() for a race condition

pyproject.toml
23 ↗(On Diff #31174)

ah, yes. swh-search uses it because it needs to compile a parser while building.

douardda added inline comments.
mirror.yml
257–258

do you know why you had to remove this restart_policy here? (I don't really remember why I had to put it there, but I vaguely remember it was a bit painful to figure out this solution... anyway, I should have added a comment back then)

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

why get rid of these services (grafana and prometheus related ones) here?

349–350 ↗(On Diff #31185)

well this should not have been there in the first place, so I guess we can get rid of it :-)

  • Remove the unneeded tree_sitter from build system dependencies.
  • Remove some leftover breakpoints.
lunar added inline comments.
mirror.yml
257–258

I had explained the message of the commit modifying exactly this bit:

Restart graph replayer automatically

One idea of the test is to see how replayers behave when run
concurrently. In order to make sure this happens, we need them to keep
restarting once they are done. Otherwise, there is a possibility that
the first instance will be done before we can start another one.
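A minimal sketch of what such a stack configuration could look like (hypothetical fragment, not the actual mirror.yml): under Docker Swarm, a restart_policy with condition "any" restarts the service even after a clean exit, which keeps replayers running concurrently for the duration of the test.

```yaml
# Hypothetical mirror.yml fragment: restart the replayer even after
# a successful run, so several instances stay active at once.
services:
  graph-replayer:
    deploy:
      restart_policy:
        condition: any
```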

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

The monitoring services are not exercised by the tests. There is no need to wait for them (and thus to actually require a swarm node allowing their execution), hence the change. Have I overlooked something when forming that assumption?

Add two more commits required for predictable runs in Jenkins:

  • Make image build script optionally record build tag on disk
  • Add an assertion rather than an unused global for SWH_IMAGE_TAG

I missed updating tests/README.md, which still mentions testinfra. I will make the change if the rest of these changes are accepted.

In the same vein as the inline comment, it would be better to rename the network to 'swh-mirror' rather than 'swhtest-mirror'.

tests/test_graph_replayer.py
23 ↗(On Diff #31185)

sorry for the late response.

The idea of this repo is mainly to have a complete and easy-to-use mirror deployment stack. So the monitoring part is included.

From there, we want this stack to be tested. But the fact that there is no validation (yet) for the monitoring part in the tests does not mean we do not want the monitoring part included.

Maybe add a TODO/task to keep track of the missing tests for the monitoring part, but I'd like it to stay in the mirror deployment stack for now.

This revision now requires changes to proceed.Oct 19 2022, 11:00 AM
lunar marked 2 inline comments as done.
  • Rename swhtest-mirror network to swh-mirror
  • Re-add monitoring nodes to the required services to run the tests
  • Add missing label to troubleshooting section

In the same vein as the inline comment, it would be better to rename the network to 'swh-mirror' rather than 'swhtest-mirror'.

Done.

This revision is now accepted and ready to land.Nov 2 2022, 5:37 PM