diff --git a/docs/index.rst b/docs/index.rst
index 021dbac..a11a245 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,183 +1,184 @@
 .. _swh-docs:

 Software Heritage - Development Documentation
 =============================================

 Getting started
 ---------------

 * :ref:`getting-started` → deploy a local copy of the Software Heritage software stack in less than 5 minutes, or
 * :ref:`developer-setup` → get a working development setup that allows to hack on the Software Heritage software stack

 Architecture
 ------------

 * :ref:`architecture` → get a glimpse of the Software Heritage software architecture
 * :ref:`mirror` → learn what a Software Heritage mirror is and how to set up one

 Data Model and Specifications
 -----------------------------

 * :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
 * :ref:`data-model` Documentation of the main |swh| archive data model.
 * :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.

 Components
 ----------

 Here is brief overview of the most relevant software components in the Software Heritage stack. Each component name is linked to the development documentation of the corresponding Python module.

 :ref:`swh.core `
     low-level utilities and helpers used by almost all other modules in the stack

 :ref:`swh.dataset `
     public datasets and periodic data dumps of the archive released by Software Heritage

 :ref:`swh.deposit `
     push-based deposit of software artifacts to the archive

 swh.docs
     developer documentation (used to generate this doc you are reading)

 :ref:`swh.fuse `
     Virtual file system to browse the Software Heritage archive, based on `FUSE `_

 :ref:`swh.graph `
     Fast, compressed, in-memory representation of the archive, with tooling to generate and query it.

 :ref:`swh.indexer `
     tools and workers used to crawl the content of the archive and extract derived information from any artifact stored in it

 :ref:`swh.journal `
     persistent logger of changes to the archive, with publish-subscribe support

 :ref:`swh.lister `
     collection of listers for all sorts of source code hosting and distribution places (forges, distributions, package managers, etc.)

 :ref:`swh.loader-core `
     low-level loading utilities and helpers used by all other loaders

 :ref:`swh.loader-git `
     loader for `Git `_ repositories

 :ref:`swh.loader-mercurial `
     loader for `Mercurial `_ repositories

 :ref:`swh.loader-svn `
     loader for `Subversion `_ repositories

 :ref:`swh.model `
     implementation of the :ref:`data-model` to archive source code artifacts

 :ref:`swh.objstorage `
     content-addressable object storage

 :ref:`swh.objstorage.replayer `
     Object storage replication tool

 :ref:`swh.scanner `
     source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage

 :ref:`swh.scheduler `
     task manager for asynchronous/delayed tasks, used for recurrent (e.g., listing a forge, loading new stuff from a Git repository) and one-off activities (e.g., loading a specific version of a source package)

 :ref:`swh.search `
     search engine for the archive

 :ref:`swh.storage `
     abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata

 :ref:`swh.vault `
     implementation of the vault service, allowing to retrieve parts of the archive as self-contained bundles (e.g., individual releases, entire repository snapshots, etc.)

 :ref:`swh.web `
     Web application(s) to browse the archive, for both interactive (HTML UI) and mechanized (REST API) use

 :ref:`swh.web.client `
     Python client for :ref:`swh.web `

 Dependencies
 ------------

 The dependency relationships among the various modules are depicted below.

 .. _py-deps-swh:

 .. figure:: images/py-deps-swh.svg
    :width: 1024px
    :align: center

    Dependencies among top-level Python modules (click to zoom).

 Archive
 -------

 * :ref:`Archive ChangeLog `: notable changes to the archive over time

 Indices and tables
 ==================

 * :ref:`genindex`
 * :ref:`modindex`
 * `URLs index `_
 * :ref:`search`
 * :ref:`glossary`

 .. ensure sphinx does not complain about index files not being included

 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:
    :hidden:

    architecture
    getting-started
    developer-setup
    API documentation
    swh.core
    swh.dataset
    swh.deposit
    swh.fuse
    swh.graph
    swh.indexer
    swh.journal
    swh.lister
    swh.loader
    swh.model
    swh.objstorage
    swh.scanner
    swh.scheduler
    swh.search
    swh.storage
    swh.vault
    swh.web
    swh.web.client
+   testing
    issue-debugging-monitoring
diff --git a/docs/testing.rst b/docs/testing.rst
new file mode 100644
index 0000000..d6ec83f
--- /dev/null
+++ b/docs/testing.rst
@@ -0,0 +1,122 @@
+
+Software testing guide
+======================
+
+Tools landscape
+---------------
+
+The testing framework we use is pytest_. It provides many facilities to write tests
+efficiently.
+
+It is complemented in some of our test suites by hypothesis_, a library for
+property-based testing. Its usage is a more advanced topic.
+
+We also use tox_, a test automation tool, to run the
+tests along with other quality checks in isolated environments.
+
+The main quality checking tools in use are:
+
+* mypy_, a static type checker. We gradually type-annotate all additions or refactorings
+  to the codebase;
+* flake8_, a simple code style checker (aka linter);
+* black_, an uncompromising code formatter.
+
+They are run automatically through ``tox`` or as ``pre-commit`` hooks in our Git repositories.
+
+The SWH testing framework
+-------------------------
+
+This section describes specifics of our usage of pytest and our custom helpers.
+
+The pytest fixture system makes it easy to write, share and plug in setup and teardown code.
+Fixtures defined in the project ``conftest`` or ``pytest_plugin`` modules are automatically
+available to any test function that takes the fixture name as an argument.
+
+| Several pytest plugins have been defined across SWH projects:
+| ``core``, ``core.db``, ``storage``, ``scheduler``, ``loader``, ``journal``.
+| Many others, provided by the community, are also in use:
+| ``flask``, ``django``, ``aiohttp``, ``postgresql``, ``mock``, ``requests-mock``, ``cov``, etc.
+
+We make use of various mocking helpers:
+
+* ``unittest.mock``: ``Mock`` classes, ``patch`` function;
+* ``mocker`` fixture from the ``mock`` plugin: adaptation of ``unittest.mock`` to the
+  fixture system, with a bonus ``spy`` function to record calls without modifying objects;
+* ``monkeypatch`` builtin fixture: modify object attributes or the environment, with
+  automatic teardown.
+
+Other notable helpers include:
+
+* ``datadir``: to compute the path to the current test's ``data`` directory.
+  Available in the ``core`` plugin.
+* ``requests_mock_datadir``: to load network responses from the datadir.
+  Available in the ``core`` plugin.
+* ``swh_rpc_client``: for testing SWH RPC clients and servers without incurring IO.
+  Available in the ``core`` plugin.
+* ``postgresql_fact``: for testing interactions with database backends.
+  Available in the ``core.db`` plugin, adapted for performance from the ``postgresql`` plugin.
+* ``click.testing.CliRunner``: to simplify testing of Click command-line interfaces.
+  It allows testing commands with some isolation from the execution environment.
+  https://click.palletsprojects.com/en/7.x/api/#click.testing.CliRunner
+
+Testing guidelines
+------------------
+
+General considerations
+^^^^^^^^^^^^^^^^^^^^^^
+
+We mostly write functional tests, and unit tests when more granularity is needed. By this,
+we mean that we test each functionality and invariant of a component, without systematically
+isolating it from its dependencies. The goal is to strike a balance between test
+effectiveness and test maintenance. However, the most critical parts, like the storage
+service, get more extensive unit testing.
+
+Organize tests
+^^^^^^^^^^^^^^
+
+* To test a component (module, class), start by identifying its set of
+  functionalities and invariants (or properties).
+* A single test may check multiple properties or commonly combined functionalities, if it
+  can be summed up by a short descriptive name.
+* Organize tests in multiple modules, one for each aspect or subcomponent tested,
+  e.g. initialization/configuration, db/backend, service API, utils, cli, etc.
+
+Test data
+^^^^^^^^^
+
+Each repository has its own ``tests`` directory; some, such as listers, even have one for
+each lister type.
+
+* Put any non-trivial test data, used for setup or mocking, in (potentially compressed)
+  files in a ``data`` directory under the corresponding ``tests`` directory.
+* Use the ``datadir`` fixture to locate them.
+
+Faking dependencies
+^^^^^^^^^^^^^^^^^^^
+
+* Use temporary directories for testing code that relies on filesystem paths.
+* Mock only operations that are expensive and already tested, typically IO with external services.
+* Use the ``monkeypatch`` fixture when updating the environment or when mocking is overkill.
+* Mock HTTP requests with ``requests_mock`` or ``requests_mock_datadir``.
+
+Final words
+^^^^^^^^^^^
+
+If code is difficult to test, its design may need reconsideration.
+
+Other SWH resources on software quality
+---------------------------------------
+
+| https://wiki.softwareheritage.org/wiki/Python_style_guide
+| https://wiki.softwareheritage.org/wiki/Git_style_guide
+| https://wiki.softwareheritage.org/wiki/Arcanist_setup
+| https://wiki.softwareheritage.org/wiki/Code_review
+| https://wiki.softwareheritage.org/wiki/Jenkins
+| https://wiki.softwareheritage.org/wiki/Testing_the_archive_features
+
+.. _pytest: https://pytest.org
+.. _tox: https://tox.readthedocs.io
+.. _hypothesis: https://hypothesis.readthedocs.io
+.. _mypy: https://mypy.readthedocs.io
+.. _flake8: https://flake8.pycqa.org
+.. _black: https://black.readthedocs.io
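+
+Example test module
+-------------------
+
+The module below is a minimal sketch, not taken from an SWH repository, of how some of the
+helpers above fit together: a fixture built on pytest's builtin ``tmp_path`` fixture (shared
+fixtures would normally live in a ``conftest`` module), the ``monkeypatch`` fixture, and
+Click's ``CliRunner``. The ``greet`` command, its ``--output`` option, the ``GREETING``
+environment variable and the ``output_file`` fixture are hypothetical.
+
+.. code-block:: python
+
+   import os
+
+   import click
+   import pytest
+   from click.testing import CliRunner
+
+
+   # Hypothetical command under test: prints the message from the GREETING
+   # environment variable (default "hello"), or writes it to --output.
+   @click.command()
+   @click.option("--output", type=click.Path(), default=None)
+   def greet(output):
+       message = os.environ.get("GREETING", "hello")
+       if output:
+           with open(output, "w") as f:
+               f.write(message)
+       else:
+           click.echo(message)
+
+
+   @pytest.fixture
+   def output_file(tmp_path):
+       # tmp_path is a per-test temporary directory provided by pytest
+       return tmp_path / "greeting.txt"
+
+
+   def test_greet_uses_environment(monkeypatch):
+       # monkeypatch undoes the environment change at teardown
+       monkeypatch.setenv("GREETING", "bonjour")
+       result = CliRunner().invoke(greet, [])
+       assert result.exit_code == 0
+       assert result.output == "bonjour\n"
+
+
+   def test_greet_writes_file(monkeypatch, output_file):
+       # Make sure a GREETING variable set outside the test does not leak in
+       monkeypatch.delenv("GREETING", raising=False)
+       result = CliRunner().invoke(greet, ["--output", str(output_file)])
+       assert result.exit_code == 0
+       assert output_file.read_text() == "hello"
+
+Going through ``monkeypatch`` rather than assigning to ``os.environ`` directly ensures the
+change is undone even when a test fails, so tests stay independent of each other.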