diff --git a/docs/contributing/code-review.rst b/docs/contributing/code-review.rst new file mode 100644 index 0000000..c428d62 --- /dev/null +++ b/docs/contributing/code-review.rst @@ -0,0 +1,51 @@ +.. _code-review: + +Code Review +=========== + +This page documents code review practices used for [[Software Heritage]] development. + +Guidelines +---------- + +Please adhere to the following guidelines to perform and obtain code reviews +(CRs) in the context of Software Heritage development: + +1. **CRs are strongly recommended** for any non-trivial code change, + but not mandatory (nor enforced at the VCS level). +2. The CR `workflow `_ is implemented using + Phabricator/Differential. +3. Explicitly **suggest reviewer(s)** when submitting new CR requests: + either the most knowledgeable person(s) for the target code or the general + `reviewers `_ + (which is the `default `_). +4. **Review anything you want**: no matter the suggested reviewer(s), + feel free to review any outstanding CR. +5. **One LGTM is enough**: feel free to approve any outstanding CR. +6. **Review every day**: CRs should be timely as fellow developers + will wait for them. + To make CRs sustainable each developer should strive to dedicate + a fixed minimum amount of CR time every (work) day. + +For more detailed suggestions (and much more) on the motivational +and practical aspects of code reviews see Good reads below. + +Good reads +---------- + +Good reads on various angles of code review: + +* `Best practices `_ (Palantir) ← comprehensive and recommended read, especially if you're short on time +* `Best practices `_ (Thoughtbot) +* `Best practices `_ (Smart Bear) +* `Review checklist `_ (Code Project) +* `Motivation: code quality `_ (Coding Horror) +* `Motivation: team culture `_ (Google & FullStory) +* `Motivation: humanizing peer reviews `_ (Wiegers) +* `Motivation: sharing knowledge `_ (Atlassian) + +See also +-------- + +* :ref:`phabricator-arcanist`_ +* :ref:`coding-guidelines`_ diff --git a/docs/contributing/index.rst b/docs/contributing/index.rst new file mode 100644 index 0000000..e0475be --- /dev/null +++ b/docs/contributing/index.rst @@ -0,0 +1,13 @@ +Contributing +============ + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :titlesonly: + :hidden: + + phabricator + code-review + python-style-guide + sphinx diff --git a/docs/contributing/phabricator.rst b/docs/contributing/phabricator.rst new file mode 100644 index 0000000..313bf0b --- /dev/null +++ b/docs/contributing/phabricator.rst @@ -0,0 +1,289 @@ +.. highlight:: bash + +.. _patch-submission: + +Submitting patches +================== + +`Phabricator`_ is the tool that Software Heritage uses as its +coding/collaboration forge. + +Software Heritage's Phabricator instance can be found at +https://forge.softwareheritage.org/ + +.. _Phabricator: http://phabricator.org/ + +Code Review in Phabricator +-------------------------- + +We use the Differential application of Phabricator to perform +:ref:`code reviews ` in the context of Software Heritage. + +* we use Git and ``history.immutable=true`` + (but beware as that is partly a Phabricator misnomer, read on) +* when code reviews are required, developers will be allowed to push + directly to master once an accepted Differential diff exists + +Configuration ++++++++++++++ + +Arcanist configuration +^^^^^^^^^^^^^^^^^^^^^^ + +Authentication +~~~~~~~~~~~~~~ + +First, you should install Arcanist and authenticate it to Phabricator:: + + sudo apt-get install arcanist + arc set-config default https://forge.softwareheritage.org/ + arc install-certificate + +arc will prompt you to login into Phabricator via web +(which will ask your personal Phabricator credentials). +You will then have to copy paste the API token from the web page to arc, +and hit Enter to complete the certificate installation. + +Immutability +~~~~~~~~~~~~ + +When using git, Arcanist by default mess with the local history, +rewriting commits at the time of first submission. +To avoid that we use so called `history immutability`_ + +.. _history immutability: https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/#history-mutability-git + +To that end, you shall configure your ``arc`` accordingly:: + + arc set-config history.immutable true + +Note that this does **not** mean that you are forbidden to rewrite +your local branches (e.g., with ``git rebase``). +Quite the contrary: you are encouraged to locally rewrite branches +before pushing to ensure that commits are logically separated +and your commit history easy to bisect. +The above setting just means that *arc* will not rewrite commit +history under your nose. + +Enabling ``git push`` to our forge +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The way we've configured our review setup for continuous integration +needs you to configure git to allow pushes to our forge. +There's two ways you can do this : setting a ssh key to push over ssh, +or setting a specific password for git pushes over https. + +SSH key for pushes +~~~~~~~~~~~~~~~~~~ + +In your forge User settings page (On the top right, click on your avatar, +then click *Settings*), you have access to a *Authentication* > +*SSH Public Keys* section (Direct link: +``hxxps://forge.softwareheritage.org/settings/user//page/ssh/``). +You then have the option to upload a SSH public key, +which will authenticate your pushes. + +You then need to configure ssh/git to use that key pair, +for instance by editing the ``~/.ssh/config`` file. + +Finally, you should configure git to push over ssh when pushing to +https://forge.softwareheritage.org, by running the following command:: + + git config --global url.git@forge.softwareheritage.org:.pushInsteadOf https://forge.softwareheritage.org + +This lets git know that it should use ``git@forge.softwareheritage.org:`` +as a base url when pushing repositories cloned from +forge.softwareheritage.org over https. + +VCS password for pushes +~~~~~~~~~~~~~~~~~~~~~~~ + +If you're not comfortable setting up SSH to upload your changes, +you have the option of setting a VCS password. +This password, *separate from your account password*, +allows Phabricator to authenticate your uploads over HTTPS. + +In your forge User settings page (On the top right, click on your avatar, +then click *Settings*), you need to use the *Authentication* > *VCS Password* +section to set your VCS password (Direct link: +``hxxps://forge.softwareheritage.org/settings/user//page/vcspassword/``). + +If you still get a 403 error on push, this means you need +a forge administrator to enable HTTPS pushes for the repository +(which wasn't done by default in historical repositories). +Please drop by on IRC and let us know! + +Workflow +++++++++ + +* work in a feature branch: ``git checkout -b my-feat`` +* initial review request: hack/commit/hack/commit ; + ``arc diff origin/master`` +* react to change requests: hack/commit/hack/commit ; + ``arc diff --update Dxx origin/master`` +* landing change: ``git checkout master ; git merge my-feat ; git push`` + +Starting a new feature and submit it for review +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Use a **one branch per feature** workflow, with well-separated +**logical commits** (:ref:`following those conventions `). +Please open one diff per logical commit to keep the diff size to a minimum. + +.. code-block:: + + git checkout -b my-shiny-feature + ... hack hack hack ... + git commit -m 'architecture skeleton for my-shiny-feature' + ... hack hack hack ... + git commit -m 'my-shiny-feature: implement module foo' + ... etc ... + +Please, follow the +To **submit your code for review** the first time:: + + arc diff origin/master + +arc will prompt for a **code review message**. Provide the following information: + +* first line: *short description* of the overall work + (i.e., the feature you're working on). + This will become the title of the review +* *Summary* field (optional): *long description* of the overall work; + the field can continue in subsequent lines, up to the next field. + This will become the "Summary" section of the review +* *Test Plan* field (optional): write here if something special is needed + to test your change +* *Reviewers* field (optional): the (Phabricator) name(s) of + desired reviewers. + If you don't specify one (recommended) the default reviewers will be chosen +* *Subscribers* field (optional): the (Phabricator) name(s) of people that + will be notified about changes to this review request. + In most cases it should be left empty + +For example:: + + mercurial loader + + Summary: first stab at a mercurial loader (T329) + + The implementation follows the plan detailed in F2F discussion with @foo. + + Performances seem decent enough for a first trial (XXX seconds for YYY repository + that contains ZZZ patches). + + Test plan: + + Reviewers: + + Subscribers: foo + +After completing the message arc will submit the review request +and tell you its number and URL:: + + [...] + Created a new Differential revision: + Revision URI: https://forge.softwareheritage.org/Dxx + +.. _arc-update: + +Updating your branch to reflect requested changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Your feature might get accepted as is, YAY! +Or, reviewers might request changes; no big deal! + +Use the Differential web UI to follow-up to received comments, if needed. + +To implement requested changes in the code, hack on your branch as usual by: + +* adding new commits, and/or +* rewriting old commits with git rebase (to preserve a nice, easy to bisect history) +* pulling on master and rebasing your branch against it if meanwhile someone + landed commits on master: + +.. code-block:: + + git checkout master + git pull + git checkout my-shiny-feature + git rebase master + + +When you're ready to **update your review request**:: + + arc diff --update Dxx HEAD~ + +Arc will prompt you for a message: describe what you've changed +w.r.t. the previous review request, free form. +Your message will become the changelog entry in Differential +for this new version of the diff. + +Differential only care about the code diff, and not about the commits +or their order. +Therefore each "update" can be a completely different series of commits, +possibly rewritten from the previous submission. + +Dependencies between diffs +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Note that you can manage diff dependencies within the same module +with the following keyword in the diff description:: + + Depends on Dxx + +That allows to keep a logical view in your diff. +It's not strictly necessary (because the tooling now deals with it properly) +but it might help reviewers or yourself to do so. + +Landing your change onto master +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Once your change has been approved in Differential, +you will be able to land it onto the master branch. + +Before doing so, you're encouraged to **clean up your git commit history**, +reordering/splitting/merging commits as needed to have separate +logical commits and an easy to bisect history. +Update the diff :ref:`following the prior section ` +(It'd be good to let the ci build finish to make sure everything is still green). + +Once you're happy you can **push to origin/master** directly, e.g.:: + + git checkout master + git merge --ff-only my-shiny-feature + git push + +``--ff-only`` is optional, and makes sure you don't unintentionally +create a merge commit. + +Optionally you can then delete your local feature branch:: + + git branch -d my-shiny-feature + +Reviewing locally / landing someone else's changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can do local reviews of code with arc patch:: + + arc patch Dxyz + +This will create a branch **arcpatch-Dxyz** containing the changes +on your local checkout. + +You can then merge those changes upstream with:: + + git checkout master + git merge --ff arcpatch-Dxyz + git push origin master + +or, alternatively:: + + arc land --squash + + +See also +-------- + +* :ref:`code-review` for guidelines on how code is reviewed + when developing for Software Heritage diff --git a/docs/contributing/python-style-guide.rst b/docs/contributing/python-style-guide.rst new file mode 100644 index 0000000..7c3180d --- /dev/null +++ b/docs/contributing/python-style-guide.rst @@ -0,0 +1,67 @@ +.. _python-style-guide: + +Python style guide +================== + +Coding style and best practices for writing Python code for [[Software Heritage]]. + +General rules +------------- + +* As a general rule, follow the + `Google Python Style Guide `_. +* Target **Python 3**. Do not care about backward compatibility with Python 2. + +Black ++++++ + +As of April 2020, we use `Black `_ +as automated code formatter for all Software Heritage Python code. +CI, tox, and other linting tools are configured to fail +if code is not formatted as black would. + +Note that, as part of this, *maximum line length is 88 characters*, +rather than the default of 79. + +Specific rules +-------------- + +As supplement/overrides to the above general rules, +follow the additional recommendations below. + +Lint +++++ + +* Make sure your code is `flake8 `_ + and `Black `_ clean. + +Tests ++++++ + +* use ``pytest`` as test runner +* put ``tests/`` dir down deep in the module hierarchy, near to the code being tested +* naming conventions: +** ``tests/test_mymodule.py`` +** ``class TestMyEntity(unittest.TestCase)`` +** ``def behavior(self):`` +*** do *not* prepend ``test_`` to all test methods; + use nose's ``@istest`` decorator instead + +Classes ++++++++ + +* Since we target Python 3, there is no need to + `inherit from 'object' explicitly `_ + +Docstrings +--------- + +* docstrings should produce a nice API documentation when run through + `Sphinx `_ +* docstrings should use the + `Napoleon style `_ + (Google variant) to document arguments, return values, etc. +** see also: a `comprehensive style example `_ +** see also: :ref:`sphinx-gotchas` diff --git a/docs/contributing/sphinx.rst b/docs/contributing/sphinx.rst new file mode 100644 index 0000000..e013f59 --- /dev/null +++ b/docs/contributing/sphinx.rst @@ -0,0 +1,237 @@ +.. _sphinx-gotchas: + +Sphinx gotchas +============== + +Here is a list of common gotchas when formatting Python docstrings for [http://www.sphinx-doc.org/en/stable/ Sphinx] and the [http://www.sphinx-doc.org/en/stable/ext/napoleon.html Napoleon] style. + +Sphinx +------ + +Lists ++++++ + +All sorts of `lists `_ +require an empty line before the first bullet and after the last one, +to be properly interpreted as list. +No indentation is required for list elements w.r.t. surrounding text, +and line continuations should be indented like the first character +after the bullet + +Bad:: + + this is a bad example that will not be interpreted as a list + preceding text + - foo + - bar + - baz + following text + +Good:: + + this is some text preceding the list + + - foo + - bar + - baz + - this is a rather long-ish paragraph inserted in the list + with line continuation + - qux + + this is some text following the list + +Bad:: + + - foo + - nested lists also requires empty lines, but they are missing here + - inner list 1 + - inner list 2 + - outer list continues here + +Good:: + + surrounding text + + - foo + - nested lists also requires empty lines + + - inner list 1 + - inner list 2 + + - outer list continues here + + surrounding text + +Verbatim source code +++++++++++++++++++++ + +Verbatim `code blocks `_, +e.g., for code examples, requires double colon at the end of a line, +then an empty line, and then the code block itself, indented: + +Bad:: + + This does not work as there is a single column and no empty line before code: + def foo(bar, baz): + qux = bar + baz + + return qux + +Good:: +
+   a nice example of python code follows::
+
+       def foo(bar, baz):
+           qux = bar + baz
+
+           return qux
+
+   here we can restart text flow
+
+*Inline code samples* use double backquotes, and not single ones.
+
+Bad::
+
+   you have to instantiate the method `def foo(bar): pass`
+   in order to use this abstract class
+
+Good:
+
+   you have to instantiate the method ``def foo(bar): pass``
+   in order to use this abstract class
+
+=== ``**kwargs``, ``**args`` ===
+
+`Asterisks needs to be escaped `_
+to avoid capture by emphasis markup.
+In case of multiple adjacent asterisks, escaping the first one is enough.
+
+Bad::
+
+   additional **kwargs are copied in the returned dictionary
+
+Good:
+
+   additional \**kwargs are copied in the returned dictionary
+
+Code cross-references
++++++++++++++++++++++
+
+Backquotes are not enough to cross-reference a Python entity
+(class, function, module, etc.); you need to use
+`Sphinx domains `_ for that,
+and in particular the `Python domain `_
+
+Bad::
+
+   see the `do_something` function and the `swh.useless` module
+   for more information
+
+Good::
+
+   see the :func:`do_something` function and the :mod:`swh.useless` module
+   for more information
+
+Good::
+
+   you can avoid a long, fully-qualified anchor setting an
+   :func:`explicit label ` for a link
+
+See also: the `list of Python roles `_
+that you can use to cross-reference Python objects.
+Note that you can (and should) omit the :py: prefix,
+as Python is the default domain.
+
+Note also that when building Sphinx documentation
+for individual Software Heritage modules in isolation,
+cross-references to other modules will *not* be resolvable.
+But they will be resolvable when building the unified documentation
+from ``swh-docs``
+
+Napoleon
+--------
+
+Docstring sections
+++++++++++++++++++
+
+See the `list of docstring sections `_
+supported by Napoleon.
+Everything else will *not* be typeset with a dedicated heading,
+you will have to do so explicitly using reStructuredText markup.
+
+Args
+++++
+
+Entries in Args section do *not* start with bullets, but just with argument names (as any other Napoleon section).
+Continuation lines should be indented.
+
+Bad::
+
+   Args:
+       - foo (int): first argument
+       - bar: second argument
+       - baz (bool): third argument
+
+Good::
+
+   Args:
+       foo (int): first argument
+       bar: second argument, which happen to have a fairly
+           long description of what it does
+       baz (bool): third argument
+
+Returns
++++++++
+
+In Returns section you need to use ":" carefully as, if present, it will be interpreted as a separator between return type and description. Also, the description of return value should not start on the same line of "Returns:", but on the subsequent one, indented.
+
+Bad::
+
+   Returns:
+       this does not work (colon will be interpreted as type/desc separator), a dict with keys:
+
+       - foo
+       - bar
+
+Good::
+
+   Returns:
+       this works (there is no colon) a dict with keys
+
+       - foo
+       - bar
+
+Good::
+
+   Returns:
+       dict: this works again (*first* colon identifies the type) a dict with keys:
+
+       - foo
+       - bar
+
+Bad::
+
+   Returns: this is not good either, you need to start a paragraph
+
+Raises
+++++++
+
+You need a ":" separator between exception names and their description.
+
+Bad::
+
+   Raises:
+       ValueError if you botched it
+       RuntimeError if we botched it
+
+
+Good::
+
+   Raises:
+       ValueError: if you botched it
+       RuntimeError: if we botched it
+
+See also
+--------
+
+* :ref:`python-style-guide`
diff --git a/docs/index.rst b/docs/index.rst
index a11a245..1deaf0f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,184 +1,193 @@
 .. _swh-docs:
 
 
 Software Heritage - Development Documentation
 =============================================
 
 
 Getting started
 ---------------
 
 * :ref:`getting-started` → deploy a local copy of the Software Heritage
   software stack in less than 5 minutes, or
 * :ref:`developer-setup` → get a working development setup that allows to hack
   on the Software Heritage software stack
 
+Contributing
+------------
+
+* :ref:`patch-submission` → learn how to submit your patches to the
+  Software Heritage codebase
+* :ref:`code-review` → rules and guidelines to review code in
+  Software Heritage
+* :ref:`python-style-guide` → how to format the Python code you write
 
 Architecture
 ------------
 
 * :ref:`architecture` → get a glimpse of the Software Heritage software
   architecture
 * :ref:`mirror` → learn what a Software Heritage mirror is and how to set up
   one
 
 
 Data Model and Specifications
 -----------------------------
 
 * :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
 * :ref:`data-model` Documentation of the main |swh| archive data model.
 * :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
 
 Components
 ----------
 
 Here is brief overview of the most relevant software components in the Software
 Heritage stack. Each component name is linked to the development documentation
 of the corresponding Python module.
 
 :ref:`swh.core `
     low-level utilities and helpers used by almost all other modules in the
     stack
 
 :ref:`swh.dataset `
     public datasets and periodic data dumps of the archive released by Software
     Heritage
 
 :ref:`swh.deposit `
     push-based deposit of software artifacts to the archive
 
 swh.docs
     developer documentation (used to generate this doc you are reading)
 
 :ref:`swh.fuse `
     Virtual file system to browse the Software Heritage archive, based on
     `FUSE `_
 
 :ref:`swh.graph `
     Fast, compressed, in-memory representation of the archive, with tooling to
     generate and query it.
 
 :ref:`swh.indexer `
     tools and workers used to crawl the content of the archive and extract
     derived information from any artifact stored in it
 
 :ref:`swh.journal `
     persistent logger of changes to the archive, with publish-subscribe support
 
 :ref:`swh.lister `
     collection of listers for all sorts of source code hosting and distribution
     places (forges, distributions, package managers, etc.)
 
 :ref:`swh.loader-core `
     low-level loading utilities and helpers used by all other loaders
 
 :ref:`swh.loader-git `
     loader for `Git `_ repositories
 
 :ref:`swh.loader-mercurial `
     loader for `Mercurial `_ repositories
 
 :ref:`swh.loader-svn `
     loader for `Subversion `_ repositories
 
 :ref:`swh.model `
     implementation of the :ref:`data-model` to archive source code artifacts
 
 :ref:`swh.objstorage `
     content-addressable object storage
 
 :ref:`swh.objstorage.replayer `
     Object storage replication tool
 
 :ref:`swh.scanner `
     source code scanner to analyze code bases and compare them with source code
     artifacts archived by Software Heritage
 
 :ref:`swh.scheduler `
     task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
     listing a forge, loading new stuff from a Git repository) and one-off
     activities (e.g., loading a specific version of a source package)
 
 :ref:`swh.search `
     search engine for the archive
 
 :ref:`swh.storage `
     abstraction layer over the archive, allowing to access all stored source
     code artifacts as well as their metadata
 
 :ref:`swh.vault `
     implementation of the vault service, allowing to retrieve parts of the
     archive as self-contained bundles (e.g., individual releases, entire
     repository snapshots, etc.)
 
 :ref:`swh.web `
     Web application(s) to browse the archive, for both interactive (HTML UI)
     and mechanized (REST API) use
 
 :ref:`swh.web.client `
     Python client for :ref:`swh.web `
 
 
 Dependencies
 ------------
 
 The dependency relationships among the various modules are depicted below.
 
 .. _py-deps-swh:
 .. figure:: images/py-deps-swh.svg
    :width: 1024px
    :align: center
 
    Dependencies among top-level Python modules (click to zoom).
 
 
 Archive
 -------
 
 * :ref:`Archive ChangeLog `: notable changes to the archive
   over time
 
 
 Indices and tables
 ==================
 
 * :ref:`genindex`
 * :ref:`modindex`
 * `URLs index `_
 * :ref:`search`
 * :ref:`glossary`
 
 
 .. ensure sphinx does not complain about index files not being included
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:
    :hidden:
 
    architecture
    getting-started
    developer-setup
+   contributing/index
    API documentation 
    swh.core 
    swh.dataset 
    swh.deposit 
    swh.fuse 
    swh.graph 
    swh.indexer 
    swh.journal 
    swh.lister 
    swh.loader 
    swh.model 
    swh.objstorage 
    swh.scanner 
    swh.scheduler 
    swh.search 
    swh.storage 
    swh.vault 
    swh.web 
    swh.web.client 
    testing
    issue-debugging-monitoring