diff --git a/docs/developer-setup.rst b/docs/developer-setup.rst
new file mode 100644
index 0000000..2762779
--- /dev/null
+++ b/docs/developer-setup.rst
@@ -0,0 +1,148 @@
+.. _developer-setup:
+
+Developer setup
+===============
+
+In this guide, we will set up a dual environment:
+
+- A virtual env in which all the |swh| packages are installed in 'develop'
+ mode. This lets you navigate the source code, hack on it, and run the unit
+ tests locally.
+
+- A docker 'cluster' built with docker-compose, which makes it easy to run all
+ the components of the |swh| architecture. It is possible to run those docker
+ containers with your locally modified code for one or several |swh| packages.
+
+ Please read the `README file`_ in the swh-docker-dev repository for more
+ details on how to do this.
+
+.. _`README file`: https://forge.softwareheritage.org/source/swh-docker-dev/browse/master/README.md
+
+Check out the source code
+-------------------------
+
+Clone the |swh| environment repository::
+
+ ~$ git clone https://forge.softwareheritage.org/source/swh-environment.git
+ [...]
+ ~$ cd swh-environment
+ ~/swh-environment$
+
+Create a virtual env::
+
+ ~/swh-environment$ mkvirtualenv -p /usr/bin/python3 -a $PWD swh
+ [...]
+ (swh) ~/swh-environment$
+
+
+.. note:: Using virtualenvwrapper_ is not mandatory here. You can use plain
+ virtualenvs, or any other venv management tool (pipenv_ or poetry_
+ for example). Using a tool such as virtualenvwrapper_ just makes life
+ easier.
+
+
+.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/
+.. _poetry: https://poetry.eustace.io/
+.. _pipenv: https://pipenv.readthedocs.io/
+
+
+Install all the swh packages (in develop mode)::
+
+ (swh) ~/swh-environment$ pip install $(./bin/pip-swh-packages --with-testing) \
+ tox pifpaf
+ [...]
+
+
+Set up the docker environment
+-----------------------------
+
+Install docker-compose::
+
+ (swh) ~/swh-environment$ pip install docker-compose
+ [...]
+
+Make your life easier::
+
+ (swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <`_ Git (meta)
-repository orchestrates the Git repositories of all Software Heritage modules.
-Clone it::
-
- git clone https://forge.softwareheritage.org/source/swh-environment.git
-
-then recursively clone all Python module repositories. For this step you will
-need the `mr `_ tool. Once you have installed
-``mr``, just run::
-
- cd swh-environment
- bin/update
-
-.. IMPORTANT::
-
- From now on this tutorial will assume that you **run commands listed below
- from within the swh-environment** directory.
-
-For periodic repository updates just re-run ``bin/update``.
-
-
-Step 1 --- install system dependencies
---------------------------------------
-
-You need to install three types of dependencies: some base packages, Node.js
-modules (for the web app), and Postgres (as storage backend).
-
-Package dependencies
-~~~~~~~~~~~~~~~~~~~~
-
-Software Heritage requires some dependencies that are usually packaged by your
-package manager. On Debian/Ubuntu-based distributions::
-
- sudo apt-get install curl ca-certificates
- curl https://deb.nodesource.com/setup_8.x | sudo bash
- curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
- sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
- sudo apt update
- sudo apt install python3 python3-venv libsvn-dev postgresql-10 nodejs \
- libsystemd-dev libpython3-dev dia postgresql-autodoc \
- postgresql-server-dev-all
-
-Postgres
-~~~~~~~~
-
-You need a running Postgres instance with administrator access (e.g., to create
-databases). On Debian/Ubuntu based distributions, the previous step
-(installation) should be enough.
-
-For other platforms and more details refer to the `PostgreSQL installation
-documentation
-`_.
-
-You also need to have access to a superuser account on the database. For that,
-the easiest way is to create a PostgreSQL account that has the same name as
-your username::
-
- sudo -u postgres createuser --createdb --superuser $USER
-
-You can check that this worked by doing, from your user (you should not be
-asked for a password)::
-
- psql postgres
-
-Node.js modules
-~~~~~~~~~~~~~~~
-
-If you want to run the web app to browser your local archive you will need some
-Node.js modules, in particular to pack web resources into a single compact
-file. To that end the following should suffice::
-
- cd swh-web
- npm install
- cd -
-
-You are now good to go with all needed dependencies on your development
-machine!
-
-
-Step 2 --- install Python packages in a virtualenv
---------------------------------------------------
-
-From now on you will need to work in a `virtualenv
-`_ containing the Python
-environment with all the Software Heritage modules and their dependencies. To
-that end you can do (once)::
-
- python3 -m venv .venv
-
-Then, activate the virtualenv (do this every time you start working on Software
-Heritage)::
-
- source .venv/bin/activate
-
-You can now install Software Heritage Python modules, their dependencies and
-the testing-related dependencies using::
-
- pip install $( bin/pip-swh-packages --with-testing )
-
-
-Step 3 --- set up storage
--------------------------
-
-Then you will need a local storage service that will archive and serve source
-code artifacts via a REST API. The Software Heritage storage layer comes in two
-parts: a content-addressable :term:`object storage` on your file system (for file
-contents) and a Postgres database (for the graph structure of the archive). See
-the :ref:`data-model` for more information. The storage layer is configured via
-a YAML configuration file, located at
-``~/.config/swh/storage/storage.yml``. Create it with a content like:
-
-.. code-block:: yaml
-
- storage:
- cls: local
- args:
- db: "dbname=softwareheritage-dev"
- objstorage:
- cls: pathslicing
- args:
- root: /srv/softwareheritage/objects/
- slicing: 0:2/2:4
-
-Make sure that the :term:`object storage` root exists on the filesystem and is writable
-to your user, e.g.::
-
- sudo mkdir -p /srv/softwareheritage/objects
- sudo chown "${USER}:" /srv/softwareheritage/objects
-
-You are done with :term:`object storage` setup! Let's setup the database::
-
- swh-db-init storage -d softwareheritage-dev
-
-``softwareheritage-dev`` is the name of the DB that will be created, it should
-match the ``db`` line in ``storage.yml``
-
-To check that you can successfully connect to the DB (you should not be asked
-for a password)::
-
- psql softwareheritage-dev
-
-You can now run the storage server like this::
-
- python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml
-
-
-Step 4 --- ingest repositories
-------------------------------
-
-You are now ready to ingest your first repository into your local Software
-Heritage. For the sake of example, we will ingest a few Git repositories. The
-module in charge of ingesting Git repositories is the *Git loader*, Python
-module ``swh.loader.git``. Its configuration file is at
-``~/.config/swh/loader/git.yml``. Create it with a content like:
-
-.. code-block:: yaml
-
- storage:
- cls: remote
- args:
- url: http://localhost:5002
-It just informs the Git loader to use the storage server running on your
-machine. The ``url`` line should match the command line used to run the storage
-server.
+Using Docker
+++++++++++++
-You can now ingest Git repository on the command line using the command::
+The easiest way to run a Software Heritage instance is to use Docker and
+docker-compose. Please refer to the `docker-compose documentation
+`_ if you do not have a working docker setup.
- python3 -m swh.loader.git.loader --origin-url GIT_CLONE_URL
+Then::
-For instance, you can try ingesting the following repositories, in increasing
-size order (note that the last two might take a few hours to complete and will
-occupy several GB on both the Postgres DB and the object storage)::
+ git clone https://forge.softwareheritage.org/source/swh-docker-dev.git
+ cd swh-docker-dev
+ docker-compose up -d
- python3 -m swh.loader.git.loader --origin-url https://github.com/SoftwareHeritage/swh-storage.git
- python3 -m swh.loader.git.loader --origin-url https://github.com/hylang/hy.git
- python3 -m swh.loader.git.loader --origin-url https://github.com/ocaml/ocaml.git
+When all the containers are up and running, you have a running Software
+Heritage platform. You can now open:
- # WARNING: next repo is big
- python3 -m swh.loader.git.loader --origin-url https://github.com/torvalds/linux.git
+- http://localhost:5080/ to navigate your (empty for now) SWH archive,
+- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest),
+- http://localhost:5080/prometheus to explore the platform's metrics.
-Congratulations, you have just archived your first source code repositories!
+All the internal APIs are also exposed:
-To re-archive the same repositories later on you can rerun the same commands:
-only *new* objects added since the previous visit will be archived upon the
-next one.
+- http://localhost:5080/scheduler
+- http://localhost:5080/storage
+- http://localhost:5080/indexer-storage
+- http://localhost:5080/deposit
+- http://localhost:5080/objstorage
+At this point, the simplest way to start indexing software is to use the 'Save
+Code Now' feature of the archive web interface:
-Step 5 --- browse the archive
------------------------------
+ http://localhost:5080/browse/origin/save/
-You can now setup a local web app to browse what you have locally archived. The
-web app uses the configuration file ``~/.config/swh/web/web.yml``. Create it
-and fill it with something like:
+Enjoy filling your hard drives!
-.. code-block:: yaml
- storage:
- cls: remote
- args:
- url: http://localhost:5002
+Hacking the archive
++++++++++++++++++++
-Nothing new here, the configuration just references the local storage server,
-which have been used before for repository ingestion.
+If you want to hack the code of the Software Heritage Archive, a bit more work
+will be required.
-You can now run the web app, and browse your local archive::
+The best way to have a development-friendly environment is to build a mixed
+docker/virtual env setup.
- make run-django-webpack-devserver
- xdg-open http://localhost:5004
+Such a setup is described in the :ref:`Perfect Developer Setup guide
+`.
-Note that the ``make`` target will first compile a `webpack
-`_ with various web assets and then launch the web app;
-for webpack compilation you will need the Node.js dependencies discussed above.
-As an initial tour of the web app, try searching for one of the repositories
-you have ingested (e.g., entering the ``hylang`` or ``ocaml`` keywords in the
-search bar). Clicking on the repository name you will be brought back in time,
-and you will be able to browse the source code and development history you have
-archived.
+Installing from sources (without a virtualenv)
+++++++++++++++++++++++++++++++++++++++++++++++
-Enjoy!
+If you prefer to run everything straight, you should refer to the :ref:`Manual
+Setup Guide `
diff --git a/docs/index.rst b/docs/index.rst
index 6095867..98c4829 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,138 +1,142 @@
.. _swh-docs:
Software Heritage - Development Documentation
=============================================
.. toctree::
:maxdepth: 2
:caption: Contents:
Getting started
---------------
-* :ref:`getting-started` ← start here to hack on the Software Heritage software
+* :ref:`getting-started` ← start here to get your own Software Heritage
+ platform running in less than 5 minutes, or
+* :ref:`developer-setup` ← here to hack on the Software Heritage software
stack
Architecture
------------
* :ref:`architecture` ← go there to have a glimpse on the Software Heritage software
architecture
Components
----------
Here is brief overview of the most relevant software components in the Software
Heritage stack. Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.archiver `
orchestrator in charge of guaranteeing that object storage content is
pristine and available in a sufficient amount of copies
:ref:`swh.core `
low-level utilities and helpers used by almost all other modules in the
stack
:ref:`swh.deposit `
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.indexer `
tools and workers used to crawl the content of the archive and extract
derived information from any artifact stored in it
:ref:`swh.journal `
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister `
collection of listers for all sorts of source code hosting and distribution
places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core `
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-debian `
loader for `Debian `_ source packages
:ref:`swh.loader-dir `
loader for source directories (e.g., expanded tarballs)
:ref:`swh.loader-git `
loader for `Git `_ repositories
:ref:`swh.loader-mercurial `
loader for `Mercurial `_ repositories
:ref:`swh.loader-pypi `
loader for `PyPI `_ source code releases
:ref:`swh.loader-svn `
loader for `Subversion `_ repositories
:ref:`swh.loader-tar `
loader for source tarballs (including Tar, ZIP and other archive formats)
:ref:`swh.model `
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage `
content-addressable object storage
:ref:`swh.scheduler `
task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
listing a forge, loading new stuff from a Git repository) and one-off
activities (e.g., loading a specific version of a source package)
:ref:`swh.storage `
abstraction layer over the archive, allowing to access all stored source
code artifacts as well as their metadata
:ref:`swh.vault `
implementation of the vault service, allowing to retrieve parts of the
archive as self-contained bundles (e.g., individual releases, entire
repository snapshots, etc.)
:ref:`swh.web `
Web application(s) to browse the archive, for both interactive (HTML UI)
and mechanized (REST API) use
Dependencies
------------
The dependency relationships among the various modules are depicted below.
.. _py-deps-swh:
.. figure:: images/py-deps-swh.svg
:width: 1024px
:align: center
Dependencies among top-level Python modules (click to zoom).
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* `URLs index `_
* :ref:`search`
* :ref:`glossary`
.. ensure sphinx does not complain about index files not being included
.. toctree::
:hidden:
:glob:
architecture
getting-started
+ developer-setup
+ manual-setup
apidoc/modules
swh-*/index
diff --git a/docs/getting-started.rst b/docs/manual-setup.rst
similarity index 95%
copy from docs/getting-started.rst
copy to docs/manual-setup.rst
index e8261c1..ba0b3cd 100644
--- a/docs/getting-started.rst
+++ b/docs/manual-setup.rst
@@ -1,238 +1,227 @@
-.. _getting-started:
-
-Run your own Software Heritage
-==============================
-
-This tutorial will guide from the basic step of obtaining the source code of
-the Software Heritage stack to running a local copy of it with which you can
-archive source code and browse it on the web. To that end, just follow the
-steps detailed below.
-
-.. highlight:: bash
-
+.. _manual-setup:
Step 0 --- get the code
-----------------------
The `swh-environment
`_ Git (meta)
repository orchestrates the Git repositories of all Software Heritage modules.
Clone it::
git clone https://forge.softwareheritage.org/source/swh-environment.git
then recursively clone all Python module repositories. For this step you will
need the `mr `_ tool. Once you have installed
``mr``, just run::
cd swh-environment
bin/update
.. IMPORTANT::
From now on this tutorial will assume that you **run commands listed below
from within the swh-environment** directory.
For periodic repository updates just re-run ``bin/update``.
Step 1 --- install system dependencies
--------------------------------------
You need to install three types of dependencies: some base packages, Node.js
modules (for the web app), and Postgres (as storage backend).
Package dependencies
~~~~~~~~~~~~~~~~~~~~
Software Heritage requires some dependencies that are usually packaged by your
package manager. On Debian/Ubuntu-based distributions::
sudo apt-get install curl ca-certificates
curl https://deb.nodesource.com/setup_8.x | sudo bash
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
sudo apt install python3 python3-venv libsvn-dev postgresql-10 nodejs \
libsystemd-dev libpython3-dev dia postgresql-autodoc \
postgresql-server-dev-all
Postgres
~~~~~~~~
You need a running Postgres instance with administrator access (e.g., to create
databases). On Debian/Ubuntu based distributions, the previous step
(installation) should be enough.
For other platforms and more details refer to the `PostgreSQL installation
documentation
`_.
You also need to have access to a superuser account on the database. For that,
the easiest way is to create a PostgreSQL account that has the same name as
your username::
sudo -u postgres createuser --createdb --superuser $USER
You can check that this worked by doing, from your user (you should not be
asked for a password)::
psql postgres
Node.js modules
~~~~~~~~~~~~~~~
If you want to run the web app to browser your local archive you will need some
Node.js modules, in particular to pack web resources into a single compact
file. To that end the following should suffice::
cd swh-web
npm install
cd -
You are now good to go with all needed dependencies on your development
machine!
Step 2 --- install Python packages in a virtualenv
--------------------------------------------------
From now on you will need to work in a `virtualenv
`_ containing the Python
environment with all the Software Heritage modules and their dependencies. To
that end you can do (once)::
python3 -m venv .venv
Then, activate the virtualenv (do this every time you start working on Software
Heritage)::
source .venv/bin/activate
You can now install Software Heritage Python modules, their dependencies and
the testing-related dependencies using::
pip install $( bin/pip-swh-packages --with-testing )
Step 3 --- set up storage
-------------------------
Then you will need a local storage service that will archive and serve source
code artifacts via a REST API. The Software Heritage storage layer comes in two
parts: a content-addressable :term:`object storage` on your file system (for file
contents) and a Postgres database (for the graph structure of the archive). See
the :ref:`data-model` for more information. The storage layer is configured via
a YAML configuration file, located at
``~/.config/swh/storage/storage.yml``. Create it with a content like:
.. code-block:: yaml
storage:
cls: local
args:
db: "dbname=softwareheritage-dev"
objstorage:
cls: pathslicing
args:
root: /srv/softwareheritage/objects/
slicing: 0:2/2:4
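As an illustration of the ``slicing: 0:2/2:4`` setting above, the sketch below shows how a path-slicing layout could map an object's hexadecimal identifier to a file path: each ``start:end`` pair selects a substring of the identifier that names one directory level. The function name ``sliced_path`` and the sample identifier are hypothetical, and naming the file after the full identifier is an assumption. This is a sketch of the idea, not the actual ``swh.objstorage`` code.

```python
import posixpath


def sliced_path(root, hex_id, slicing):
    """Build the on-disk path for hex_id under a path-slicing layout (sketch)."""
    parts = []
    for bounds in slicing.split("/"):
        # Each "start:end" pair selects one directory name from the identifier.
        start, end = (int(n) for n in bounds.split(":"))
        parts.append(hex_id[start:end])
    # Assumption: the file itself is named after the full identifier.
    return posixpath.join(root, *parts, hex_id)


# Hypothetical identifier, used only to show the resulting layout.
example = sliced_path(
    "/srv/softwareheritage/objects/",
    "34973274ccef6ab4dfaaf86599792fa9c3fe4689",
    "0:2/2:4",
)
print(example)
```

With two slices, objects are spread over two levels of sub-directories, which keeps any single directory from growing too large.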
Make sure that the :term:`object storage` root exists on the filesystem and is writable
to your user, e.g.::
sudo mkdir -p /srv/softwareheritage/objects
sudo chown "${USER}:" /srv/softwareheritage/objects
You are done with :term:`object storage` setup! Let's setup the database::
swh-db-init storage -d softwareheritage-dev
``softwareheritage-dev`` is the name of the DB that will be created, it should
match the ``db`` line in ``storage.yml``
To check that you can successfully connect to the DB (you should not be asked
for a password)::
psql softwareheritage-dev
You can now run the storage server like this::
python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml
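Before moving on, it can help to confirm that something is listening on the port given above. The following is a minimal sketch using only the Python standard library; the helper name ``port_is_open`` is hypothetical, and the check only tells you that the TCP port accepts connections, not that the storage API itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    # 5002 is the port passed to the storage server command above.
    print("storage server reachable:", port_is_open("localhost", 5002))
```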
Step 4 --- ingest repositories
------------------------------
You are now ready to ingest your first repository into your local Software
Heritage. For the sake of example, we will ingest a few Git repositories. The
module in charge of ingesting Git repositories is the *Git loader*, Python
module ``swh.loader.git``. Its configuration file is at
``~/.config/swh/loader/git.yml``. Create it with a content like:
.. code-block:: yaml
storage:
cls: remote
args:
url: http://localhost:5002
It just informs the Git loader to use the storage server running on your
machine. The ``url`` line should match the command line used to run the storage
server.
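To make the "should match" requirement concrete, here is a small sketch that compares a configured ``url`` with the ``--host`` and ``--port`` values given to the storage server. The helper name ``url_matches_server`` is hypothetical and not part of any Software Heritage module.

```python
from urllib.parse import urlsplit


def url_matches_server(url, host, port):
    """Check that a configured storage URL points at the given host and port."""
    parts = urlsplit(url)
    return parts.hostname == host and parts.port == port


# Values taken from the configuration and the server command line above.
print(url_matches_server("http://localhost:5002", "localhost", 5002))
```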
You can now ingest Git repository on the command line using the command::
python3 -m swh.loader.git.loader --origin-url GIT_CLONE_URL
For instance, you can try ingesting the following repositories, in increasing
size order (note that the last two might take a few hours to complete and will
occupy several GB on both the Postgres DB and the object storage)::
python3 -m swh.loader.git.loader --origin-url https://github.com/SoftwareHeritage/swh-storage.git
python3 -m swh.loader.git.loader --origin-url https://github.com/hylang/hy.git
python3 -m swh.loader.git.loader --origin-url https://github.com/ocaml/ocaml.git
# WARNING: next repo is big
python3 -m swh.loader.git.loader --origin-url https://github.com/torvalds/linux.git
Congratulations, you have just archived your first source code repositories!
To re-archive the same repositories later on you can rerun the same commands:
only *new* objects added since the previous visit will be archived upon the
next one.
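The reason a second visit only adds new objects follows from content addressing: an object is stored under a key derived from its content, so anything already present is recognised and skipped. The toy model below illustrates that idea only. ``ToyArchive`` is a hypothetical in-memory stand-in and does not reflect how the Git loader or the storage layer is implemented.

```python
import hashlib


class ToyArchive:
    """In-memory stand-in for a content-addressable store (illustration only)."""

    def __init__(self):
        self.objects = {}

    def visit(self, blobs):
        """Store every blob not seen before; return how many were added."""
        added = 0
        for blob in blobs:
            key = hashlib.sha1(blob).hexdigest()
            if key not in self.objects:
                self.objects[key] = blob
                added += 1
        return added


archive = ToyArchive()
first = archive.visit([b"README", b"main.py"])
# The second visit sees the same two blobs plus one new one.
second = archive.visit([b"README", b"main.py", b"tests.py"])
print(first, second)
```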
Step 5 --- browse the archive
-----------------------------
You can now setup a local web app to browse what you have locally archived. The
web app uses the configuration file ``~/.config/swh/web/web.yml``. Create it
and fill it with something like:
.. code-block:: yaml
storage:
cls: remote
args:
url: http://localhost:5002
Nothing new here, the configuration just references the local storage server,
which have been used before for repository ingestion.
You can now run the web app, and browse your local archive::
make run-django-webpack-devserver
xdg-open http://localhost:5004
Note that the ``make`` target will first compile a `webpack
`_ with various web assets and then launch the web app;
for webpack compilation you will need the Node.js dependencies discussed above.
As an initial tour of the web app, try searching for one of the repositories
you have ingested (e.g., entering the ``hylang`` or ``ocaml`` keywords in the
search bar). Clicking on the repository name you will be brought back in time,
and you will be able to browse the source code and development history you have
archived.
Enjoy!