diff --git a/docs/getting-started.rst b/docs/getting-started.rst
index 5d5a6e4..cf239d4 100644
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -1,164 +1,216 @@
.. _getting-started:
Run your own Software Heritage
==============================
-This walkthrough will guide from the basic step of obtaining the source code of
+This tutorial will guide you from the basic step of obtaining the source code of
the Software Heritage stack to running a local copy of it with which you can
archive source code and browse it on the web. To that end, just follow the
-steps detailed below:
-
-.. contents:: :local:
+steps detailed below.
.. highlight:: bash
Step 0 --- get the code
-----------------------
The `swh-environment
`_ Git (meta)
repository orchestrates the Git repositories of all Software Heritage modules.
Clone it::
git clone https://forge.softwareheritage.org/source/swh-environment.git
then recursively clone all Python module repositories. For this step you will
need the `mr `_ tool, see the ``README`` file
of swh-environment for more information::
cd swh-environment
readlink -f .mrconfig >> ~/.mrtrust
mr up
+.. IMPORTANT::
+
+ From now on this tutorial will assume that you **run commands listed below
+ from within the swh-environment** directory.
+
For periodic code updates you can use the following helper::
- cd swh-environment
bin/update
From now on you will need to have a ``PYTHONPATH`` environment variable that
allows Python to find modules in the ``swh`` namespace. To that end you can
source the ``pythonpath.sh`` snippet from swh-environment::
source pythonpath.sh
To make setting ``PYTHONPATH`` easier in the future, you might want to define a
shell alias, e.g.::
alias swh-pythonpath='cd /path/to/swh-environment/ ; source pythonpath.sh ; cd - > /dev/null'
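To check that ``PYTHONPATH`` was set as intended, a small helper along these
lines may be useful. It is only a sketch: the function name
``swh_path_entries`` is hypothetical, and it assumes that each module checkout
added to ``PYTHONPATH`` contains a top-level ``swh/`` directory.

```shell
# Hypothetical helper: list the PYTHONPATH entries that contain a "swh"
# directory. Exits non-zero when no entry does.
# The body runs in a subshell, so IFS and "set -f" do not leak to the caller.
swh_path_entries() (
    set -f          # do not glob-expand path entries
    IFS=:           # PYTHONPATH entries are colon-separated
    found=1
    for dir in $PYTHONPATH; do
        if [ -d "$dir/swh" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    exit "$found"
)
```

Running ``swh_path_entries`` after sourcing ``pythonpath.sh`` should then print
one line per module directory; empty output suggests the snippet was not
sourced in the current shell.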
Step 1 --- install dependencies
-------------------------------
-**TO BE WRITTEN**
+You need to install three types of dependencies: Python modules, Node.js
+modules (for the web app), and Postgres (as storage backend).
+
+
+Python modules
+~~~~~~~~~~~~~~
+
+You can install Python modules using ``pip3`` via the following helper::
+
+ sudo bin/pip-install-deps
+
+``pip-install-deps`` accepts additional ``pip3 install`` options so, e.g., if
+you want to install Python modules as a user rather than system wide you can do
+something like this instead::
+
+ bin/pip-install-deps --user
+
+If you want to see the list of Python dependencies, e.g., to install them by
+hand or via your package manager, you can use a related helper::
+
+ bin/pip-ls-deps
+
+
+Postgres
+~~~~~~~~
+
+You need a running Postgres instance with administrator access (e.g., to create
+databases). On Debian/Ubuntu based distributions it should be as easy as::
+
+ sudo apt install postgresql
+
+For other platforms and more details refer to the `PostgreSQL installation
+documentation
+`_.
+
+
+Node.js modules
+~~~~~~~~~~~~~~~
+
+If you want to run the web app to browse your local archive you will need some
+Node.js modules, in particular to pack web resources into a single compact
+file. To that end the following should suffice::
+
+ sudo apt install nodejs npm
+ cd swh-web
+ npm install
+ cd -
+
+You are now good to go with all needed dependencies on your development
+machine!
Step 2 --- set up storage
-------------------------
Then you will need a local storage service that will archive and serve source
code artifacts via a REST API. The Software Heritage storage layer comes in two
parts: a content-addressable object storage on your file system (for file
contents) and a Postgres database (for the graph structure of the archive). See
the :ref:`data-model` for more information. The storage layer is configured via
a YAML configuration file, located at
``~/.config/swh/storage/storage.yml``. Create it with content like:
.. code-block:: yaml

   storage:
     cls: local
     args:
       db: "host=localhost port=5432 dbname=softwareheritage-dev user=swhdev password=foobar"
       objstorage:
         cls: pathslicing
         args:
           root: /srv/softwareheritage/objects/
           slicing: 0:2/2:4

Make sure that the object storage root exists on the filesystem and is writable
to your user, e.g.::
sudo mkdir -p /srv/softwareheritage/objects
sudo chown "${USER}:" /srv/softwareheritage/objects
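To get a feel for what the ``slicing`` setting does, the sketch below maps an
object identifier to the path where it would be stored. It assumes that each
``start:end`` slice selects characters ``start`` (inclusive) to ``end``
(exclusive) of the hexadecimal identifier and uses them as one directory
level. The function name ``objstorage_path`` is hypothetical and not part of
the Software Heritage code base.

```shell
# Hypothetical helper illustrating the assumed path slicing layout.
# Usage: objstorage_path ROOT SLICING HEX_ID
# The body runs in a subshell, so IFS and "set -f" do not leak to the caller.
objstorage_path() (
    root="${1%/}"   # drop a trailing slash, if any
    slicing="$2"
    id="$3"
    path="$root"
    set -f
    IFS=/           # slices are separated by "/"
    for slice in $slicing; do
        start="${slice%%:*}"
        end="${slice##*:}"
        # cut counts characters from 1, slices count from 0
        part=$(printf '%s' "$id" | cut -c "$((start + 1))-$end")
        path="$path/$part"
    done
    printf '%s/%s\n' "$path" "$id"
)
```

For example, ``objstorage_path /srv/softwareheritage/objects/ 0:2/2:4
34973274ccef6ab4dfaaf86599792fa9c3fe4689`` prints
``/srv/softwareheritage/objects/34/97/34973274ccef6ab4dfaaf86599792fa9c3fe4689``.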
You are done with object storage setup! Let's set up the database::
- cd swh-environment/swh-storage/sql/
+ cd swh-storage/sql/
sudo -u postgres bin/db-init 5432 softwareheritage-dev swhdev
+ cd -
Let's unpack the second line. You should have Postgres administrator privileges
to be able to create databases, hence the ``sudo -u postgres``; if your user
has Postgres admin privileges, you can avoid ``sudo`` here. ``5432`` is the
default port of the main Postgres cluster, adapt as needed.
``softwareheritage-dev`` is the name of the DB that will be created, it should
match the ``db`` line in ``storage.yml``; same goes for ``swhdev``, the DB user
name. You will be interactively asked for a password for the DB user; you
should provide one that matches the ``db`` line value.
To check that you can successfully connect to the DB (you will be interactively
asked for the DB password)::
psql -h localhost -p 5432 -U swhdev softwareheritage-dev
Note that you can simplify interactive use and reduce configuration clutter
using Postgres `password
`_ and
`service
`_
configuration files. Any valid `libpq connection string
`_
will make the ``db`` line of ``storage.yml`` happy.
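As an illustration of the password and service files, the sketch below writes
one entry to each. The helper name ``setup_pg_client_files`` and the service
name ``swh-dev`` are made up for this example; the host, port, database, user,
and password are the ones used earlier in this tutorial.

```shell
# Hypothetical helper: append a password entry and a service definition.
# Usage: setup_pg_client_files PGPASS_FILE PG_SERVICE_FILE
setup_pg_client_files() (
    pgpass="$1"      # usually ~/.pgpass
    pgservice="$2"   # usually ~/.pg_service.conf
    umask 077        # new files are readable by the current user only
    # Password file format: hostname:port:database:username:password
    printf '%s\n' 'localhost:5432:softwareheritage-dev:swhdev:foobar' >> "$pgpass"
    # libpq ignores a password file that group or others can access
    chmod 0600 "$pgpass"
    cat >> "$pgservice" <<'EOF'
[swh-dev]
host=localhost
port=5432
dbname=softwareheritage-dev
user=swhdev
EOF
)
```

After ``setup_pg_client_files ~/.pgpass ~/.pg_service.conf``, running
``psql service=swh-dev`` should connect without a password prompt, and the
``db`` line of ``storage.yml`` could be shortened to ``db: "service=swh-dev"``.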
You can now run the storage server like this::
python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml
Step 3 --- ingest repositories
------------------------------
You are now ready to ingest your first repository into your local Software
Heritage. For the sake of example, we will ingest a few Git repositories. The
module in charge of ingesting Git repositories is the *Git loader*, Python
module ``swh.loader.git``. Its configuration file is at
``~/.config/swh/loader/git-updater.yml``. Create it with content like:
.. code-block:: yaml

   storage:
     cls: remote
     args:
       url: http://localhost:5002

It just informs the Git loader to use the storage server running on your
machine. The ``url`` line should match the command line used to run the storage
server.
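One way to keep the two in sync is to generate the loader configuration from
the same host and port used to start the storage server. The helper below is a
sketch only: ``write_loader_config`` is a hypothetical name, and the YAML
nesting it writes is assumed to match the configuration shown above.

```shell
# Hypothetical helper: write a Git loader configuration pointing at a
# storage server. Overwrites the target file.
# Usage: write_loader_config CONFIG_FILE HOST PORT
write_loader_config() (
    conf="$1"
    host="$2"
    port="$3"
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<EOF
storage:
  cls: remote
  args:
    url: http://$host:$port
EOF
)
```

For the setup described in this tutorial, the call would be
``write_loader_config ~/.config/swh/loader/git-updater.yml localhost 5002``.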
You can now ingest Git repositories on the command line using the command::
python3 -m swh.loader.git.updater --origin-url GIT_CLONE_URL
For instance, you can try ingesting the following repositories, in increasing
size order (note that the last two might take a few hours to complete and will
occupy several GB on both the Postgres DB and the object storage)::
python3 -m swh.loader.git.updater --origin-url https://github.com/SoftwareHeritage/swh-storage.git
python3 -m swh.loader.git.updater --origin-url https://github.com/hylang/hy.git
python3 -m swh.loader.git.updater --origin-url https://github.com/ocaml/ocaml.git
# WARNING: next repo is big
python3 -m swh.loader.git.updater --origin-url https://github.com/torvalds/linux.git
Congratulations, you have just archived your first source code repositories!
To re-archive the same repositories later on you can rerun the same commands:
-only objects *added* since the previous visit will be archived upon the next
-one.
+only *new* objects added since the previous visit will be archived upon the
+next one.
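If you re-archive the same set of repositories regularly, a small wrapper
script may save some typing. The following is a sketch under assumptions:
``ingest_all`` and the ``SWH_GIT_LOADER`` override are hypothetical names
introduced here, while the default loader command is the one shown above.

```shell
# Hypothetical helper: run the Git loader on every URL listed in a file.
# Usage: ingest_all URL_FILE
# URL_FILE holds one clone URL per line; blank lines and lines starting
# with "#" are skipped. Set SWH_GIT_LOADER to override the loader command.
ingest_all() (
    while IFS= read -r url; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the loader cannot consume the URL list
        ${SWH_GIT_LOADER:-python3 -m swh.loader.git.updater} \
            --origin-url "$url" < /dev/null \
            || echo "failed: $url" >&2
    done < "$1"
)
```

With the URLs saved in a file such as ``origins.txt``, ``ingest_all
origins.txt`` visits each of them in turn and reports the ones that failed on
standard error.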
Step 4 --- browse the archive
-----------------------------
**TO BE WRITTEN**