diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 9525897..acf1b0c 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,163 +1,165 @@ .. _getting-started: .. highlight:: bash Run your own Software Heritage ============================== This walkthrough will guide from the basic step of obtaining the source code of -the Software Heritage stack to running a local copy of it in which you can -ingest source code of existing repositories and browse them using the archive -web application. +the Software Heritage stack to running a local copy of it with which you can +archive source code and browse it on the web. To that end, just follow the +steps detailed below: + +.. contents:: :local: Step 0 --- get the code ----------------------- The `swh-environment `_ Git (meta) repository orchestrates the Git repositories of all Software Heritage modules. Clone it:: git clone https://forge.softwareheritage.org/source/swh-environment.git then recursively clone all Python module repositories. For this step you will need the `mr `_ tool, see the ``README`` file of swh-environment for more information:: cd swh-environment readlink -f .mrconfig >> ~/.mrtrust mr up For periodic code you can use the following helper:: cd swh-environment bin/update From now on you will need to have a ``PYTHONPATH`` environment variable that allows to find Python modules in the ``swh`` namespace. To that end you can source the ``pythonpath.sh`` snippet from swh-environment:: source pythonpath.sh To make setting ``PYTHONPATH`` easier in the future, you might want to define a shell alias, e.g.:: alias swh-pythonpath='cd /path/to/swh-environment/ ; source pythonpath.sh ; cd - > /dev/null' -Step 1 --- install software dependencies ----------------------------------------- +Step 1 --- install dependencies +------------------------------- **TO BE WRITTEN** Step 2 --- set up storage ------------------------- Then you will need a local storage service that will archive and serve source code artifacts via a REST API. The Software Heritage storage layer comes in two parts: a content-addressable object storage on your file system (for file contents) and a Postgres database (for the graph structure of the archive). See the :ref:`data-model` for more information. The storage layer is configured via a YAML configuration file, located at ``~/.config/swh/storage/storage.yml``. Create it with a content like: .. code-block:: yaml storage: cls: local args: db: "host=localhost port=5432 dbname=softwareheritage-dev user=swhdev password=foobar" objstorage: cls: pathslicing args: root: /srv/softwareheritage/objects/ slicing: 0:2/2:4 Make sure that the object storage root exists on the filesystem and is writable to your user, e.g.:: sudo mkdir /srv/softwareheritage/objects sudo chown "${USER}:" /srv/softwareheritage/objects You are done with object storage setup! Let's setup the database:: cd swh-environment/swh-storage/sql/ sudo -u postgres bin/db-init 5432 softwareheritage-dev swhdev Let's unpack the second line. You should have Postgres administrator privileges to be able to create databases, hence the ``sudo -u postgres``; if your user has Postgres admin privileges, you can avoid ``sudo`` here. ``5432`` is the default port of the main Postgres cluster, adapt as needed. ``softwareheritage-dev`` is the name of the DB that will be created, it should match the ``db`` line in ``storage.yml``; same goes for ``swhdev``, the DB user name. You will be interactively asked for a password for the DB user; you should provide one that matches the ``db`` line value. To check that you can successfully connect to the DB (you will be interactively asked for the DB password):: psql -h localhost -p 5432 -U swhdev softwareheritage-dev Note that you can simplify interactive use and reduce configuration clutter using Postgres `password `_ and `service `_ configuration files. Any valid `libpq connection string `_ will make the ``db`` line of ``storage.yml`` happy. You can now run the storage server like this:: python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml Step 3 --- ingest repositories ------------------------------ You are now ready to ingest your first repository into your local Software Heritage. For the sake of example, we will ingest a few Git repositories. The module in charge of ingesting Git repositories is the *Git loader*, Python module ``swh.loader.git``. Its configuration file is at ``~/.config/swh/loader/git-updater.yml``. Create it with a content like: .. code-block:: yaml storage: cls: remote args: url: http://localhost:5002 It just informs the Git loader to use the storage server running on your machine. The ``url`` line should match the command line used to run the storage server. You can now ingest Git repository on the command line using the command:: python3 -m swh.loader.git.updater --origin-url GIT_CLONE_URL For instance, you can try ingesting the following repositories, in increasing size order (note that the last two might take a few hours to complete and will occupy several GB on both the Postgres DB and the object storage):: python3 -m swh.loader.git.updater --origin-url https://github.com/SoftwareHeritage/swh-storage.git python3 -m swh.loader.git.updater --origin-url https://github.com/hylang/hy.git python3 -m swh.loader.git.updater --origin-url https://github.com/ocaml/ocaml.git # WARNING: next repo is big python3 -m swh.loader.git.updater --origin-url https://github.com/torvalds/linux.git Congratulations, you have just archived your first source code repositories! To re-archive the same repositories later on you can rerun the same commands: only objects *added* since the previous visit will be archived upon the next one. Step 4 --- browse the archive ----------------------------- **TO BE WRITTEN**