.. docs/getting-started.rst
   This file was copied to docs/manual-setup.rst.

.. highlight:: bash

.. _getting-started:

Run your own Software Heritage
==============================

This tutorial will guide you from the basic step of obtaining the source code
of the Software Heritage stack to running a local copy of it, with which you
can archive source code and browse it on the web. To that end, just follow the
steps detailed below.

Step 0 --- get the code
-----------------------

The `swh-environment
<https://forge.softwareheritage.org/source/swh-environment/>`_ Git (meta)
repository orchestrates the Git repositories of all Software Heritage modules.
Clone it::

    git clone https://forge.softwareheritage.org/source/swh-environment.git

Then recursively clone all Python module repositories. For this step you will
need the `mr <http://myrepos.branchable.com/>`_ tool. Once you have installed
``mr``, just run::

    cd swh-environment
    bin/update

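Since this step relies on ``git`` and ``mr`` being installed, you may want to
check for them before running ``bin/update``. A minimal sketch, where
``require_cmds`` is a hypothetical helper that is not part of swh-environment:

.. code-block:: bash

    # Hypothetical helper (not shipped with swh-environment): report every
    # command from the argument list that cannot be found on $PATH, and
    # return non-zero if at least one is missing.
    require_cmds() {
        missing=0
        for cmd in "$@"; do
            if ! command -v "$cmd" >/dev/null 2>&1; then
                echo "missing required command: $cmd" >&2
                missing=1
            fi
        done
        return "$missing"
    }

For example, ``require_cmds git mr && bin/update`` only starts the update once
both tools are available.
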
.. IMPORTANT::

   From now on this tutorial will assume that you **run the commands listed
   below from within the swh-environment directory**.

For periodic repository updates, just re-run ``bin/update``.

Step 1 --- install system dependencies
--------------------------------------

You need to install three types of dependencies: some base packages, Node.js
modules (for the web app), and Postgres (as the storage backend).

Package dependencies
~~~~~~~~~~~~~~~~~~~~

Software Heritage requires some dependencies that are usually packaged by your
package manager. On Debian/Ubuntu-based distributions::

    sudo apt-get install curl ca-certificates
    curl https://deb.nodesource.com/setup_8.x | sudo bash
    curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    sudo apt update
    sudo apt install python3 python3-venv libsvn-dev postgresql-10 nodejs \
        libsystemd-dev libpython3-dev dia postgresql-autodoc \
        postgresql-server-dev-all

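The commands above install Node.js through the NodeSource ``setup_8.x`` script
and the ``postgresql-10`` package. To see which major versions actually ended
up on your ``PATH``, you can parse the tools' ``--version`` output. A sketch,
using a hypothetical ``major_version`` helper that simply keeps the first
number found in the string:

.. code-block:: bash

    # Hypothetical helper: print the first run of digits in a version string,
    # e.g. "v8.17.0" -> 8, "psql (PostgreSQL) 10.23" -> 10.
    major_version() {
        printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\).*$/\1/p'
    }

    # Prints an empty value when a tool is not installed.
    echo "node:     $(major_version "$(node --version 2>/dev/null)")"
    echo "postgres: $(major_version "$(psql --version 2>/dev/null)")"
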
Postgres
~~~~~~~~

You need a running Postgres instance with administrator access (e.g., to create
databases). On Debian/Ubuntu-based distributions, the previous step
(installation) should be enough.

For other platforms and more details, refer to the `PostgreSQL installation
documentation
<https://www.postgresql.org/docs/current/static/tutorial-install.html>`_.

You also need access to a superuser account on the database. The easiest way
is to create a PostgreSQL account with the same name as your username::

    sudo -u postgres createuser --createdb --superuser $USER

You can check that this worked by running the following as your user (you
should not be asked for a password)::

    psql postgres

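``psql postgres`` opens an interactive prompt. For a non-interactive check that
your role can connect without a password and really is a superuser, something
like the following sketch could be used. ``check_pg_superuser`` is a
hypothetical name; it relies on ``psql -w`` failing instead of prompting for a
password, and on the ``rolsuper`` column of the ``pg_roles`` view:

.. code-block:: bash

    # Hypothetical helper: succeed only if the current user can connect to the
    # "postgres" database without a password and has the superuser attribute.
    check_pg_superuser() {
        # -X: ignore ~/.psqlrc, -w: never prompt for a password,
        # -tA: print the bare value, without headers or alignment.
        is_super=$(psql -X -w -tA -d postgres \
            -c "SELECT rolsuper FROM pg_roles WHERE rolname = current_user") || return 1
        [ "$is_super" = "t" ]
    }

    check_pg_superuser && echo "PostgreSQL superuser access OK"
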
Node.js modules
~~~~~~~~~~~~~~~

If you want to run the web app to browse your local archive, you will need some
Node.js modules, in particular to pack web resources into a single compact
file. To that end the following should suffice::

    cd swh-web
    npm install
    cd -

You are now good to go with all needed dependencies on your development
machine!

Step 2 --- install Python packages in a virtualenv
--------------------------------------------------

From now on you will need to work in a `virtualenv
<https://docs.python.org/3/library/venv.html>`_ containing the Python
environment with all the Software Heritage modules and their dependencies. To
that end you can do (once)::

    python3 -m venv .venv

Then, activate the virtualenv (do this every time you start working on Software
Heritage)::

    source .venv/bin/activate

You can now install the Software Heritage Python modules, their dependencies
and the testing-related dependencies using::

    pip install $( bin/pip-swh-packages --with-testing )

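Because the virtualenv has to be activated in every new shell, you may prefer a
small helper in your shell startup file. The sketch below uses a hypothetical
``swh_activate`` function: it does nothing if a virtualenv is already active,
and refuses to run outside a directory containing ``.venv``:

.. code-block:: bash

    # Hypothetical helper: activate ./.venv unless a virtualenv is already
    # active. Run it from the swh-environment directory.
    swh_activate() {
        if [ -n "${VIRTUAL_ENV:-}" ]; then
            echo "virtualenv already active: $VIRTUAL_ENV"
            return 0
        fi
        if [ ! -f .venv/bin/activate ]; then
            echo "no .venv here; run 'python3 -m venv .venv' first" >&2
            return 1
        fi
        # "." is the POSIX spelling of bash's "source".
        . .venv/bin/activate
    }
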
Step 3 --- set up storage
-------------------------

Then you will need a local storage service that will archive and serve source
code artifacts via a REST API. The Software Heritage storage layer comes in two
parts: a content-addressable :term:`object storage` on your file system (for
file contents) and a Postgres database (for the graph structure of the
archive). See the :ref:`data-model` for more information. The storage layer is
configured via a YAML configuration file, located at
``~/.config/swh/storage/storage.yml``. Create it with content like the
following:

.. code-block:: yaml

    storage:
      cls: local
      args:
        db: "dbname=softwareheritage-dev"
        objstorage:
          cls: pathslicing
          args:
            root: /srv/softwareheritage/objects/
            slicing: 0:2/2:4

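If you script your setup, you can write the same file from the shell. A sketch
using ``printf`` (the ``write_storage_config`` function and its argument order
are hypothetical; the YAML it writes is the one shown above):

.. code-block:: bash

    # Hypothetical helper: write the local storage configuration shown above.
    # $1 = output file, $2 = database name, $3 = object storage root.
    write_storage_config() {
        mkdir -p "$(dirname "$1")" || return 1
        printf '%s\n' \
            'storage:' \
            '  cls: local' \
            '  args:' \
            "    db: \"dbname=$2\"" \
            '    objstorage:' \
            '      cls: pathslicing' \
            '      args:' \
            "        root: $3" \
            '        slicing: 0:2/2:4' > "$1"
    }

For example: ``write_storage_config ~/.config/swh/storage/storage.yml
softwareheritage-dev /srv/softwareheritage/objects/``.
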
Make sure that the :term:`object storage` root exists on the filesystem and is
writable by your user, e.g.::

    sudo mkdir -p /srv/softwareheritage/objects
    sudo chown "${USER}:" /srv/softwareheritage/objects

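The ``slicing: 0:2/2:4`` parameter controls how objects are laid out under that
root. As we understand the ``pathslicing`` backend, each object is stored in
nested directories named after slices of its hexadecimal identifier (characters
0 to 2, then 2 to 4). The hypothetical ``objstorage_path`` helper below only
mimics that layout for illustration; it does not call the object storage code:

.. code-block:: bash

    # Hypothetical helper: print where an object with the given hex identifier
    # would be stored with slicing 0:2/2:4, i.e. ROOT/hex[0:2]/hex[2:4]/hex.
    objstorage_path() {
        root=${1%/}    # drop one trailing slash, if any
        hex=$2
        first=$(printf '%s' "$hex" | cut -c1-2)
        second=$(printf '%s' "$hex" | cut -c3-4)
        printf '%s/%s/%s/%s\n' "$root" "$first" "$second" "$hex"
    }

    objstorage_path /srv/softwareheritage/objects/ da39a3ee5e6b4b0d3255bfef95601890afd80709
    # -> /srv/softwareheritage/objects/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709
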
You are done with the :term:`object storage` setup! Let's set up the
database::

    swh-db-init storage -d softwareheritage-dev

``softwareheritage-dev`` is the name of the DB that will be created; it should
match the ``db`` line in ``storage.yml``.

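To catch a mismatch between the database you initialized and the one
``storage.yml`` points to, you can extract the name from the ``db`` line. A
sketch, assuming the line contains a ``dbname=...`` parameter as in the example
configuration (``storage_db_name`` is a hypothetical helper; any other libpq
parameters on that line are ignored):

.. code-block:: bash

    # Hypothetical helper: print the dbname=... value found on the "db:" line
    # of a storage.yml file.
    storage_db_name() {
        sed -n 's/^ *db:.*dbname=\([^" ]*\).*$/\1/p' "$1"
    }

For example, ``[ "$(storage_db_name ~/.config/swh/storage/storage.yml)" =
softwareheritage-dev ]`` should succeed with the configuration above.
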
To check that you can successfully connect to the DB (you should not be asked
for a password)::

    psql softwareheritage-dev

You can now run the storage server like this::

    python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml

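The storage server runs in the foreground, so you will typically start it in a
separate terminal (or in the background) before running the loaders. A sketch
of a hypothetical ``wait_for_http`` helper that polls the server address with
``curl`` until it answers; any HTTP response, even an error status, counts as
"up", since the goal is only to know that the server is listening:

.. code-block:: bash

    # Hypothetical helper: poll URL until the server answers any HTTP response.
    # $1 = URL, $2 = number of attempts (default 30),
    # $3 = seconds between attempts (default 1).
    wait_for_http() {
        url=$1
        attempts=${2:-30}
        delay=${3:-1}
        i=0
        while [ "$i" -lt "$attempts" ]; do
            # Without -f, curl exits 0 for any HTTP status and fails only
            # when no connection can be made.
            if curl -s -o /dev/null "$url" 2>/dev/null; then
                return 0
            fi
            i=$((i + 1))
            sleep "$delay"
        done
        echo "no answer from $url after $attempts attempts" >&2
        return 1
    }

For instance, start the server command above with a trailing ``&``, then run
``wait_for_http http://localhost:5002/`` before ingesting repositories.
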
Step 4 --- ingest repositories
------------------------------

You are now ready to ingest your first repository into your local Software
Heritage. For the sake of example, we will ingest a few Git repositories. The
module in charge of ingesting Git repositories is the *Git loader*, Python
module ``swh.loader.git``. Its configuration file is at
``~/.config/swh/loader/git.yml``. Create it with content like the following:

.. code-block:: yaml

    storage:
      cls: remote
      args:
        url: http://localhost:5002

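The ``url`` has to point at the host and port the storage server was started
with, and the web app configuration uses the same ``storage`` stanza. A sketch
that derives the URL from a single pair of variables so the files cannot drift
apart; ``SWH_STORAGE_HOST``, ``SWH_STORAGE_PORT`` and
``write_remote_storage_config`` are hypothetical names:

.. code-block:: bash

    # Hypothetical variables: keep them in sync with the --host/--port
    # options passed to swh.storage.api.server.
    SWH_STORAGE_HOST=${SWH_STORAGE_HOST:-localhost}
    SWH_STORAGE_PORT=${SWH_STORAGE_PORT:-5002}

    # Hypothetical helper: write a config file pointing to the remote storage.
    write_remote_storage_config() {
        mkdir -p "$(dirname "$1")" || return 1
        printf '%s\n' \
            'storage:' \
            '  cls: remote' \
            '  args:' \
            "    url: http://$SWH_STORAGE_HOST:$SWH_STORAGE_PORT" > "$1"
    }

For example: ``write_remote_storage_config ~/.config/swh/loader/git.yml``.
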
It just tells the Git loader to use the storage server running on your
machine. The ``url`` line should match the host and port used on the command
line to run the storage server.

You can now ingest a Git repository on the command line using the command::

    python3 -m swh.loader.git.loader --origin-url GIT_CLONE_URL

For instance, you can try ingesting the following repositories, in increasing
size order (note that the last two might take a few hours to complete and will
occupy several GB on both the Postgres DB and the object storage)::

    python3 -m swh.loader.git.loader --origin-url https://github.com/SoftwareHeritage/swh-storage.git
    python3 -m swh.loader.git.loader --origin-url https://github.com/hylang/hy.git
    python3 -m swh.loader.git.loader --origin-url https://github.com/ocaml/ocaml.git
    # WARNING: next repo is big
    python3 -m swh.loader.git.loader --origin-url https://github.com/torvalds/linux.git

Congratulations, you have just archived your first source code repositories!

To re-archive the same repositories later on, you can rerun the same commands:
only *new* objects added since the previous visit will be archived upon the
next one.

Step 5 --- browse the archive
-----------------------------

You can now set up a local web app to browse what you have locally archived.
The web app uses the configuration file ``~/.config/swh/web/web.yml``. Create
it and fill it with something like:

.. code-block:: yaml

    storage:
      cls: remote
      args:
        url: http://localhost:5002

Nothing new here: the configuration just references the local storage server,
which has been used before for repository ingestion.

You can now run the web app, and browse your local archive::

    make run-django-webpack-devserver
    xdg-open http://localhost:5004

Note that the ``make`` target will first compile a `webpack
<https://webpack.js.org/>`_ bundle with various web assets and then launch the
web app; for webpack compilation you will need the Node.js dependencies
discussed above.

As an initial tour of the web app, try searching for one of the repositories
you have ingested (e.g., entering the ``hylang`` or ``ocaml`` keywords in the
search bar). Clicking on the repository name will bring you back in time, and
you will be able to browse the source code and development history you have
archived.

Enjoy!

Using Docker
------------

The easiest way to run a Software Heritage instance is to use Docker and
docker-compose. Please refer to the `docker-compose documentation
<https://docs.docker.com/compose/>`_ if you do not have a working Docker setup.

Then::

    git clone https://forge.softwareheritage.org/source/swh-docker-dev.git
    cd swh-docker-dev
    docker-compose up -d

When all the containers are up and running, you have a running Software
Heritage platform. You should open:

- http://localhost:5080/ to navigate your (empty for now) SWH archive,
- http://localhost:5080/rabbitmq to access the RabbitMQ dashboard (guest/guest),
- http://localhost:5080/prometheus to explore the platform's metrics.

All the internal APIs are also exposed:

- http://localhost:5080/scheduler
- http://localhost:5080/storage
- http://localhost:5080/indexer-storage
- http://localhost:5080/deposit
- http://localhost:5080/objstorage

At this point, the simplest way to start indexing software is to use the 'Save
Code Now' feature of the archive web interface, at
http://localhost:5080/browse/origin/save/.

Enjoy filling your hard drives!

Hacking the archive
-------------------

If you want to hack the code of the Software Heritage Archive, a bit more work
will be required.

The best way to have a development-friendly environment is to build a mixed
Docker/virtualenv setup. Such a setup is described in the :ref:`Perfect
Developer Setup guide <developer-setup>`.

Installing from sources (without a virtualenv)
----------------------------------------------

If you prefer to run everything straight from the sources, refer to the
:ref:`Manual Setup Guide <manual-setup>`.