diff --git a/docs/developer-setup.rst b/docs/developer-setup.rst new file mode 100644 --- /dev/null +++ b/docs/developer-setup.rst @@ -0,0 +1,148 @@ +.. _developer-setup: + +Developer setup +=============== + +In this guide, we will set up a dual environment: + +- A virtual env in which all the |swh| packages will be installed in 'develop' + mode, this will allow you to navigate the source code, hack it, and run + locally the unit tests. + +- A docker 'cluster' built with docker-compose, which allows to easily run all + the components of the |swh| architecture. It is possible to run those docker + containers with your locally modified code for one or several |swh| packages. + + Please read the `README file`_ in the swh-docker-dev repository for more + details on how to do this. + +.. _`README file`: https://forge.softwareheritage.org/source/swh-docker-dev/browse/master/README.md + +Checkout the source code +------------------------ + +Clone the |swh| environment repository:: + + ~$ git clone https://forge.softwareheritage.org/source/swh-environment.git + [...] + ~$ cd swh-environment + ~/swh-environment$ + +Create a virtual env:: + + ~/swh-environment$ mkvirtualenv -p /usr/bin/python3 -a $PWD swh + [...] + (swh) ~/swh-environment$ + + +.. Note: using virtualenvwrapper_ is not mandatory here. You can use plain + virtualenvs, or any other venv management tool (pipenv_ or poetry_ + for example). Using a tool such as virtualenvwrapper_ just makes life + easier... + + +.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/ +.. _poetry: https://poetry.eustace.io/ +.. _pipenv: https://pipenv.readthedocs.io/ + + +Install all the swh packages (in develop mode):: + + (swh) ~/swh-environment$ pip install $(./bin/pip-swh-packages --with-testing) \ + tox pifpaf + [...] + + +Setup the docker environment +---------------------------- + +Install docker-compose:: + + (swh) ~/swh-environment$ pip install docker-compose + [...] + +Make your life easier:: + + (swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <`_ Git (meta) -repository orchestrates the Git repositories of all Software Heritage modules. -Clone it:: - - git clone https://forge.softwareheritage.org/source/swh-environment.git - -then recursively clone all Python module repositories. For this step you will -need the `mr `_ tool. Once you have installed -``mr``, just run:: - - cd swh-environment - bin/update - -.. IMPORTANT:: - - From now on this tutorial will assume that you **run commands listed below - from within the swh-environment** directory. - -For periodic repository updates just re-run ``bin/update``. - - -Step 1 --- install system dependencies --------------------------------------- - -You need to install three types of dependencies: some base packages, Node.js -modules (for the web app), and Postgres (as storage backend). - -Package dependencies -~~~~~~~~~~~~~~~~~~~~ - -Software Heritage requires some dependencies that are usually packaged by your -package manager. On Debian/Ubuntu-based distributions:: - - sudo apt-get install curl ca-certificates - curl https://deb.nodesource.com/setup_8.x | sudo bash - curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' - sudo apt update - sudo apt install python3 python3-venv libsvn-dev postgresql-10 nodejs \ - libsystemd-dev libpython3-dev dia postgresql-autodoc \ - postgresql-server-dev-all - -Postgres -~~~~~~~~ - -You need a running Postgres instance with administrator access (e.g., to create -databases). On Debian/Ubuntu based distributions, the previous step -(installation) should be enough. - -For other platforms and more details refer to the `PostgreSQL installation -documentation -`_. - -You also need to have access to a superuser account on the database. For that, -the easiest way is to create a PostgreSQL account that has the same name as -your username:: - - sudo -u postgres createuser --createdb --superuser $USER - -You can check that this worked by doing, from your user (you should not be -asked for a password):: - - psql postgres - -Node.js modules -~~~~~~~~~~~~~~~ - -If you want to run the web app to browser your local archive you will need some -Node.js modules, in particular to pack web resources into a single compact -file. To that end the following should suffice:: - - cd swh-web - npm install - cd - - -You are now good to go with all needed dependencies on your development -machine! - - -Step 2 --- install Python packages in a virtualenv --------------------------------------------------- - -From now on you will need to work in a `virtualenv -`_ containing the Python -environment with all the Software Heritage modules and their dependencies. To -that end you can do (once):: - - python3 -m venv .venv - -Then, activate the virtualenv (do this every time you start working on Software -Heritage):: - - source .venv/bin/activate - -You can now install Software Heritage Python modules, their dependencies and -the testing-related dependencies using:: - - pip install $( bin/pip-swh-packages --with-testing ) - - -Step 3 --- set up storage -------------------------- - -Then you will need a local storage service that will archive and serve source -code artifacts via a REST API. The Software Heritage storage layer comes in two -parts: a content-addressable :term:`object storage` on your file system (for file -contents) and a Postgres database (for the graph structure of the archive). See -the :ref:`data-model` for more information. The storage layer is configured via -a YAML configuration file, located at -``~/.config/swh/storage/storage.yml``. Create it with a content like: - -.. code-block:: yaml - - storage: - cls: local - args: - db: "dbname=softwareheritage-dev" - objstorage: - cls: pathslicing - args: - root: /srv/softwareheritage/objects/ - slicing: 0:2/2:4 - -Make sure that the :term:`object storage` root exists on the filesystem and is writable -to your user, e.g.:: - - sudo mkdir -p /srv/softwareheritage/objects - sudo chown "${USER}:" /srv/softwareheritage/objects - -You are done with :term:`object storage` setup! Let's setup the database:: - - swh-db-init storage -d softwareheritage-dev - -``softwareheritage-dev`` is the name of the DB that will be created, it should -match the ``db`` line in ``storage.yml`` - -To check that you can successfully connect to the DB (you should not be asked -for a password):: - - psql softwareheritage-dev - -You can now run the storage server like this:: - - python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml - - -Step 4 --- ingest repositories ------------------------------- - -You are now ready to ingest your first repository into your local Software -Heritage. For the sake of example, we will ingest a few Git repositories. The -module in charge of ingesting Git repositories is the *Git loader*, Python -module ``swh.loader.git``. Its configuration file is at -``~/.config/swh/loader/git.yml``. Create it with a content like: - -.. code-block:: yaml - - storage: - cls: remote - args: - url: http://localhost:5002 -It just informs the Git loader to use the storage server running on your -machine. The ``url`` line should match the command line used to run the storage -server. +Using Docker +++++++++++++ -You can now ingest Git repository on the command line using the command:: +The easiest way to run a Software Heritage instance is to use Docker and +docker-compose. Please refer to the `docker-compose documentation +`_ if you do not have a working docker setup. - python3 -m swh.loader.git.loader --origin-url GIT_CLONE_URL +Then:: -For instance, you can try ingesting the following repositories, in increasing -size order (note that the last two might take a few hours to complete and will -occupy several GB on both the Postgres DB and the object storage):: + git clone https://forge.softwareheritage.org/source/swh-docker-dev.git + cd swh-docker-dev + docker-compose up -d - python3 -m swh.loader.git.loader --origin-url https://github.com/SoftwareHeritage/swh-storage.git - python3 -m swh.loader.git.loader --origin-url https://github.com/hylang/hy.git - python3 -m swh.loader.git.loader --origin-url https://github.com/ocaml/ocaml.git +When all the containers are up and runnig, you have a running Software +Heritage platform. You should open: - # WARNING: next repo is big - python3 -m swh.loader.git.loader --origin-url https://github.com/torvalds/linux.git +- http://localhost:5080/ to navigate your (empty for now) SWH archive, +- http://localhost:5080/rabbitmq to access the rabbitmq dashoard (guest/guest), +- http://localhost:5080/prometheus to explore the platform's metrics, -Congratulations, you have just archived your first source code repositories! +All the internal APIs are also exposed: -To re-archive the same repositories later on you can rerun the same commands: -only *new* objects added since the previous visit will be archived upon the -next one. +- http://localhost:5080/scheduler +- http://localhost:5080/storage +- http://localhost:5080/indexer-storage +- http://localhost:5080/deposit +- http://localhost:5080/objstorage +At this point, the simplest way to start indexing software is to use the 'Save +Code Now' feature of the archive web interface: -Step 5 --- browse the archive ------------------------------ + http://localhost:5080/browse/origin/save/ -You can now setup a local web app to browse what you have locally archived. The -web app uses the configuration file ``~/.config/swh/web/web.yml``. Create it -and fill it with something like: +Enjoy filling your hard drives! -.. code-block:: yaml - storage: - cls: remote - args: - url: http://localhost:5002 +Hacking the archive ++++++++++++++++++++ -Nothing new here, the configuration just references the local storage server, -which have been used before for repository ingestion. +If you want to hack the code of the Software Heritage Archive, a bit more work +will be required. -You can now run the web app, and browse your local archive:: +The best way to have a development-friendly environment is to build a mixed +docker/virtual env setup. - make run-django-webpack-devserver - xdg-open http://localhost:5004 +Such a setup is described in the :ref:`Perfect Developer Setup guide +`. -Note that the ``make`` target will first compile a `webpack -`_ with various web assets and then launch the web app; -for webpack compilation you will need the Node.js dependencies discussed above. -As an initial tour of the web app, try searching for one of the repositories -you have ingested (e.g., entering the ``hylang`` or ``ocaml`` keywords in the -search bar). Clicking on the repository name you will be brought back in time, -and you will be able to browse the source code and development history you have -archived. +Installing from sources (without a virtualenv) +++++++++++++++++++++++++++++++++++++++++++++++ -Enjoy! +If you prefer to run everything straight, you should refer to the :ref:`Manual +Setup Guide ` diff --git a/docs/index.rst b/docs/index.rst --- a/docs/index.rst +++ b/docs/index.rst @@ -11,7 +11,9 @@ Getting started --------------- -* :ref:`getting-started` ← start here to hack on the Software Heritage software +* :ref:`getting-started` ← start here to get your own Software Heritage + platform running in less than 5 minutes, or +* :ref:`developer-setup` ← here to hack on the Software Heritage software stack @@ -134,5 +136,7 @@ architecture getting-started + developer-setup + manual-setup apidoc/modules swh-*/index diff --git a/docs/getting-started.rst b/docs/manual-setup.rst copy from docs/getting-started.rst copy to docs/manual-setup.rst --- a/docs/getting-started.rst +++ b/docs/manual-setup.rst @@ -1,15 +1,4 @@ -.. _getting-started: - -Run your own Software Heritage -============================== - -This tutorial will guide from the basic step of obtaining the source code of -the Software Heritage stack to running a local copy of it with which you can -archive source code and browse it on the web. To that end, just follow the -steps detailed below. - -.. highlight:: bash - +.. _manual-setup: Step 0 --- get the code -----------------------