D8494.id30585.diff

diff --git a/docs/developer-setup.rst b/docs/developer-setup.rst
--- a/docs/developer-setup.rst
+++ b/docs/developer-setup.rst
@@ -7,15 +7,15 @@
can easily navigate the source code, make modifications, write and execute unit
tests.
-For this, we will use a `virtualenv`_ in which all the |swh| packages will be
+For this, we will use a `virtual environment`_ in which all the |swh| packages will be
installed in 'develop' mode, this will allow you to navigate the source code,
hack it, and run locally the unit tests.
-If you want to test the effect of your modifications in a running |swh|
-instance, you should check the swh-environment docker `documentation`_.
+To test the effect of your modifications, you can :ref:`install your own local
+Software Heritage instance <getting-started>` using Docker.
.. _`documentation`: https://forge.softwareheritage.org/source/swh-environment/browse/master/docker/README.rst?as=remarkup
-.. _`virtualenv`: https://pypi.org/project/virtualenv/
+.. _`virtual environment`: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment
Install required dependencies
@@ -44,13 +44,10 @@
This installs basic system utilities, Python library dependencies, development tools,
documentation tools and our main database management systems.
-Cassandra and postgresql will be started by tests when they need it, so you
+Cassandra and PostgreSQL will be started by tests when they need it, so you
don't need them started globally (this will save you some RAM)::
- sudo systemctl stop postgresql
- sudo systemctl disable postgresql
- sudo systemctl stop cassandra
- sudo systemctl disable cassandra
+ sudo systemctl disable --now cassandra postgresql
If you intend to hack on the frontend part of |swh| Web Applications, you will also
need to have nodejs >= 14 in your development environment. If the version in your
@@ -256,20 +253,20 @@
The different databases for each subproject that requires one (like ``storage`` or ``scheduler``) should be setup through the ``swh db create`` or ``swh db init`` command. See their help for more information.
+Test changes using a local instance
+-----------------------------------
+
+How to test your changes with a local instance is explained in the :ref:`documentation about our Docker setup <docker-environment>`.
+
+
Sending your changes
--------------------
After you are done making the changes you want, you can send them on our
-forge_. The best way to do that is to use Arcanist, so we recommend you read
-these two guides if you are not familiar with it:
-
-* https://wiki.softwareheritage.org/wiki/Arcanist_setup
-* https://wiki.softwareheritage.org/wiki/Code_review_in_Phabricator
-
+forge. See the guide on :ref:`how to submit patches <patch-submission>`.
.. _pytest: https://pytest.org
.. _tox: https://tox.readthedocs.io
.. _pypi: https://pypi.org
.. _swh-loader-git: https://forge.softwareheritage.org/source/swh-loader-git
.. _pifpaf: https://github.com/jd/pifpaf
-.. _forge: https://forge.softwareheritage.org/
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -10,56 +10,171 @@
archive source code and browse it on the web. To that end, just follow the
steps detailed below.
-
-Using Docker
+.. warning::
+ Running a Software Heritage instance on your machine can
+ consume quite a bit of resources: if you play a bit too hard (e.g., if
+ you try to list all GitHub repositories with the corresponding lister),
+ you may fill your hard drive, and consume a lot of CPU, memory and
+ network bandwidth.
+
+Dependencies
------------
-The easiest way to run a Software Heritage instance is to use Docker and
-docker-compose. Please refer to the `docker-compose documentation
-<https://docs.docker.com/compose/>`_ if you do not have a working docker setup.
+The easiest way to run a Software Heritage instance is to use Docker.
+Please `ensure that you have a working recent installation first
+<https://docs.docker.com/engine/install/>`_ (including the
+`Compose <https://docs.docker.com/compose/>`_ plugin).
-Then::
+Quick start
+-----------
- git clone https://forge.softwareheritage.org/source/swh-environment.git
- cd swh-environment/docker
- docker-compose up -d
+First, retrieve the Software Heritage development environment to get the
+Docker configuration::
-When all the containers are up and running, you have a running Software
-Heritage platform. You should open:
+ ~$ git clone https://forge.softwareheritage.org/source/swh-environment.git
+ ~$ cd swh-environment/docker
-- http://localhost:5080/ to navigate your (empty for now) SWH archive,
-- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest),
-- http://localhost:5080/grafana to explore the platform's metrics (admin/admin),
+Then, start containers::
+
+ ~/swh-environment/docker$ docker compose up -d
+ [...]
+ Creating docker_amqp_1 ... done
+ Creating docker_zookeeper_1 ... done
+ Creating docker_kafka_1 ... done
+ Creating docker_flower_1 ... done
+ Creating docker_swh-scheduler-db_1 ... done
+ [...]
+
+This will build Docker images and run them. Check everything is running
+fine with::
+
+ ~/swh-environment/docker$ docker compose ps
+ Name Command State Ports
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
+ docker_flower_1 flower --broker=amqp://gue ... Up 0.0.0.0:5555->5555/tcp
+ docker_kafka_1 start-kafka.sh Up 0.0.0.0:5092->5092/tcp
+ docker_swh-deposit-db_1 docker-entrypoint.sh postgres Up 5432/tcp
+ docker_swh-deposit_1 /entrypoint.sh Up 0.0.0.0:5006->5006/tcp
+ [...]
+
+The startup of some containers may fail the first time because of
+dependency-related problems. If some containers failed to start, just
+run the ``docker compose up -d`` command again.
+
+If a container really refuses to start properly, you can check why using
+the ``docker compose logs`` command. For example::
+
+ ~/swh-environment/docker$ docker compose logs swh-lister
+ Attaching to docker_swh-lister_1
+ [...]
+ swh-lister_1 | Processing /src/swh-scheduler
+ swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
+ swh-lister_1 |
-All the internal APIs are also exposed:
+.. note::
-- http://localhost:5080/scheduler
-- http://localhost:5080/storage
-- http://localhost:5080/indexer-storage
-- http://localhost:5080/deposit
-- http://localhost:5080/objstorage
+ For details on the various Docker images and how to work with them,
+ see the full :ref:`docker-environment` documentation.
-At this point, the simplest way to start loading software is to use the 'Save
-Code Now' feature of the archive web interface:
+Once all containers are running, you can use the web interface by
+opening http://localhost:5080/ in your web browser.
+
+At this point, the archive is empty and needs to be filled with some
+content. The simplest way to start loading software is to use the
+*Save Code Now* feature of the archive web interface:
http://localhost:5080/browse/origin/save/
-For details and an overview of what you can do with your Software Heritage,
-see the full :ref:`docker-environment` documentation.
+You can also use the command line interface to inject code. For
+example, to retrieve projects hosted on the https://0xacab.org GitLab forge::
-Enjoy filling your hard drives!
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
+ swh scheduler task add list-gitlab-full \
+ -p oneshot url=https://0xacab.org/api/v4
+ Created 1 tasks
-Hacking the archive
--------------------
+ Task 1
+ Next run: just now (2018-12-19 14:58:49+00:00)
+ Interval: 90 days, 0:00:00
+ Type: list-gitlab-full
+ Policy: oneshot
+ Args:
+ Keyword args:
+ url=https://0xacab.org/api/v4
+
+This task will scrape the forge’s project list and register origins to the scheduler.
+This takes at most a couple of minutes.
+
+Then, you must tell the scheduler to create loading tasks for these origins.
+For example, to create tasks for 100 of these origins::
+
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
+ swh scheduler origin schedule-next git 100
+
+This will take a bit of time to complete.
+
+To increase the speed at which git repositories are imported, you can
+spawn more ``swh-loader-git`` workers::
-If you want to hack the code of the Software Heritage Archive, a bit more work
-will be required.
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
+ celery status
+ listers@50ac2185c6c9: OK
+ loader@b164f9055637: OK
+ indexer@33bc6067a5b8: OK
+ vault@c9fef1bbfdc1: OK
-To be able to write patches, you will need a development setup.
+ 4 nodes online.
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
+ celery control pool_grow 3 -d loader@b164f9055637
+ -> loader@b164f9055637: OK
+ pool will grow
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
+ celery inspect -d loader@b164f9055637 stats | grep prefetch_count
+ "prefetch_count": 4
-The best way to have a development-friendly environment is to build a mixed
-docker/virtualenv setup.
+Now there are 4 workers ingesting git repositories. You can also
+increase the number of ``swh-loader-git`` containers::
+
+ ~/swh-environment/docker$ docker compose up -d --scale swh-loader=4
+ [...]
+ Creating docker_swh-loader_2 ... done
+ Creating docker_swh-loader_3 ... done
+ Creating docker_swh-loader_4 ... done
+
+
+Updating the docker image
+-------------------------
+
+All containers started by ``docker compose`` are bound to a docker image
+named ``swh/stack`` including all the software components of Software
+Heritage. When new versions of these components are released, the docker
+image will not be automatically updated. In order to update all Software
+Heritage components to their latest version, the docker image needs to
+be explicitly rebuilt by issuing the following command from within the
+``docker`` directory::
+
+ ~/swh-environment/docker$ docker build --no-cache -t swh/stack .
+
+Monitor your local installation
+-------------------------------
+
+You can monitor your local installation by looking at:
+
+- http://localhost:5080/rabbitmq to access the RabbitMQ dashboard (guest/guest),
+- http://localhost:5080/grafana to explore the platform's metrics (admin/admin).
+
+Shut down your local installation
+---------------------------------
+
+To shut down your local Software Heritage instance, just run::
+
+ ~/swh-environment/docker$ docker compose down
+
+Hacking the archive
+-------------------
-Such a setup is described in the
-:ref:`Developer Setup Guide <developer-setup>`.
+If you want to hack the code of the Software Heritage Archive, a more involved
+setup is required, as described in the :ref:`developer setup
+guide <developer-setup>`.
diff --git a/docs/getting-started/index.rst b/docs/getting-started/index.rst
--- a/docs/getting-started/index.rst
+++ b/docs/getting-started/index.rst
@@ -1,6 +1,17 @@
Getting started
===============
+Software Heritage has developed an infrastructure to collect, preserve,
+analyze and present software in source code form.
+
+You can :doc:`learn how to use its public API <api>`. Instead of using
+the public archive, you might want to :ref:`use a local installation
+<getting-started>` while developing clients.
+
+:ref:`Contributions <contributing>` to Software Heritage’s own source code
+are very much welcome. You will need to take the time to :ref:`set up a
+developer environment <developer-setup>` before you can start.
+
.. this index file is used to force a hierarchy in the toctree,
even though these two files are in the parent directory
