diff --git a/docs/developer-setup.rst b/docs/developer-setup.rst --- a/docs/developer-setup.rst +++ b/docs/developer-setup.rst @@ -7,15 +7,15 @@ can easily navigate the source code, make modifications, write and execute unit tests. -For this, we will use a `virtualenv`_ in which all the |swh| packages will be +For this, we will use a `virtual environment`_ in which all the |swh| packages will be installed in 'develop' mode, this will allow you to navigate the source code, hack it, and run locally the unit tests. -If you want to test the effect of your modifications in a running |swh| -instance, you should check the swh-environment docker `documentation`_. +To test the effect of your modifications, you can :ref:`install your own local +Software Heritage instance ` using Docker. .. _`documentation`: https://forge.softwareheritage.org/source/swh-environment/browse/master/docker/README.rst?as=remarkup -.. _`virtualenv`: https://pypi.org/project/virtualenv/ +.. _`virtual environment`: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment Install required dependencies @@ -44,13 +44,10 @@ This installs basic system utilities, Python library dependencies, development tools, documentation tools and our main database management systems. -Cassandra and postgresql will be started by tests when they need it, so you +Cassandra and PostgreSQL will be started by tests when they need it, so you don't need them started globally (this will save you some RAM):: - sudo systemctl stop postgresql - sudo systemctl disable postgresql - sudo systemctl stop cassandra - sudo systemctl disable cassandra + sudo systemctl disable --now cassandra postgresql If you intend to hack on the frontend part of |swh| Web Applications, you will also need to have nodejs >= 14 in your development environment. 
If the version in your @@ -256,20 +253,20 @@ The different databases for each subproject that requires one (like ``storage`` or ``scheduler``) should be setup through the ``swh db create`` or ``swh db init`` command. See their help for more information. +Test changes using a local instance +----------------------------------- + +How to test your changes with a local instance is explained in the :ref:`documentation about our Docker setup `. + + Sending your changes -------------------- After you are done making the changes you want, you can send them on our -forge_. The best way to do that is to use Arcanist, so we recommend you read -these two guides if you are not familiar with it: - -* https://wiki.softwareheritage.org/wiki/Arcanist_setup -* https://wiki.softwareheritage.org/wiki/Code_review_in_Phabricator - +forge. See the guide on :ref:`how to submit patches `. .. _pytest: https://pytest.org .. _tox: https://tox.readthedocs.io .. _pypi: https://pypi.org .. _swh-loader-git: https://forge.softwareheritage.org/source/swh-loader-git .. _pifpaf: https://github.com/jd/pifpaf -.. _forge: https://forge.softwareheritage.org/ diff --git a/docs/getting-started.rst b/docs/getting-started.rst --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -10,56 +10,171 @@ archive source code and browse it on the web. To that end, just follow the steps detailed below. - -Using Docker +.. warning:: + Running a Software Heritage instance on your machine can + consume quite a bit of resources: if you play a bit too hard (e.g., if + you try to list all GitHub repositories with the corresponding lister), + you may fill your hard drive, and consume a lot of CPU, memory and + network bandwidth. + +Dependencies ------------ -The easiest way to run a Software Heritage instance is to use Docker and -docker-compose. Please refer to the `docker-compose documentation -`_ if you do not have a working docker setup. +The easiest way to run a Software Heritage instance is to use Docker. 
+Please `ensure that you have a working recent installation first +`_ (including the +`Compose `_ plugin). -Then:: +Quick start +----------- - git clone https://forge.softwareheritage.org/source/swh-environment.git - cd swh-environment/docker - docker-compose up -d +First, retrieve the Software Heritage development environment to get the +Docker configuration:: -When all the containers are up and running, you have a running Software -Heritage platform. You should open: + ~$ git clone https://forge.softwareheritage.org/source/swh-environment.git + ~$ cd swh-environment/docker -- http://localhost:5080/ to navigate your (empty for now) SWH archive, -- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest), -- http://localhost:5080/grafana to explore the platform's metrics (admin/admin), +Then, start the containers:: + + ~/swh-environment/docker$ docker compose up -d + [...] + Creating docker_amqp_1 ... done + Creating docker_zookeeper_1 ... done + Creating docker_kafka_1 ... done + Creating docker_flower_1 ... done + Creating docker_swh-scheduler-db_1 ... done + [...] + +This will build Docker images and run them. Check that everything is running +fine with:: + + ~/swh-environment/docker$ docker compose ps + Name Command State Ports + ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp + docker_flower_1 flower --broker=amqp://gue ... Up 0.0.0.0:5555->5555/tcp + docker_kafka_1 start-kafka.sh Up 0.0.0.0:5092->5092/tcp + docker_swh-deposit-db_1 docker-entrypoint.sh postgres Up 5432/tcp + docker_swh-deposit_1 /entrypoint.sh Up 0.0.0.0:5006->5006/tcp + [...] + +The startup of some containers may fail the first time because of +dependency-related problems.
If some containers failed to start, just +run the ``docker compose up -d`` command again. + +If a container really refuses to start properly, you can check why using +the ``docker compose logs`` command. For example:: + + ~/swh-environment/docker$ docker compose logs swh-lister + Attaching to docker_swh-lister_1 + [...] + swh-lister_1 | Processing /src/swh-scheduler + swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")] + swh-lister_1 | -All the internal APIs are also exposed: +.. note:: -- http://localhost:5080/scheduler -- http://localhost:5080/storage -- http://localhost:5080/indexer-storage -- http://localhost:5080/deposit -- http://localhost:5080/objstorage + For details on the various Docker images and how to work with them, + see the full :ref:`docker-environment` documentation. -At this point, the simplest way to start loading software is to use the 'Save -Code Now' feature of the archive web interface: +Once all containers are running, you can use the web interface by +opening http://localhost:5080/ in your web browser. + +At this point, the archive is empty and needs to be filled with some +content. The simplest way to start loading software is to use the +*Save Code Now* feature of the archive web interface: http://localhost:5080/browse/origin/save/ -For details and an overview of what you can do with your Software Heritage, -see the full :ref:`docker-environment` documentation. +You can also use the command line interface to inject code. For +example, to retrieve projects hosted on the https://0xacab.org GitLab forge:: -Enjoy filling your hard drives!
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \ + swh scheduler task add list-gitlab-full \ + -p oneshot url=https://0xacab.org/api/v4 + Created 1 tasks -Hacking the archive -------------------- + Task 1 + Next run: just now (2018-12-19 14:58:49+00:00) + Interval: 90 days, 0:00:00 + Type: list-gitlab-full + Policy: oneshot + Args: + Keyword args: + url=https://0xacab.org/api/v4 + +This task will scrape the forge’s project list and register the origins in the scheduler. +This takes at most a couple of minutes. + +Then, you must tell the scheduler to create loading tasks for these origins. +For example, to create tasks for 100 of these origins:: + + ~/swh-environment/docker$ docker compose exec swh-scheduler \ + swh scheduler origin schedule-next git 100 + +This will take a bit of time to complete. + +To increase the speed at which git repositories are imported, you can +spawn more ``swh-loader-git`` workers:: -If you want to hack the code of the Software Heritage Archive, a bit more work -will be required. + ~/swh-environment/docker$ docker compose exec swh-scheduler \ + celery status + listers@50ac2185c6c9: OK + loader@b164f9055637: OK + indexer@33bc6067a5b8: OK + vault@c9fef1bbfdc1: OK -To be able to write patches, you will need a development setup. + 4 nodes online. + ~/swh-environment/docker$ docker compose exec swh-scheduler \ + celery control pool_grow 3 -d loader@b164f9055637 + -> loader@b164f9055637: OK + pool will grow + ~/swh-environment/docker$ docker compose exec swh-scheduler \ + celery inspect -d loader@b164f9055637 stats | grep prefetch_count + "prefetch_count": 4 -The best way to have a development-friendly environment is to build a mixed -docker/virtualenv setup. +Now there are 4 workers ingesting git repositories. You can also +increase the number of ``swh-loader-git`` containers:: + + ~/swh-environment/docker$ docker compose up -d --scale swh-loader=4 + [...] + Creating docker_swh-loader_2 ... done + Creating docker_swh-loader_3 ...
done + Creating docker_swh-loader_4 ... done + + +Updating the Docker image +------------------------- + +All containers started by ``docker compose`` are bound to a Docker image +named ``swh/stack`` that includes all the software components of Software +Heritage. When new versions of these components are released, the Docker +image will not be automatically updated. In order to update all Software +Heritage components to their latest version, the Docker image needs to +be explicitly rebuilt by issuing the following command from within the +``docker`` directory:: + + ~/swh-environment/docker$ docker build --no-cache -t swh/stack . + +Monitor your local installation +------------------------------- + +You can monitor your local installation by looking at: + +- http://localhost:5080/rabbitmq to access the RabbitMQ dashboard (guest/guest), +- http://localhost:5080/grafana to explore the platform's metrics (admin/admin). + +Shut down your local installation +--------------------------------- + +To shut down your local Software Heritage instance, just run:: + + ~/swh-environment/docker$ docker compose down + +Hacking the archive +------------------- -Such a setup is described in the -:ref:`Developer Setup Guide `. +If you want to hack the code of the Software Heritage Archive, a more involved +setup is required, as described in the :ref:`developer setup +guide `. diff --git a/docs/getting-started/index.rst b/docs/getting-started/index.rst --- a/docs/getting-started/index.rst +++ b/docs/getting-started/index.rst @@ -1,6 +1,17 @@ Getting started =============== +Software Heritage has developed an infrastructure to collect, preserve, +analyze and present software in source code form. + +You can :doc:`learn how to use its public API `. Instead of using +the public archive, you might want to :ref:`use a local installation +` while developing clients. + +:ref:`Contributions ` to Software Heritage’s own source code +are very much welcome.
You will need to take the time to :ref:`set up a +developer environment ` before you can start. + .. this index file is used to force a hierarchy in the toctree, even though these two files are in the parent directory
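The Quick start steps added to ``docs/getting-started.rst`` in this patch could be scripted roughly as follows. Those steps are: clone ``swh-environment``, run ``docker compose up -d``, run it again if some containers fail on the first attempt, then check the result with ``docker compose ps``. This is only a minimal sketch under stated assumptions, not something shipped with ``swh-environment``. The ``retry`` and ``swh_quick_start`` function names, the three attempts and the 10-second delay are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are NOT part of
# swh-environment. Source this file, then call swh_quick_start.

# retry MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails.
retry() {
    local max=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# swh_quick_start [DIR]: clone swh-environment into DIR if it is missing,
# start the containers from DIR/docker (retrying, since some containers may
# fail on the first attempt), then list their state.
swh_quick_start() {
    local dir=${1:-swh-environment}
    if [ ! -d "$dir" ]; then
        git clone https://forge.softwareheritage.org/source/swh-environment.git "$dir" || return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$dir/docker" || exit 1
        retry 3 10 docker compose up -d || exit 1
        docker compose ps
    )
}
```

For example, ``source quick-start.sh && swh_quick_start ~/swh-environment``, where the file name is also an assumption. Retrying the whole ``docker compose up -d`` rather than individual services keeps the sketch short. If a container keeps failing, ``docker compose logs`` is still the way to find out why.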