D1220.diff
diff --git a/docs/developer-setup.rst b/docs/developer-setup.rst
--- a/docs/developer-setup.rst
+++ b/docs/developer-setup.rst
@@ -3,20 +3,35 @@
Developer setup
===============
-In this guide, we will set up a dual environment:
+In this guide, we describe how to set up a developer environment in which one
+can easily navigate the source code, make modifications, and write and execute
+unit tests.
-- A virtual env in which all the |swh| packages will be installed in 'develop'
- mode, this will allow you to navigate the source code, hack it, and run
- locally the unit tests.
+For this, we will use a `virtualenv`_ in which all the |swh| packages will be
+installed in 'develop' mode; this will allow you to navigate the source code,
+hack on it, and run the unit tests locally.
-- A docker 'cluster' built with docker-compose, which allows to easily run all
- the components of the |swh| architecture. It is possible to run those docker
- containers with your locally modified code for one or several |swh| packages.
+If you want to test the effect of your modifications in a running |swh|
+instance, you should check the `documentation`_ of the swh-docker-dev_ project.
- Please read the `README file`_ in the swh-docker-dev repository for more
- details on how to do this.
+.. _`documentation`: https://forge.softwareheritage.org/source/swh-docker-dev/browse/master/README.md?as=remarkup
+.. _`swh-docker-dev`: https://forge.softwareheritage.org/source/swh-docker-dev
+.. _`virtualenv`: https://pypi.org/project/virtualenv/
+
+
+Install required dependencies
+-----------------------------
+
+Software Heritage requires some dependencies that are usually packaged by your
+package manager. On Debian/Ubuntu-based distributions::
+
+ sudo wget https://www.postgresql.org/media/keys/ACCC4CF8.asc -O /etc/apt/trusted.gpg.d/postgresql.asc
+ sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
+ sudo apt update
+ sudo apt install python3 python3-venv libsvn-dev postgresql-11 \
+ libsystemd-dev libpython3-dev graphviz postgresql-autodoc \
+ postgresql-server-dev-all virtualenvwrapper git build-essential
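Once the packages above are installed, a quick sanity check can confirm that the main tools are on the PATH. This is an illustrative sketch only (the tool names are the obvious ones from the install line; ``psql`` comes with the ``postgresql-11`` package):

```shell
# Illustrative sanity check: report where each required tool was found.
for tool in python3 git psql; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: not found" >&2
  fi
done
```

Any ``not found`` line means the corresponding package did not install correctly.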
-.. _`README file`: https://forge.softwareheritage.org/source/swh-docker-dev/browse/master/README.md
Checkout the source code
------------------------
@@ -28,19 +43,21 @@
~$ cd swh-environment
~/swh-environment$
-Create a virtual env::
+Checkout all the swh packages source repositories::
+
+ ~/swh-environment$ ./bin/update
+
+Create a virtualenv::
~/swh-environment$ mkvirtualenv -p /usr/bin/python3 -a $PWD swh
[...]
(swh) ~/swh-environment$
-
-.. Note: using virtualenvwrapper_ is not mandatory here. You can use plain
+.. Note:: using virtualenvwrapper_ is not mandatory here. You can use plain
virtualenvs, or any other venv management tool (pipenv_ or poetry_
for example). Using a tool such as virtualenvwrapper_ just makes life
easier...
-
.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/
.. _poetry: https://poetry.eustace.io/
.. _pipenv: https://pipenv.readthedocs.io/
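As the note above says, virtualenvwrapper is optional. A minimal sketch of the equivalent setup with only the standard library ``venv`` module (the environment path is just an example):

```shell
# Hypothetical alternative to mkvirtualenv using the stdlib venv module.
python3 -m venv ~/.virtualenvs/swh      # create the environment
. ~/.virtualenvs/swh/bin/activate       # activate it (run in every new shell)
```

Unlike ``mkvirtualenv -a $PWD``, this does not associate the environment with the swh-environment directory, so you must ``cd`` there yourself after activating.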
@@ -53,110 +70,131 @@
[...]
-Setup the docker environment
-----------------------------
-
-Install docker-compose::
-
- (swh) ~/swh-environment$ pip install docker-compose
- [...]
-
-Make your life easier::
-
- (swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <<EOF
- # unfortunately, the interface cmd for the click autocompletion
- # depends on the shell
- # https://click.palletsprojects.com/en/7.x/bashcomplete/#activation
-
- shell=$(basename $SHELL)
- case "$shell" in
- "zsh")
- autocomplete_cmd=source_zsh
- ;;
- *)
- autocomplete_cmd=source
- ;;
- esac
-
- eval "$(_SWH_SCHEDULER_COMPLETE=$autocomplete_cmd swh-scheduler)"
- export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
- export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
- export COMPOSE_FILE=~/swh-environment/swh-docker-dev/docker-compose.yml:~/swh-environment/swh-docker-dev/docker-compose.override.yml
- alias doco=docker-compose
-
- function swhclean {
- find ~/swh-environment -type d -name __pycache__ -exec rm -rf {} \;
- find ~/swh-environment -type d -name .tox -exec rm -rf {} \;
- find ~/swh-environment -type d -name .hypothesis -exec rm -rf {} \;
- }
- EOF
-
-This postactivate script does:
-
-- install a shell completion handler for the swh-scheduler command,
-- preset a bunch of environment variables
-
- - `SWH_SCHEDULER_URL` so that you can just run `sch-scheduler` against the
- scheduler API instance running in docker, without having to specify the
- endpoint URL,
-
- - `CELERY_BROKER` so you can execute the `celery` tool without options
- against the rabbitmq server running in the docker environment,
-
- - `COMPOSE_FILE` so you can run `docker-compose` from everywhere,
-
-- create an alias `doco` for `docker-compose` because this later is way too
- long to type,
-
-- add a `swhclean` shell function to clean your source directories so that
- there is no conflict with docker containers using local swh repositories (see
- below). This will delete any `.tox`, `__pycache__` and `.hypothesis`
- directory found in your swh-environment directory.
-
-
-Start the SWH platform::
-
- (swh) ~/swh-environment$ docker-compose up -d
- [...]
-
-Check celery::
-
- (swh) ~/swh-environment$ celery status
- listers@50ac2185c6c9: OK
- loader@b164f9055637: OK
- indexer@33bc6067a5b8: OK
-
-List task-types::
-
- (swh) ~/swh-environment$ swh-scheduler task-type list
- [...]
-
-Get more info on a task type::
-
- (swh) ~/swh-environment$ swh-scheduler task-type list -v -t origin-update-hg
- Known task types:
- origin-update-hg: swh.loader.mercurial.tasks.LoadMercurial
- Loading mercurial repository swh-loader-mercurial
- interval: 1 day, 0:00:00 [1 day, 0:00:00, 1 day, 0:00:00]
- backoff_factor: 1.0
- max_queue_length: 1000
- num_retries: None
- retry_delay: None
-
-Add a new task::
-
- (swh) ~/swh-environment$ swh-scheduler task add origin-update-hg origin_url=https://hg.logilab.org/master/cubicweb
- Created 1 tasks
-
- Task 1
- Next run: just now (2019-02-06 12:36:58+00:00)
- Interval: 1 day, 0:00:00
- Type: origin-update-hg
- Policy: recurring
- Args:
- Keyword args:
- origin_url: https://hg.logilab.org/master/cubicweb
-
-Respawn a task::
-
- (swh) ~/swh-environment$ swh-scheduler task respawn 1
+Executing unit tests
+--------------------
+
+Unit tests use the pytest_ framework, and can be executed directly or via
+tox_. The main difference between these two test execution environments is:
+
+- When executed via tox_, all the dependencies (including swh ones) are
+ installed from pypi_: you test your modifications against the latest
+ published version of every swh package but the current one.
+
+- When you execute pytest_ directly, swh dependencies are used from your
+ current virtualenv, installed from the git repositories: you test your
+ modifications against the HEAD of every swh package.
+
+For example, running unit tests for the swh-loader-git_ package::
+
+ (swh) ~/swh-environment$ cd swh-loader-git
+ (swh) ~/swh-environment/swh-loader-git$ pytest
+ =========================== test session starts ============================
+ platform linux -- Python 3.5.3, pytest-3.8.2, py-1.6.0, pluggy-0.7.1
+ hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/ddouard/src/swh-environment/swh-loader-git/.hypothesis/examples')
+ rootdir: /home/ddouard/src/swh-environment/swh-loader-git, inifile: pytest.ini
+ plugins: requests-mock-1.5.2, postgresql-1.3.4, env-0.6.2, django-3.4.7, cov-2.6.0, pylama-7.6.5, hypothesis-3.76.0, celery-4.2.1
+ collected 25 items
+
+ swh/loader/git/tests/test_converters.py ........ [ 32%]
+ swh/loader/git/tests/test_from_disk.py ..... [ 52%]
+ swh/loader/git/tests/test_loader.py ...... [ 76%]
+ swh/loader/git/tests/test_tasks.py ... [ 88%]
+ swh/loader/git/tests/test_utils.py ... [100%]
+ ============================= warnings summary =============================
+ [...]
+ ================== 25 passed, 12 warnings in 6.66 seconds ==================
+
+Running the same tests, plus flake8 checks, using tox::
+
+ (swh) ~/swh-environment/swh-loader-git$ tox
+ GLOB sdist-make: ~/swh-environment/swh-loader-git/setup.py
+ flake8 create: ~/swh-environment/swh-loader-git/.tox/flake8
+ flake8 installdeps: flake8
+ flake8 installed: entrypoints==0.3,flake8==3.7.7,mccabe==0.6.1,pycodestyle==2.5.0,pyflakes==2.1.1,swh.loader.git==0.0.48.post3
+ flake8 run-test-pre: PYTHONHASHSEED='2028963506'
+ flake8 runtests: commands[0] | ~/swh-environment/swh-loader-git/.tox/flake8/bin/python -m flake8
+ py3 create: ~/swh-environment/swh-loader-git/.tox/py3
+ py3 installdeps: .[testing], pytest-cov
+ py3 inst: ~/swh-environment/swh-loader-git/.tox/.tmp/package/1/swh.loader.git-0.0.48.post3.zip
+ py3 installed: aiohttp==3.5.4,amqp==2.4.2,arrow==0.13.1,async-timeout==3.0.1,atomicwrites==1.3.0,attrs==19.1.0,billiard==3.5.0.5,celery==4.2.1,certifi==2018.11.29,chardet==3.0.4,Click==7.0,coverage==4.5.2,decorator==4.3.2,dulwich==0.19.11,elasticsearch==6.3.1,Flask==1.0.2,idna==2.8,idna-ssl==1.1.0,itsdangerous==1.1.0,Jinja2==2.10,kombu==4.4.0,MarkupSafe==1.1.1,more-itertools==6.0.0,msgpack-python==0.5.6,multidict==4.5.2,pathlib2==2.3.3,pluggy==0.9.0,psutil==5.6.0,psycopg2==2.7.7,py==1.8.0,pytest==3.10.1,pytest-cov==2.6.1,python-dateutil==2.8.0,pytz==2018.9,PyYAML==3.13,requests==2.21.0,retrying==1.3.3,six==1.12.0,swh.core==0.0.55,swh.loader.core==0.0.39,swh.loader.git==0.0.48.post3,swh.model==0.0.30,swh.objstorage==0.0.30,swh.scheduler==0.0.49,swh.storage==0.0.129,systemd-python==234,typing-extensions==3.7.2,urllib3==1.24.1,vcversioner==2.16.0.0,vine==1.2.0,Werkzeug==0.14.1,yarl==1.3.0
+ py3 run-test-pre: PYTHONHASHSEED='2028963506'
+ py3 runtests: commands[0] | pytest --cov=swh --cov-branch
+ =========================== test session starts ============================
+ platform linux -- Python 3.5.3, pytest-3.10.1, py-1.8.0, pluggy-0.9.0
+ rootdir: ~/swh-environment/swh-loader-git, inifile: pytest.ini
+ plugins: cov-2.6.1, celery-4.2.1
+ collected 25 items
+
+ swh/loader/git/tests/test_converters.py ........ [ 32%]
+ swh/loader/git/tests/test_from_disk.py ..... [ 52%]
+ swh/loader/git/tests/test_loader.py ...... [ 76%]
+ swh/loader/git/tests/test_tasks.py ... [ 88%]
+ swh/loader/git/tests/test_utils.py ... [100%]
+
+ ----------- coverage: platform linux, python 3.5.3-final-0 -----------
+ Name Stmts Miss Branch BrPart Cover
+ ---------------------------------------------------------------------------
+ swh/__init__.py 1 0 0 0 100%
+ swh/loader/__init__.py 1 0 0 0 100%
+ swh/loader/git/__init__.py 0 0 0 0 100%
+ swh/loader/git/converters.py 102 10 44 7 86%
+ swh/loader/git/from_disk.py 157 44 50 6 67%
+ swh/loader/git/loader.py 271 59 114 17 75%
+ swh/loader/git/tasks.py 14 0 0 0 100%
+ swh/loader/git/tests/__init__.py 1 0 0 0 100%
+ swh/loader/git/tests/conftest.py 4 0 0 0 100%
+ swh/loader/git/tests/test_converters.py 94 0 6 0 100%
+ swh/loader/git/tests/test_from_disk.py 100 4 0 0 96%
+ swh/loader/git/tests/test_loader.py 12 0 0 0 100%
+ swh/loader/git/tests/test_tasks.py 26 0 0 0 100%
+ swh/loader/git/tests/test_utils.py 14 0 2 0 100%
+ swh/loader/git/utils.py 25 8 8 1 61%
+ ---------------------------------------------------------------------------
+ TOTAL 822 125 224 31 80%
+
+
+ ============================= warnings summary =============================
+ .tox/py3/lib/python3.5/site-packages/psycopg2/__init__.py:144
+ ~/swh-environment/swh-loader-git/.tox/py3/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
+ """)
+
+ -- Docs: https://docs.pytest.org/en/latest/warnings.html
+ ================== 25 passed, 1 warnings in 7.34 seconds ===================
+ _________________________________ summary __________________________________
+ flake8: commands succeeded
+ py3: commands succeeded
+ congratulations :)
+
+Beware that some swh packages require a properly configured PostgreSQL server
+to execute their tests. In this case, you will want to use pifpaf_, which
+spawns a temporary PostgreSQL instance, to encapsulate the call to pytest.
+For example, running pytest in the swh-core package::
+
+ (swh) ~/swh-environment$ cd swh-core
+ (swh) ~/swh-environment/swh-core$ pifpaf run postgresql -- pytest
+ =========================== test session starts ============================
+ platform linux -- Python 3.5.3, pytest-3.8.2, py-1.6.0, pluggy-0.7.1
+ hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/ddouard/src/swh-environment/swh-core/.hypothesis/examples')
+ rootdir: /home/ddouard/src/swh-environment/swh-core, inifile: pytest.ini
+ plugins: requests-mock-1.5.2, postgresql-1.3.4, env-0.6.2, django-3.4.7, cov-2.6.0, pylama-7.6.5, hypothesis-3.76.0, celery-4.2.1
+ collected 79 items
+
+ swh/core/tests/test_api.py .. [ 2%]
+ swh/core/tests/test_config.py .............. [ 20%]
+ swh/core/tests/test_db.py .... [ 25%]
+ swh/core/tests/test_logger.py . [ 26%]
+ swh/core/tests/test_serializers.py ..... [ 32%]
+ swh/core/tests/test_statsd.py ...................................... [ 81%]
+ ........ [ 91%]
+ swh/core/tests/test_utils.py ....... [100%]
+
+ ======================== 79 passed in 6.59 seconds =========================
+
+.. _pytest: https://pytest.org
+.. _tox: https://tox.readthedocs.io
+.. _pypi: https://pypi.org
+.. _swh-loader-git: https://forge.softwareheritage.org/source/swh-loader-git
+.. _pifpaf: https://github.com/jd/pifpaf
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -29,7 +29,7 @@
- http://localhost:5080/ to navigate your (empty for now) SWH archive,
- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest),
-- http://localhost:5080/prometheus to explore the platform's metrics,
+- http://localhost:5080/grafana to explore the platform's metrics,
All the internal APIs are also exposed:
@@ -53,15 +53,10 @@
If you want to hack the code of the Software Heritage Archive, a bit more work
will be required.
-The best way to have a development-friendly environment is to build a mixed
-docker/virtual env setup.
-
-Such a setup is described in the :ref:`Perfect Developer Setup guide
-<developer-setup>`.
+To be able to write patches, you will need a development setup.
+The best way to have a development-friendly environment is to build a mixed
+docker/virtualenv setup.
-Installing from sources (without Docker)
-++++++++++++++++++++++++++++++++++++++++
-
-If you prefer to run everything straight, you should refer to the :ref:`Manual
-Setup Guide <manual-setup>`
+Such a setup is described in the
+:ref:`Developer Setup Guide <developer-setup>`.
diff --git a/docs/manual-setup.rst b/docs/manual-setup.rst
deleted file mode 100644
--- a/docs/manual-setup.rst
+++ /dev/null
@@ -1,227 +0,0 @@
-.. _manual-setup:
-
-Step 0 --- get the code
------------------------
-
-The `swh-environment
-<https://forge.softwareheritage.org/source/swh-environment/>`_ Git (meta)
-repository orchestrates the Git repositories of all Software Heritage modules.
-Clone it::
-
- git clone https://forge.softwareheritage.org/source/swh-environment.git
-
-then recursively clone all Python module repositories. For this step you will
-need the `mr <http://myrepos.branchable.com/>`_ tool. Once you have installed
-``mr``, just run::
-
- cd swh-environment
- bin/update
-
-.. IMPORTANT::
-
- From now on this tutorial will assume that you **run commands listed below
- from within the swh-environment** directory.
-
-For periodic repository updates just re-run ``bin/update``.
-
-
-Step 1 --- install system dependencies
---------------------------------------
-
-You need to install three types of dependencies: some base packages, Node.js
-modules (for the web app), and Postgres (as storage backend).
-
-Package dependencies
-~~~~~~~~~~~~~~~~~~~~
-
-Software Heritage requires some dependencies that are usually packaged by your
-package manager. On Debian/Ubuntu-based distributions::
-
- sudo apt-get install curl ca-certificates
- curl https://deb.nodesource.com/setup_8.x | sudo bash
- curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
- sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
- sudo apt update
- sudo apt install python3 python3-venv libsvn-dev postgresql-10 nodejs \
- libsystemd-dev libpython3-dev dia postgresql-autodoc \
- postgresql-server-dev-all
-
-Postgres
-~~~~~~~~
-
-You need a running Postgres instance with administrator access (e.g., to create
-databases). On Debian/Ubuntu based distributions, the previous step
-(installation) should be enough.
-
-For other platforms and more details refer to the `PostgreSQL installation
-documentation
-<https://www.postgresql.org/docs/current/static/tutorial-install.html>`_.
-
-You also need to have access to a superuser account on the database. For that,
-the easiest way is to create a PostgreSQL account that has the same name as
-your username::
-
- sudo -u postgres createuser --createdb --superuser $USER
-
-You can check that this worked by doing, from your user (you should not be
-asked for a password)::
-
- psql postgres
-
-Node.js modules
-~~~~~~~~~~~~~~~
-
-If you want to run the web app to browser your local archive you will need some
-Node.js modules, in particular to pack web resources into a single compact
-file. To that end the following should suffice::
-
- cd swh-web
- npm install
- cd -
-
-You are now good to go with all needed dependencies on your development
-machine!
-
-
-Step 2 --- install Python packages in a virtualenv
---------------------------------------------------
-
-From now on you will need to work in a `virtualenv
-<https://docs.python.org/3/library/venv.html>`_ containing the Python
-environment with all the Software Heritage modules and their dependencies. To
-that end you can do (once)::
-
- python3 -m venv .venv
-
-Then, activate the virtualenv (do this every time you start working on Software
-Heritage)::
-
- source .venv/bin/activate
-
-You can now install Software Heritage Python modules, their dependencies and
-the testing-related dependencies using::
-
- pip install $( bin/pip-swh-packages --with-testing )
-
-
-Step 3 --- set up storage
--------------------------
-
-Then you will need a local storage service that will archive and serve source
-code artifacts via a REST API. The Software Heritage storage layer comes in two
-parts: a content-addressable :term:`object storage` on your file system (for file
-contents) and a Postgres database (for the graph structure of the archive). See
-the :ref:`data-model` for more information. The storage layer is configured via
-a YAML configuration file, located at
-``~/.config/swh/storage/storage.yml``. Create it with a content like:
-
-.. code-block:: yaml
-
- storage:
- cls: local
- args:
- db: "dbname=softwareheritage-dev"
- objstorage:
- cls: pathslicing
- args:
- root: /srv/softwareheritage/objects/
- slicing: 0:2/2:4
-
-Make sure that the :term:`object storage` root exists on the filesystem and is writable
-to your user, e.g.::
-
- sudo mkdir -p /srv/softwareheritage/objects
- sudo chown "${USER}:" /srv/softwareheritage/objects
-
-You are done with :term:`object storage` setup! Let's setup the database::
-
- swh-db-init storage -d softwareheritage-dev
-
-``softwareheritage-dev`` is the name of the DB that will be created, it should
-match the ``db`` line in ``storage.yml``
-
-To check that you can successfully connect to the DB (you should not be asked
-for a password)::
-
- psql softwareheritage-dev
-
-You can now run the storage server like this::
-
- python3 -m swh.storage.api.server --host localhost --port 5002 ~/.config/swh/storage/storage.yml
-
-
-Step 4 --- ingest repositories
-------------------------------
-
-You are now ready to ingest your first repository into your local Software
-Heritage. For the sake of example, we will ingest a few Git repositories. The
-module in charge of ingesting Git repositories is the *Git loader*, Python
-module ``swh.loader.git``. Its configuration file is at
-``~/.config/swh/loader/git.yml``. Create it with a content like:
-
-.. code-block:: yaml
-
- storage:
- cls: remote
- args:
- url: http://localhost:5002
-
-It just informs the Git loader to use the storage server running on your
-machine. The ``url`` line should match the command line used to run the storage
-server.
-
-You can now ingest Git repository on the command line using the command::
-
- python3 -m swh.loader.git.loader --origin-url GIT_CLONE_URL
-
-For instance, you can try ingesting the following repositories, in increasing
-size order (note that the last two might take a few hours to complete and will
-occupy several GB on both the Postgres DB and the object storage)::
-
- python3 -m swh.loader.git.loader --origin-url https://github.com/SoftwareHeritage/swh-storage.git
- python3 -m swh.loader.git.loader --origin-url https://github.com/hylang/hy.git
- python3 -m swh.loader.git.loader --origin-url https://github.com/ocaml/ocaml.git
-
- # WARNING: next repo is big
- python3 -m swh.loader.git.loader --origin-url https://github.com/torvalds/linux.git
-
-Congratulations, you have just archived your first source code repositories!
-
-To re-archive the same repositories later on you can rerun the same commands:
-only *new* objects added since the previous visit will be archived upon the
-next one.
-
-
-Step 5 --- browse the archive
------------------------------
-
-You can now setup a local web app to browse what you have locally archived. The
-web app uses the configuration file ``~/.config/swh/web/web.yml``. Create it
-and fill it with something like:
-
-.. code-block:: yaml
-
- storage:
- cls: remote
- args:
- url: http://localhost:5002
-
-Nothing new here, the configuration just references the local storage server,
-which have been used before for repository ingestion.
-
-You can now run the web app, and browse your local archive::
-
- make run-django-webpack-devserver
- xdg-open http://localhost:5004
-
-Note that the ``make`` target will first compile a `webpack
-<https://webpack.js.org/>`_ with various web assets and then launch the web app;
-for webpack compilation you will need the Node.js dependencies discussed above.
-
-As an initial tour of the web app, try searching for one of the repositories
-you have ingested (e.g., entering the ``hylang`` or ``ocaml`` keywords in the
-search bar). Clicking on the repository name you will be brought back in time,
-and you will be able to browse the source code and development history you have
-archived.
-
-Enjoy!
