docker/README.rst
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This repo comes with an optional ``docker-compose.storage-mirror.yml``
docker compose file that can be used to test the kafka-powered mirror
mechanism for the main storage.

This can be used like::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.storage-mirror.yml \
        up -d
    [...]

Compared to the original compose file, this will:

- override the swh-storage service to activate the kafka direct writer
  on swh.journal.objects prefixed topics using the swh.storage.master
  ID,
- override the swh-web service to make it use the mirror instead of
  [...]

Starting the backfiller
"""""""""""""""""""""""

This reads the objects within the range [start-object, end-object] from
the storage and sends them to the kafka topics.

::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.storage-mirror.yml \
        -f docker-compose.storage-mirror.override.yml \
        run \
        swh-journal-backfiller \
        snapshot \
        --start-object 000000 \
        --end-object 000001 \
        --dry-run

Cassandra
^^^^^^^^^

We are working on an alternative backend for swh-storage, based on Cassandra
instead of PostgreSQL.

This can be used like::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.cassandra.yml \
        up -d
    [...]

.. ardumont: indent is off.

.. vlorentz: indent

This launches two Cassandra servers, and reconfigures swh-storage to use them.

Efficient origin search
^^^^^^^^^^^^^^^^^^^^^^^

By default, swh-web uses swh-storage and swh-indexer-storage to provide its
search bar. They are both based on PostgreSQL and rather inefficient
(or Cassandra, which is even slower).

Instead, you can enable swh-search, which is based on ElasticSearch
and much more efficient, like this::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.search.yml \
        up -d
    [...]

Efficient counters
^^^^^^^^^^^^^^^^^^

The web interface shows counters of the number of objects in your archive,
by counting objects in the PostgreSQL or Cassandra database.

While this should not be an issue at the scale of your local Docker instance,
counting objects can actually be a bottleneck at Software Heritage's scale.
swh-storage therefore uses heuristics, which can be either not very efficient
or inaccurate.

So we have an alternative based on Redis' HyperLogLog feature, which you
can test with::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.counters.yml \
        up -d
    [...]

Efficient graph traversals
^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`swh-graph <swh-graph>` is a work-in-progress alternative to swh-storage
to perform large graph traversals/queries on the merkle DAG.

For example, it can be used by the vault, as it needs to query all objects
in the sub-DAG of a given node.

You can use it with::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        up -d

On the first start, it will run some precomputation based on all objects already
in your local SWH instance; so it may take a long time if you loaded many
repositories. (Expect 5 to 10s per repository.)

It **does not update automatically** when you load new repositories.
You need to restart it every time you want to update it.

You can :ref:`mount a docker volume <docker-persistence>` on
:file:`/srv/softwareheritage/graph` to avoid recomputing this graph
on every start.
Then, you need to explicitly request recomputing the graph before restarts
if you want to update it::

    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        run swh-graph update
    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        stop swh-graph
    ~/swh-environment/docker$ docker-compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        up -d swh-graph

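
The repeated ``-f`` flags in the commands above can also be expressed through
Docker Compose's ``COMPOSE_FILE`` environment variable, which takes a list of
compose files separated by ``:`` on Linux and macOS. The following is only a
minimal sketch, not something shipped in this repo: ``swh_compose_file`` is a
hypothetical helper name, and it assumes the override files follow the
``docker-compose.<name>.yml`` naming used in this README.

```shell
# Hypothetical helper (not part of the repository): print a COMPOSE_FILE
# value that combines docker-compose.yml with the named override files.
swh_compose_file() {
    value="docker-compose.yml"
    for name in "$@"; do
        # Assumes overrides are named docker-compose.<name>.yml
        value="${value}:docker-compose.${name}.yml"
    done
    printf '%s\n' "$value"
}

# Usage sketch, to be run from ~/swh-environment/docker:
#   export COMPOSE_FILE="$(swh_compose_file graph)"
#   docker-compose run swh-graph update
#   docker-compose stop swh-graph
#   docker-compose up -d swh-graph
```

With ``COMPOSE_FILE`` exported, the three graph commands no longer need their
``-f`` options, which keeps the update sequence shorter and harder to mistype.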
Keycloak
^^^^^^^^

If you really want to hack on swh-web's authentication features,
you will need to enable Keycloak as well, instead of the default
Django-based authentication::

    ~/swh-environment/docker$ docker-compose -f docker-compose.yml -f docker-compose.keycloak.yml up -d
    [...]

User registration in Keycloak database is available by following the Register link
in the page located at http://localhost:5080/oidc/login/.

Please note that email verification is required to properly register an account.
As we are in a testing environment, we use a MailHog instance as a fake SMTP server.
All emails sent by Keycloak can be easily read from the MailHog Web UI located
at http://localhost:8025/.

Kafka
^^^^^

Consuming topics from the host
""""""""""""""""""""""""""""""

As mentioned above, it is possible to consume topics from the kafka server available
in the docker-compose environment from the host using ``127.0.0.1:5092`` as broker URL.

Resetting offsets
"""""""""""""""""

It is also possible to reset a consumer group offset using the following command::

    ~/swh-environment/docker$ docker-compose \
        run kafka kafka-consumer-groups.sh \
        --bootstrap-server kafka:9092 \
        --group <group> \
        --all-topics \
        --reset-offsets --to-earliest --execute
    [...]

You can use ``--topic <topic>`` instead of ``--all-topics`` to specify a topic.
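
To make the choice between ``--all-topics`` and ``--topic <topic>`` explicit,
the arguments can be assembled by a small shell function. This is only a
sketch: ``reset_offsets_args`` is a hypothetical helper, not part of this
repo, and it only reuses the flags shown above. It assumes group and topic
names contain no whitespace.

```shell
# Hypothetical helper (not part of the repository): print the
# kafka-consumer-groups.sh arguments for resetting a group's offsets.
#   $1: consumer group name (required)
#   $2: topic name (optional; all topics are reset when omitted)
reset_offsets_args() {
    group="$1"
    topic="${2:-}"
    if [ -n "$topic" ]; then
        scope="--topic $topic"
    else
        scope="--all-topics"
    fi
    echo "--bootstrap-server kafka:9092 --group $group $scope --reset-offsets --to-earliest --execute"
}

# Usage sketch, to be run from ~/swh-environment/docker
# (the unquoted substitution is deliberate, so the arguments are split):
#   docker-compose run kafka kafka-consumer-groups.sh $(reset_offsets_args <group>)
#   docker-compose run kafka kafka-consumer-groups.sh $(reset_offsets_args <group> <topic>)
```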

Getting information on consumers
""""""""""""""""""""""""""""""""

You can get information on consumer groups::

    ~/swh-environment/docker$ docker-compose \
        run kafka kafka-consumer-groups.sh \
        --bootstrap-server kafka:9092 \
        --describe --members --all-groups
    [...]

Or the stored offsets for all (or a given) groups::

    ~/swh-environment/docker$ docker-compose \
        run kafka kafka-consumer-groups.sh \
        --bootstrap-server kafka:9092 \
        --describe --offsets --all-groups
    [...]

Using Sentry
------------

All entrypoints to SWH code (CLI, gunicorn, celery, …) are, or should
be, instrumented using Sentry. By default this is disabled, but if you
run your own Sentry instance, you can use it.

To do so, you must get a DSN from your Sentry instance, and set it as
[...]