Page Menu
Home
Software Heritage
Search
Configure Global Search
Log In
Files
F9340811
No One
Temporary
Actions
View File
Edit File
Delete File
View Transforms
Subscribe
Mute Notifications
Award Token
Flag For Later
Size
2 KB
Subscribers
None
View Options
diff --git a/docs/index.rst b/docs/index.rst
index cd5c2eb..b430675 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,21 +1,97 @@
.. _swh-objstorage-replayer:
Software Heritage - Object storage replayer
===========================================
This Python module provides a command line tool to replicate content objects from a
-source Object storage to a destination one by listening the `content` topic of a
-`swh.journal` kafka stream.
+source Object storage to a destination one by listening the ``content`` topic of a
+:ref:`swh-journal` kafka stream.
It is meant to be used as the brick of a mirror setup dedicated to replicating content
objects.
+Quick start
+-----------
+
+Once installed (using pip or debian packages), the command ``swh objstorage
+replay`` should be available:
+
+It needs a configuration file with 4 sections:
+
+- ``objstorage``: the source objstorage to retrieve objects from,
+
+- ``objstorage_dst``: the destination objstorage to put objects into,
+
+- ``journal_client``: the journal client (kafka configuration where the object
+ hashes are consumed from),
+
+- ``replayer`` (optional): some replayer specific configurations options.
+
+
+For example with a configuration file like:
+
+.. code-block:: yaml
+
+ objstorage:
+ cls: multiplexer
+ objstorages:
+ - cls: http
+ url: https://softwareheritage.s3.amazonaws.com/content/
+ compression: gzip
+ - cls: remote
+ url: https://login:password@objstorage.staging.swh.network
+
+ objstorage_dst:
+ cls: remote
+ args:
+ url: http://objstorage:5003
+
+ journal_client:
+ cls: kafka
+ brokers:
+ - broker1.journal.staging.swh.network:9093
+ group_id: kafka-username-content-replayer-003
+ sasl.username: kafka-username
+ sasl.password: kafka-password
+ security.protocol: sasl_ssl
+ sasl.mechanism: SCRAM-SHA-512
+ session.timeout.ms: 600000
+ max.poll.interval.ms: 3600000
+ message.max.bytes: 1000000000
+ privileged: true
+ batch_size: 2000
+
+ replayer:
+ error_reporter:
+ host: redis
+ port: 6379
+ db: 0
+
+
+you can start the content replayer with:
+
+.. code-block:: bash
+
+ $ swh objstorage -C replayer-config.yml replay
+
+
+You would typically run this tool on several machines, using the same
+``group_id``, to increase replication parallelism.
+
+Also note that you may increase the default concurrency within one replayer
+using the ``--concurrency`` command line option. This will use as many
+replication threads as given in argument, distributing the replication of
+objects **within the same kafka consumer** among these threads. This is
+typically useful when the replication of one object comes with non negligeable
+minimal latency (e.g. consuming from public cloud-based objstorages).
+
+
Reference Documentation
-----------------------
.. toctree::
:maxdepth: 2
cli
/apidoc/swh.objstorage.replayer
File Metadata
Details
Attached
Mime Type
text/x-diff
Expires
Fri, Jul 4, 11:12 AM (3 w, 5 d ago)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
3251010
Attached To
rDOBJSRPL Content replayer
Event Timeline
Log In to Comment