diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -19,6 +19,63 @@
 graph (:mod:`swh.storage.storage`).
 
 
+Using ``swh-storage``
+---------------------
+
+Note that ``swh-storage`` is an internal API of Software Heritage: it is only
+available to software running on the SWH infrastructure and to developers
+:ref:`running their own Software Heritage `.
+If you want to access the Software Heritage archive without running your own,
+you should use the `Web API`_ instead.
+
+As ``swh-storage`` has multiple backends, it is instantiated via the
+:py:func:`swh.storage.get_storage` function, which takes as argument the backend type
+(usually ``remote``, if you already have access to a running swh-storage).
+
+It returns an instance of a class implementing
+:py:class:`swh.storage.interface.StorageInterface`, which is mostly a set of key-value
+stores, one for each object type.
+
+Many of the arguments and return types are "model objects", i.e. immutable objects
+that are instances of the classes defined in :py:mod:`swh.model.model`.
+
+Methods returning long lists of results are paginated: they return both a list
+of results and an opaque token to get the next page of results.
+For example, to list all the visits of an origin using ``origin_visit_get``,
+ten visits at a time, you can do:
+
+.. code-block:: python
+
+   from swh.storage import get_storage
+
+   storage = get_storage("remote", url="http://localhost:5002")
+   origin = "https://github.com/torvalds/linux"
+   page_token = None  # None requests the first page
+   while True:
+       page = storage.origin_visit_get(origin=origin, page_token=page_token, limit=10)
+       for visit in page.results:
+           print(visit)
+       # Pass the opaque token back to fetch the next page; None means we are done
+       page_token = page.next_page_token
+       if page_token is None:
+           break
+
+Or, using :py:func:`swh.core.api.classes.stream_results` for convenience:
+
+.. code-block:: python
+
+   from swh.core.api.classes import stream_results
+   from swh.storage import get_storage
+
+   storage = get_storage("remote", url="http://localhost:5002")
+   visits = stream_results(
+       storage.origin_visit_get, origin="https://github.com/torvalds/linux"
+   )
+   for visit in visits:
+       print(visit)
+
+.. _Web API: https://archive.softwareheritage.org/api/
+
 Database schema
 ---------------
 
diff --git a/swh/storage/__init__.py b/swh/storage/__init__.py
--- a/swh/storage/__init__.py
+++ b/swh/storage/__init__.py
@@ -28,10 +28,14 @@
     `storage_args`.
 
     Args:
-        storage (dict): dictionary with keys:
-        - cls (str): storage's class, either local, remote, memory, filter,
-            buffer
-        - args (dict): dictionary with keys
+        cls (str): storage's class, can be:
+          - ``local`` to use a PostgreSQL database
+          - ``cassandra`` to use a Cassandra database
+          - ``remote`` to connect to a swh-storage server
+          - ``memory`` for an in-memory storage, useful for fast tests
+          - ``filter``, ``buffer``, ... to use specific storage "proxies", see their
+            respective documentation
+        args (dict): dictionary with keys
 
     Returns:
         an instance of swh.storage.Storage or compatible class
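
Note on the pagination contract documented in the docs/index.rst hunk: the "list of results plus opaque token" pattern can be tried out without a running storage server. The sketch below is only an illustration under stated assumptions. `PagedResult`, `fake_visit_get`, and `stream` are hypothetical stand-ins written for this note; they are not the swh-storage or swh-core implementations, and the integer "visits" are toy data.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class PagedResult(Generic[T]):
    """Hypothetical stand-in for a paged return value: results plus a token."""

    results: List[T]
    next_page_token: Optional[str] = None


def fake_visit_get(
    origin: str, page_token: Optional[str] = None, limit: int = 10
) -> PagedResult[int]:
    """Toy endpoint: serves 25 integer "visits" for any origin, `limit` at a time."""
    visits = list(range(25))
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    # The token is opaque to callers; in this toy it encodes the next offset.
    token = str(end) if end < len(visits) else None
    return PagedResult(results=visits[start:end], next_page_token=token)


def stream(f: Callable[..., PagedResult[T]], *args, **kwargs) -> Iterator[T]:
    """Yield every result by following next_page_token until it is None."""
    page_token = None
    while True:
        page = f(*args, page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


# Collect all toy visits, ten per page, across three pages.
all_visits = list(stream(fake_visit_get, "https://example.org/repo", limit=10))
```

The key point is that the caller never inspects the token. It only passes back whatever the previous page returned and stops when the token is `None`.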