.. _swh-storage:

Software Heritage - Storage
===========================

Abstraction layer over the archive, providing access to all stored source code
artifacts as well as their metadata.

The Software Heritage storage consists of a high-level storage layer
(:mod:`swh.storage`) that exposes a client/server API
(:mod:`swh.storage.api`). The API is exposed by a server
(:mod:`swh.storage.api.server`) and accessible via a client
(:mod:`swh.storage.api.client`).

The low-level implementation of the storage is split between an object storage
(:ref:`swh.objstorage <swh-objstorage>`), which stores all "blobs" (i.e., the
leaves of the :ref:`data-model`) and a SQL representation of the rest of the
graph (:mod:`swh.storage.storage`).

Using ``swh-storage``
---------------------

First, note that ``swh-storage`` is an internal API of Software Heritage, which
is only available to software running on the SWH infrastructure and to developers
:ref:`running their own Software Heritage <getting-started>`.
If you want to access the Software Heritage archive without running your own
instance, you should use the :swh_web:`Web API <api/>` instead.

As ``swh-storage`` has multiple backends, it is instantiated via the
:py:func:`swh.storage.get_storage` function, which takes as argument the backend type
(usually ``remote``, if you already have access to a running swh-storage).

It returns an instance of a class implementing
:py:class:`swh.storage.interface.StorageInterface`, which is mostly a set of key-value
stores, one for each object type.

Many of the arguments and return types are "model objects", i.e. immutable objects
that are instances of the classes defined in :py:mod:`swh.model.model`.

Methods returning long lists of results are paginated: they return both a list
of results and an opaque token to get the next page of results.
For example, to list all the visits of an origin using ``origin_visit_get``,
ten visits at a time, you can do:

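.. code-block:: python

   from swh.storage import get_storage

   storage = get_storage("remote", url="http://localhost:5002")
   page_token = None  # None requests the first page
   while True:
       page = storage.origin_visit_get(
           origin="https://github.com/torvalds/linux",
           page_token=page_token,
           limit=10,
       )
       for visit in page.results:
           print(visit)
       # Pass the opaque token back to fetch the following page
       page_token = page.next_page_token
       if page_token is None:
           break
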
Or, using :py:func:`swh.core.api.classes.stream_results` for convenience:

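.. code-block:: python

   from swh.core.api.classes import stream_results
   from swh.storage import get_storage

   storage = get_storage("remote", url="http://localhost:5002")
   # stream_results follows the page tokens and yields results one by one
   visits = stream_results(
       storage.origin_visit_get, origin="https://github.com/torvalds/linux"
   )
   for visit in visits:
       print(visit)
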
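
To make the token-following pattern concrete without a running ``swh-storage``,
here is a minimal, self-contained sketch. Everything in it is hypothetical and
only stands in for the real API: ``Page`` mimics the shape of a paged result
(``results`` and ``next_page_token``), ``fake_visit_get`` plays the role of a
paginated method, and ``iter_all`` is a simplified illustration of what a helper
such as ``stream_results`` does, not its actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for a paged result: one page plus a token."""

    results: List[int]
    next_page_token: Optional[str]


# Fake data standing in for the visits of an origin (assumption: 25 items).
VISITS = list(range(1, 26))


def fake_visit_get(page_token: Optional[str] = None, limit: int = 10) -> Page:
    """Hypothetical paginated method; here the token simply encodes an offset."""
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(VISITS) else None
    return Page(results=VISITS[start:end], next_page_token=next_token)


def iter_all(fetch: Callable[..., Page], **kwargs) -> Iterator[int]:
    """Yield every result by following next_page_token until it is None."""
    page_token = None
    while True:
        page = fetch(page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


all_visits = list(iter_all(fake_visit_get, limit=10))
print(len(all_visits))
```

In this sketch the token happens to be an offset, but callers should treat a
real token as opaque and only ever pass it back unchanged.
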
Database schema
---------------

* :ref:`sql-storage`

Archive copies
--------------

* :ref:`archive-copies`

Specifications
--------------

* :ref:`extrinsic-metadata-specification`

Reference Documentation
-----------------------

.. toctree::
   :maxdepth: 2

   cli
   /apidoc/swh.storage