docs/docker.rst
 Graph Docker environment
 ========================

 Build
 -----

 .. code:: bash

     $ git clone https://forge.softwareheritage.org/source/swh-graph.git
     $ cd swh-graph
     $ docker build --tag swh-graph dockerfiles

 Run
 ---

-Given a graph specified by:
+Given a graph ``g`` specified by:

 - ``g.edges.csv.gz``: gzip-compressed csv file with one edge per line, as a
-  "SRC_ID SPACE DST_ID" string, where identifiers are the `persistent identifier
-  <https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers>`_
-  of each node.
+  "SRC_ID SPACE DST_ID" string, where identifiers are the
+  :ref:`persistent-identifiers` of each node.
 - ``g.nodes.csv.gz``: sorted list of unique node identifiers appearing in the
   corresponding ``g.edges.csv.gz`` file. The format is a gzip-compressed csv
   file with one persistent identifier per line.

 .. code:: bash

-    $ docker run \
-        --volume /path/to/graph/:/graph \
-        --volume /path/to/output/:/graph/compressed \
-        --name swh-graph --tty --interactive \
-        swh-graph:latest bash
+    $ docker run -ti \
+        --volume /PATH/TO/GRAPH/:/srv/softwareheritage/graph/data \
+        --publish 127.0.0.1:5009:5009 \
+        swh-graph:latest \
+        bash

-Where ``/path/to/graph`` is a directory containing the ``g.edges.csv.gz`` and
-``g.nodes.csv.gz`` files.
+Where ``/PATH/TO/GRAPH`` is a directory containing the ``g.edges.csv.gz`` and
+``g.nodes.csv.gz`` files. By default, when entering the container the current
+working directory will be ``/srv/softwareheritage/graph``; all relative paths
+found below are intended to be relative to that dir.

 Graph compression
 ~~~~~~~~~~~~~~~~~

-To start graph compression:
+To compress the graph:

 .. code:: bash

-    $ ./scripts/compress_graph.sh \
-        --input /graph/g \
-        --output /graph/compressed \
-        --lib /swh/graph-lib \
-        --tmp /graph/compressed/tmp \
-        --stdout /graph/compressed/stdout \
-        --stderr /graph/compressed/stderr
+    $ app/scripts/compress_graph.sh --lib lib/ --input data/g

 Warning: very large graphs may need a bigger batch size parameter for WebGraph
 internals (you can specify a value when running the compression script using:
 ``--batch-size 1000000000``).

-Node ids mapping
-~~~~~~~~~~~~~~~~
+Node identifier mappings
+~~~~~~~~~~~~~~~~~~~~~~~~

-To dump the mapping files:
+To dump the mapping files (i.e., various node id <-> other info mapping files,
+in either ``.csv.gz`` or ad-hoc ``.map`` format):

 .. code:: bash

-    $ java -cp /swh/app/swh-graph.jar \
-        org.softwareheritage.graph.backend.Setup /graph/compressed/g
-
-This command outputs:
-
-- ``g.node2pid.csv``: long node id to string persistent identifier.
-- ``g.pid2node.csv``: string persistent identifier to long node id.
+    $ java -cp app/swh-graph.jar \
+        org.softwareheritage.graph.backend.Setup data/compressed/g

-REST API
-~~~~~~~~
+Graph server
+~~~~~~~~~~~~

-To start the REST API web-service:
+To start the swh-graph server:

 .. code:: bash

-    $ java -cp /swh/app/swh-graph.jar \
-        org.softwareheritage.graph.App /graph/compressed/g
+    $ java -cp app/swh-graph.jar \
+        org.softwareheritage.graph.App data/compressed/g