diff --git a/README b/README
deleted file mode 100644
index 90b283b..0000000
--- a/README
+++ /dev/null
@@ -1,96 +0,0 @@
-SWH-loader-dir
-==============
-
-The Software Heritage Directory Loader is a tool and a library to walk a local
-directory and inject into the SWH dataset all unknown contained files.
-
-
-Directory loader
-================
-
-
-### Configuration
-
-This is the loader's (or task's) configuration file.
-
-loader/dir.yml:
-
- storage:
-   cls: remote
-   args:
-     url: http://localhost:5002/
-
- send_contents: True
- send_directories: True
- send_revisions: True
- send_releases: True
- send_occurrences: True
- # max number of contents per packet sent to storage
- content_packet_size: 100
- # 100 MiB of content data
- content_packet_block_size_bytes: 104857600
- # limit for swh content storage for one blob (beyond that limit, the
- # content's data is not sent for storage)
- content_packet_size_bytes: 1073741824
- directory_packet_size: 250
- revision_packet_size: 100
- release_packet_size: 100
- occurrence_packet_size: 100
-
-Possible locations for this file:
-- ~/.config/swh/loader/dir.yml
-- ~/.swh/loader/dir.yml
-- /etc/softwareheritage/loader/dir.yml
-
-
-#### Toplevel
-
-Load directory directly from code or toplevel:
-
- from swh.loader.dir.loader import DirLoader
-
- dir_path = '/path/to/directory'
-
- # Fill in these values
- origin = {'url': 'some-origin', 'type': 'dir'}
- visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
- release = None
- revision = {}
- occurrence = {}
-
- DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
-
-
-#### Celery
-
-Load directory using celery.
-
-This assumes a properly configured Celery instance is up and running.
-
-worker.ini needs to be updated with the following keys:
-
- task_modules = swh.loader.dir.tasks
- task_queues = swh_loader_dir
-
-cf. https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md
-for more details
-
-You can send the following message to the task queue:
-
- from swh.loader.dir.tasks import LoadDirRepository
-
- # Fill in these values
- origin = {'url': 'some-origin', 'type': 'dir'}
- visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
- release = None
- revision = {}
- occurrence = {}
-
- # Send message to the task queue
- LoadDirRepository().run(('/path/to/dir', origin, visit_date, revision, release, [occurrence]))
-
-
-Directory producer
-==================
-
-None
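The `content_packet_*` keys in the sample configuration above limit how much content goes into each request to storage. The README does not spell out the exact batching rules, so what follows is only a minimal sketch of one plausible reading. `group_contents` is a hypothetical helper, not part of `swh.loader.dir`. It groups blob sizes into packets capped by `content_packet_size` (count) and `content_packet_block_size_bytes` (total bytes). It also flags blobs larger than `content_packet_size_bytes` so that their data is not sent (assumed behaviour, based on the config comments).

```python
from typing import Iterable, Iterator, List, Tuple

# Default values taken from the sample loader/dir.yml.
CONTENT_PACKET_SIZE = 100                          # max number of contents per packet
CONTENT_PACKET_BLOCK_SIZE_BYTES = 100 * 1024 ** 2  # 104857600 bytes (100 MiB)
CONTENT_PACKET_SIZE_BYTES = 1024 ** 3              # 1073741824 bytes (1 GiB)

Packet = List[Tuple[int, bool]]  # (blob size in bytes, whether its data is sent)


def group_contents(
    sizes: Iterable[int],
    max_count: int = CONTENT_PACKET_SIZE,
    max_bytes: int = CONTENT_PACKET_BLOCK_SIZE_BYTES,
    max_blob: int = CONTENT_PACKET_SIZE_BYTES,
) -> Iterator[Packet]:
    """Hypothetical helper: batch blob sizes into packets (assumed semantics)."""
    packet: Packet = []
    packet_bytes = 0
    for size in sizes:
        # Assumption: blobs above max_blob keep their metadata but not their data.
        send_data = size <= max_blob
        data_bytes = size if send_data else 0
        # Flush the current packet before it would exceed either limit.
        if packet and (len(packet) >= max_count
                       or packet_bytes + data_bytes > max_bytes):
            yield packet
            packet, packet_bytes = [], 0
        packet.append((size, send_data))
        packet_bytes += data_bytes
    if packet:
        yield packet
```

With the default settings, 250 small files yield packets of 100, 100 and 50 contents. A single blob bigger than `max_bytes` (but not bigger than `max_blob`) ends up alone in its own packet, because a packet is only flushed when it already holds something.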
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0047603
--- /dev/null
+++ b/README.md
@@ -0,0 +1,96 @@
+SWH-loader-dir
+===================
+
+The Software Heritage Directory Loader is both a tool and a library.
+
+Its sole purpose is to walk a local directory and inject into the SWH
+dataset all files from that directory tree that the dataset does not yet contain.
+
+
+## Configuration
+
+The loader needs a configuration file named `loader/dir.yml` in one of the
+following directories: `/etc/softwareheritage`, `~/.config/swh` or `~/.swh`.
+
+It should look similar to the following (adapt it to your needs):
+
+``` yaml
+storage:
+  cls: remote
+  args:
+    url: http://localhost:5002/
+
+send_contents: True
+send_directories: True
+send_revisions: True
+send_releases: True
+send_occurrences: True
+# max number of contents per packet sent to storage
+content_packet_size: 100
+# 100 MiB of content data
+content_packet_block_size_bytes: 104857600
+# limit for swh content storage for one blob (beyond that limit, the
+# content's data is not sent for storage)
+content_packet_size_bytes: 1073741824
+directory_packet_size: 250
+revision_packet_size: 100
+release_packet_size: 100
+occurrence_packet_size: 100
+```
+
+## Run
+
+To run the loader, you can use either:
+
+- the Python 3 toplevel (or your own Python code)
+- Celery
+
+### Toplevel
+
+Load a directory directly from your code or from the toplevel:
+
+``` Python
+from swh.loader.dir.loader import DirLoader
+
+dir_path = '/path/to/directory'
+
+# Fill in these values
+origin = {'url': 'some-origin', 'type': 'dir'}
+visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
+release = None
+revision = {}
+occurrence = {}
+
+DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
+```
+
+### Celery
+
+To use Celery, add the following entries to the `worker.yml` file in one of
+`/etc/softwareheritage`, `~/.config/swh` or `~/.swh`:
+
+``` yaml
+task_modules:
+ - swh.loader.dir.tasks
+task_queues:
+ - swh_loader_dir
+```
+
+See [swh-core's documentation](https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md)
+for more details.
+
+You can then send the following message to the task queue:
+
+``` Python
+from swh.loader.dir.tasks import LoadDirRepository
+
+# Fill in these values
+origin = {'url': 'some-origin', 'type': 'dir'}
+visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
+release = None
+revision = {}
+occurrence = {}
+
+# Send message to the task queue
+LoadDirRepository().run(('/path/to/dir', origin, visit_date, revision, release, [occurrence]))
+```
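The README.md added above says that the configuration files live under one of three directories. Below is a minimal sketch of such a lookup. `find_config_file` is a hypothetical helper, not an swh-core API. It returns the first existing file, checking the directories in the order the README lists them. That order is an assumption; swh-core's actual precedence rules may differ.

```python
import os
from pathlib import Path
from typing import Optional, Sequence

# Directories named in README.md; the search order here is an assumption.
DEFAULT_CONFIG_DIRS = ('/etc/softwareheritage', '~/.config/swh', '~/.swh')


def find_config_file(
    relative_path: str = 'loader/dir.yml',
    config_dirs: Sequence[str] = DEFAULT_CONFIG_DIRS,
) -> Optional[Path]:
    """Hypothetical helper: return the first existing <dir>/<relative_path>."""
    for config_dir in config_dirs:
        # Expand '~' so that user-level directories resolve under $HOME.
        candidate = Path(os.path.expanduser(config_dir)) / relative_path
        if candidate.is_file():
            return candidate
    return None  # no configuration file found in any directory
```

Calling `find_config_file('worker.yml')` would look up the Celery worker configuration the same way. Parsing is left out. With PyYAML installed, `yaml.safe_load` on the opened file would return the settings as a dict.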
diff --git a/docs/.gitignore b/docs/.gitignore
index 58a761e..f6b5c55 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -1,3 +1,4 @@
_build/
apidoc/
*-stamp
+README.md
diff --git a/docs/Makefile b/docs/Makefile
index c30c50a..ec260d2 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -1 +1,6 @@
include ../../swh-docs/Makefile.sphinx
+
+html: copy_md
+
+copy_md:
+	cp ../README.md README.md
diff --git a/docs/index.rst b/docs/index.rst
index 8b64117..2e88ed8 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,15 +1,15 @@
Software Heritage - Development Documentation
=============================================
.. toctree::
   :maxdepth: 2
   :caption: Contents:
-
+   README.md
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`