F9078509
diff --git a/README.md b/README.md
index 0047603..b20a34c 100644
--- a/README.md
+++ b/README.md
@@ -1,96 +1,88 @@
SWH-loader-dir
===================
The Software Heritage Directory Loader is both a tool and a library.
Its sole purpose is to walk a local directory and inject into the SWH
dataset every file in that directory structure that is not already known to it.
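As a rough illustration of that walk, the sketch below hashes every file the way Git hashes blobs (Software Heritage content identifiers use the same `sha1_git` scheme) and keeps only the files whose hash is not in a given set. The `known_ids` set is a hypothetical stand-in for the storage lookup the real loader performs. This is not the loader's own code.

```python
import hashlib
import os


def sha1_git(data: bytes) -> str:
    """Git-style blob id: SHA-1 over a 'blob <length>\\0' header followed by the bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def unknown_files(root, known_ids):
    """Walk `root` and return {path: sha1_git} for files whose id is not in `known_ids`.

    `known_ids` is a hypothetical stand-in for asking the storage which
    contents it already has; the real loader does not work on a plain set.
    """
    missing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read the whole file; fine for a sketch, not for very large blobs.
            with open(path, "rb") as f:
                digest = sha1_git(f.read())
            if digest not in known_ids:
                missing[path] = digest
    return missing
```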
## Configuration
The loader needs a configuration file at *`{/etc/softwareheritage |
~/.config/swh | ~/.swh}`/loader/dir.yml*.
It should look similar to the following (adapt it to your needs):
``` yaml
storage:
  cls: remote
  args:
    url: http://localhost:5002/
-
-send_contents: True
-send_directories: True
-send_revisions: True
-send_releases: True
-send_occurrences: True
-# nb of max contents to send for storage
-content_packet_size: 100
-# 100 Mib of content data
-content_packet_block_size_bytes: 104857600
-# limit for swh content storage for one blob (beyond that limit, the
-# content's data is not sent for storage)
-content_packet_size_bytes: 1073741824
-directory_packet_size: 250
-revision_packet_size: 100
-release_packet_size: 100
-occurrence_packet_size: 100
```
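The three candidate directories can be probed as shown below. This is a sketch that assumes the directories are tried in the order listed above and that the first existing `loader/dir.yml` wins. swh-core's actual precedence may differ, and `find_loader_config` is a hypothetical helper, not part of the loader.

```python
import os

# Directories listed above; trying them in this order is an assumption.
CONFIG_DIRS = ["/etc/softwareheritage", "~/.config/swh", "~/.swh"]


def find_loader_config(dirs=CONFIG_DIRS, relpath="loader/dir.yml"):
    """Return the first existing <dir>/loader/dir.yml, or None if there is none."""
    for d in dirs:
        # Expand '~' so the per-user directories resolve to real paths.
        candidate = os.path.join(os.path.expanduser(d), relpath)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The returned path can then be parsed with a YAML library, for example PyYAML's `yaml.safe_load`, to obtain the `storage` mapping shown above.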
## Run
To run the loader, you can use either:
- the python3 toplevel
- celery
### Toplevel
Load a directory directly from code or from the toplevel:
``` Python
-from swh.loader.dir.loader import DirLoader
-
-dir_path = '/path/to/directory
+dir_path = '/home/storage/dir/'
# Fill in these values
origin = {'url': 'some-origin', 'type': 'dir'}
visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
-release = None
-revision = {}
-occurrence = {}
+revision = {
+    'author': {'name': 'some', 'fullname': 'one', 'email': 'something'},
+    'committer': {'name': 'some', 'fullname': 'one', 'email': 'something'},
+    'message': '1.0 Released',
+    'date': None,
+    'committer_date': None,
+    'type': 'tar',
+    'metadata': {}
+}
-DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
+import logging
+
+from swh.loader.dir.tasks import LoadDirRepository
+
+# Show the loader's debug output while it runs
+logging.basicConfig(level=logging.DEBUG)
+
+loader = LoadDirRepository()
+loader.run_task(dir_path=dir_path, origin=origin, visit_date=visit_date,
+                revision=revision, release=None, branch_name='master')
```
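To avoid repeating the person fields, the `revision` mapping can be built by a small helper. `make_revision` is hypothetical. Using the same person as author and committer, and formatting `fullname` as `Name <email>`, are assumptions, so check what your storage expects.

```python
def make_revision(name, email, message, rev_type="tar"):
    """Build a revision dict with the same keys as the toplevel example.

    Hypothetical helper: one person serves as both author and committer,
    and 'fullname' is assumed to be 'Name <email>'.
    """
    person = {"name": name, "fullname": "%s <%s>" % (name, email), "email": email}
    return {
        "author": person,
        "committer": dict(person),  # separate copy so the two can diverge later
        "message": message,
        "date": None,
        "committer_date": None,
        "type": rev_type,
        "metadata": {},
    }
```

For example, `make_revision('Jane Doe', 'jane@example.org', '1.0 Released')` yields a mapping that can be passed as `revision=` above.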
### Celery
To use celery, add the following entries to the
*`{/etc/softwareheritage | ~/.config/swh | ~/.swh}`/worker.yml* file:
``` yaml
task_modules:
- swh.loader.dir.tasks
task_queues:
- swh_loader_dir
```
See [swh-core's documentation](https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md) for more details.
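If a `worker.yml` already lists other modules or queues, the two entries above should be appended to the existing lists rather than replace them. Below is a minimal sketch working on the parsed mapping (plain Python dicts and lists, as a YAML parser would return). `add_loader_entries` is a hypothetical helper. It leaves existing entries untouched and does not add duplicates.

```python
def add_loader_entries(worker_config):
    """Return a copy of a parsed worker.yml mapping with the directory loader entries added.

    Hypothetical helper: each value is appended only if it is missing,
    so applying it twice gives the same result.
    """
    config = dict(worker_config)
    for key, value in (("task_modules", "swh.loader.dir.tasks"),
                       ("task_queues", "swh_loader_dir")):
        # Copy the list so the caller's mapping is not modified in place.
        entries = list(config.get(key, []))
        if value not in entries:
            entries.append(value)
        config[key] = entries
    return config
```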
You can then send the following message to the task queue:
``` Python
from swh.loader.dir.tasks import LoadDirRepository
# Fill in these values
origin = {'url': 'some-origin', 'type': 'dir'}
visit_date = 'Wed, 3 May 2017 17:16:32 +0200'
release = None
revision = {}
occurrence = {}
# Send message to the task queue
LoadDirRepository().run(('/path/to/dir', origin, visit_date, revision, release, [occurrence]))
```
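Both examples pass `visit_date` as an RFC 2822 date string. Python's standard `email.utils` module can parse and produce such strings, which is a convenient way to check the value before handing it to the loader. The parser ignores the weekday name, so a mismatched weekday goes unnoticed. 3 May 2017 was a Wednesday, hence `Wed` below. This sketch keeps the string form because the README does not say whether the loader also accepts a `datetime` object.

```python
from email.utils import format_datetime, parsedate_to_datetime

# Parse the RFC 2822 string into a timezone-aware datetime with a +02:00 offset.
visit_dt = parsedate_to_datetime("Wed, 3 May 2017 17:16:32 +0200")
print(visit_dt.isoformat())  # 2017-05-03T17:16:32+02:00

# Turn a datetime back into an RFC 2822 string suitable for `visit_date`.
visit_date = format_datetime(visit_dt)
```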
Attached to: rDLDDIR Directory Loader