diff --git a/README b/README
index c89aa32..90b283b 100644
--- a/README
+++ b/README
@@ -1,98 +1,96 @@
 SWH-loader-dir
 ==============
 
 The Software Heritage Directory Loader is a tool and a library to walk a local
 directory and inject into the SWH dataset all unknown contained files.
 
 Directory loader
 ================
 
 ### Configuration
 
 This is the loader's (or task's) configuration file.
 
 loader/dir.yml:
 
     storage:
       cls: remote
       args:
         url: http://localhost:5002/
 
     send_contents: True
     send_directories: True
     send_revisions: True
     send_releases: True
     send_occurrences: True
     # nb of max contents to send for storage
     content_packet_size: 100
     # 100 Mib of content data
     content_packet_block_size_bytes: 104857600
     # limit for swh content storage for one blob (beyond that limit, the
     # content's data is not sent for storage)
     content_packet_size_bytes: 1073741824
     directory_packet_size: 250
     revision_packet_size: 100
     release_packet_size: 100
     occurrence_packet_size: 100
 
 Present in possible locations:
 - ~/.config/swh/loader/dir.ini
 - ~/.swh/loader/dir.ini
 - /etc/softwareheritage/loader/dir.ini
 
 #### Toplevel
 
 Load directory directly from code or toplevel:
 
-    from swh.loader.dir.tasks import LoadDirRepository
+    from swh.loader.dir.loader import DirLoader
 
     dir_path = '/path/to/directory
 
     # Fill in those
-    origin = {}
+    origin = {'url': 'some-origin', 'type': 'dir'}
+    visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
     release = None
     revision = {}
     occurrence = {}
 
-    LoaderDir().load(dir_path, origin, revision, release, [occurrence])
+    DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
 
 #### Celery
 
 Load directory using celery.
 
 Providing you have a properly configured celery up and running
 worker.ini needs to be updated with the following keys:
 
     task_modules = swh.loader.dir.tasks
     task_queues = swh_loader_dir
 
 cf. https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md
 for more details
 
 You can send the following message to the task queue:
 
     from swh.loader.dir.tasks import LoadDirRepository
 
     # Fill in those
-    origin = {}
+    origin = {'url': 'some-origin', 'type': 'dir'}
+    visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
     release = None
     revision = {}
     occurrence = {}
 
     # Send message to the task queue
-    LoaderDir().load(('/path/to/dir,
-                      origin,
-                      revision,
-                      release,
-                      [occurrence]))
+    LoaderDirRepository().run(('/path/to/dir, origin, visit_date, revision, release, [occurrence]))
 
 Directory producer
 ==================
 
 None