diff --git a/README b/README
index f246fb7..9098a3c 100644
--- a/README
+++ b/README
@@ -1,82 +1,83 @@
 The Software Heritage Git Loader is a tool and a library to walk a local
 Git repository and inject into the SWH dataset all contained files that
 weren't known before.
 
 License
 =======
 
 This program is free software: you can redistribute it and/or modify it
 under the terms of the GNU General Public License as published by the
 Free Software Foundation, either version 3 of the License, or (at your
 option) any later version.
 
 This program is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 General Public License for more details.
 
 See top-level LICENSE file for the full text of the GNU General Public
 License along with this program.
 
 Dependencies
 ============
 
 Runtime
 -------
 
 - python3
 - python3-dulwich
 - python3-retrying
 - python3-swh.core
 - python3-swh.model
 - python3-swh.storage
+- python3-swh.scheduler
 
 Test
 ----
 
 - python3-nose
 
 Requirements
 ============
 
 - implementation language, Python3
 - coding guidelines: conform to PEP8
 - Git access: via dulwich
 
 Configuration
 =============
 
 You can run the loader or the updater directly by calling python3 -m swh.loader.git.{loader,updater}.
 
 Both tools expect a configuration file in .ini format to be present in ~/.config/swh/loader/git-{loader,updater}.ini
 
 The configuration file contains the following directives:
 
 ```
 [main]
 
 # the storage class used. one of remote_storage, local_storage
 storage_class = remote_storage
 
 # arguments passed to the storage class
 # for remote_storage: URI of the storage server
 storage_args = http://localhost:5000/
 
 # for local_storage: database connection string and root of the
 # storage, comma separated
 # storage_args = dbname=softwareheritage-dev, /tmp/swh/storage
 
 # Whether to send the given types of objects
 send_contents = True
 send_directories = True
 send_revisions = True
 send_releases = True
 send_occurrences = True
 
 # The size of the packets sent to storage for each kind of object
 content_packet_size = 100000
 content_packet_size_bytes = 1073741824
 directory_packet_size = 25000
 revision_packet_size = 100000
 release_packet_size = 100000
 occurrence_packet_size = 100000
 ```
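
To make the shape of the configuration file above more concrete, here is a minimal sketch of how such a `[main]` section could be read into typed values using Python's standard `configparser` module. This is not the loader's own parsing code, which is not shown in this diff. The helper name `parse_loader_config` and the embedded sample are assumptions made for illustration, and splitting `storage_args` on commas follows the "comma separated" note in the README comments rather than any confirmed behaviour.

```python
import configparser

# Sample mirroring some of the directives documented in the README.
SAMPLE_CONFIG = """
[main]
storage_class = remote_storage
storage_args = http://localhost:5000/
send_contents = True
send_directories = True
content_packet_size = 100000
content_packet_size_bytes = 1073741824
directory_packet_size = 25000
"""


def parse_loader_config(text):
    """Parse the [main] section into typed values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]

    config = {
        "storage_class": main.get("storage_class"),
        # Assumption: arguments are comma separated, as the README
        # describes for local_storage.
        "storage_args": [
            arg.strip() for arg in main.get("storage_args").split(",")
        ],
    }
    for key in main:
        if key.startswith("send_"):
            # "True"/"False" strings become real booleans.
            config[key] = main.getboolean(key)
        elif "packet_size" in key:
            # Packet sizes are plain integers.
            config[key] = main.getint(key)
    return config


config = parse_loader_config(SAMPLE_CONFIG)
```

With a `local_storage` setup, the same helper would turn `storage_args = dbname=softwareheritage-dev, /tmp/swh/storage` into a two-element list holding the connection string and the storage root.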