The Software Heritage Git Loader is a tool and a library to walk a local Git repository and inject into the SWH dataset all the files it contains that were not already known.

License
=======

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See top-level LICENSE file for the full text of the GNU General Public License along with this program.

Dependencies
============

Runtime
-------

- python3
- python3-pygit2
- python3-swh.core
- python3-swh.storage

Test
----

- python3-nose

Requirements
============

- implementation language: Python3
- coding guidelines: conform to PEP8
- Git access: via libgit2/pygit2

Configuration
=============

bin/swh-loader-git takes one argument: a configuration file in .ini format.

The configuration file contains the following directives:

```
[main]
# the storage class used: one of remote_storage, local_storage
storage_class = remote_storage

# arguments passed to the storage class
# for remote_storage: URI of the storage server
storage_args = http://localhost:5000/
# for local_storage: database connection string and root of the
# storage, comma separated
# storage_args = dbname=softwareheritage-dev, /tmp/swh/storage

# The path to the repository to load
repo_path = /tmp/git_repo

# The URL of the origin for the repo
origin_url = https://github.com/hylang/hy

# The ID of the authority that dated the validity of the repo
authority = 1

# The validity date of the refs in the given repo, in Postgres
# timestamptz format
validity = 2015-01-01 00:00:00+00

# Whether to send the given types of objects
send_contents = True
send_directories = True
send_revisions = True
send_releases = True
send_occurrences = True

# The size of the packets sent to storage for each kind of object
content_packet_size = 100000
directory_packet_size = 25000
revision_packet_size = 100000
release_packet_size = 100000
occurrence_packet_size = 100000
```

bin/swh-loader-git-multi takes the same configuration file, with the following additional directives:

```
[main]
# database connection string for the lister-github database
lister_db = dbname=lister-github

# base path of the github repositories
repo_basepath = /srv/storage/space/data/github

# Whether to run the mass loading or just list the repos
dry_run = False
```
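As a sketch of how the directives above map onto typed values, the following snippet parses such a file with Python's standard `configparser` module. The helper name `load_config` and the grouping of keys into boolean and integer directives are assumptions made for illustration; this is not the loader's own parsing code. Interpolation is disabled so that `%` characters in connection strings are taken literally, and the comma-separated `storage_args` of `local_storage` is split into its two documented parts.

```python
import configparser

# Hypothetical grouping of the documented directives by value type;
# this is an illustrative assumption, not taken from the loader's source.
BOOL_KEYS = ("send_contents", "send_directories", "send_revisions",
             "send_releases", "send_occurrences", "dry_run")
INT_KEYS = ("authority", "content_packet_size", "directory_packet_size",
            "revision_packet_size", "release_packet_size",
            "occurrence_packet_size")


def load_config(text):
    """Parse a swh-loader-git style .ini text into a typed dict (hypothetical helper)."""
    # interpolation=None keeps values such as connection strings verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    main = parser["main"]

    config = dict(main)  # every value starts out as a string
    for key in BOOL_KEYS:
        if key in main:
            config[key] = main.getboolean(key)
    for key in INT_KEYS:
        if key in main:
            config[key] = main.getint(key)

    # local_storage expects "<db connection string>, <storage root>"
    if config.get("storage_class") == "local_storage":
        db, root = (part.strip() for part in config["storage_args"].split(",", 1))
        config["storage_args"] = (db, root)
    return config
```

For example, `load_config(open("/path/to/loader.ini").read())` would return the `remote_storage` URI unchanged as a string, while a `local_storage` setup yields a `(connection string, storage root)` tuple. The `validity` value is deliberately left as a string, since it is meant to be handed to Postgres in timestamptz format.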