README

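The Software Heritage Git Loader is a tool and a library that walks a local
Git repository and injects into the Software Heritage (SWH) dataset all the
objects it contains (file contents, directories, revisions, releases and
occurrences) that were not already known.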
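
Git names every file content (blob) by the SHA-1 of a `blob <size>\0` header followed by the raw bytes, so deciding which files were not already known comes down to comparing these identifiers with the ones already archived. The following is a minimal sketch of that filtering step only, not the loader's actual code: it assumes a plain Python set of known IDs stands in for the query to swh.storage, and `git_blob_id` and `filter_unknown` are hypothetical names.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the Git object ID (hex SHA-1) of a blob with these contents."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def filter_unknown(contents, known_ids):
    """Yield (id, data) pairs for contents whose Git ID is not yet known.

    known_ids is a hypothetical stand-in for a lookup against storage.
    Duplicates within the same input are only yielded once.
    """
    seen = set()
    for data in contents:
        obj_id = git_blob_id(data)
        if obj_id in known_ids or obj_id in seen:
            continue  # already archived, or repeated in this batch
        seen.add(obj_id)
        yield obj_id, data
```

For example, `git_blob_id(b"hello\n")` returns `ce013625030ba8dba906f756967f9e9ca394464a`, the same ID that `git hash-object` prints for a file containing `hello` followed by a newline.
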

License
=======

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

See the top-level LICENSE file for the full text of the GNU General Public
License along with this program.

Dependencies
============

Runtime
-------

- python3
- python3-pygit2
- python3-swh.core
- python3-swh.storage

Test
----

- python3-nose

Requirements
============

- implementation language: Python 3
- coding guidelines: conform to PEP 8
- Git access: via libgit2/pygit2

Configuration
=============

bin/swh-loader-git takes one argument: a configuration file in .ini format.
The configuration file contains the following directives:

```
[main]
# The storage class to use: one of remote_storage, local_storage
storage_class = remote_storage
# Arguments passed to the storage class
# For remote_storage: URI of the storage server
storage_args = http://localhost:5000/
# For local_storage: database connection string and root of the
# storage, comma separated
# storage_args = dbname=softwareheritage-dev, /tmp/swh/storage
# The path to the repository to load
repo_path = /tmp/git_repo
# The URL of the origin for the repo
origin_url = https://github.com/hylang/hy
# The ID of the authority that dated the validity of the repo
authority = 1
# The validity date of the refs in the given repo, in Postgres
# timestamptz format
validity = 2015-01-01 00:00:00+00
# Whether to send the given types of objects
send_contents = True
send_directories = True
send_revisions = True
send_releases = True
send_occurrences = True
# The size of the packets sent to storage for each kind of object
content_packet_size = 100000
directory_packet_size = 25000
revision_packet_size = 100000
release_packet_size = 100000
occurrence_packet_size = 100000
```
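
The directives above map naturally onto Python's standard `configparser` module. The sketch below is an illustration under assumptions, not the loader's own parsing code: `load_config` and `packets` are hypothetical helpers, the `send_*` flags are read as booleans and the `*_packet_size` values as integers, and each packet size is assumed to count objects per request to storage.

```python
import configparser
from itertools import islice

BOOL_KEYS = ("send_contents", "send_directories", "send_revisions",
             "send_releases", "send_occurrences")
INT_KEYS = ("content_packet_size", "directory_packet_size",
            "revision_packet_size", "release_packet_size",
            "occurrence_packet_size")


def load_config(text: str) -> dict:
    """Parse the [main] section of a loader configuration into typed values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    main = parser["main"]
    conf = dict(main)  # every value starts out as a string
    for key in BOOL_KEYS:
        conf[key] = main.getboolean(key)  # None if the key is absent
    for key in INT_KEYS:
        conf[key] = main.getint(key)  # None if the key is absent
    if conf["storage_class"] == "local_storage":
        # "dbname=..., /root/path" -> ["dbname=...", "/root/path"]
        conf["storage_args"] = [a.strip() for a in conf["storage_args"].split(",")]
    else:
        conf["storage_args"] = [conf["storage_args"]]  # the server URI
    return conf


def packets(objects, size: int):
    """Yield consecutive lists of at most `size` objects, in input order."""
    it = iter(objects)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

With such helpers, contents could be sent in batches of `conf["content_packet_size"]` by iterating over `packets(contents, conf["content_packet_size"])`, which keeps each request to storage bounded.
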

bin/swh-loader-git-multi takes the same configuration file, which additionally
accepts the following directives:

```
[main]
# database connection string to the lister-github database
lister_db = dbname=lister-github
# base path of the github repositories
repo_basepath = /srv/storage/space/data/github
# Whether to run the mass loading or just list the repos
dry_run = False
```
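
A sketch of how the `dry_run` switch might drive the mass loader, assuming repositories are stored as `<repo_basepath>/<owner>/<name>` and that the repository list has already been fetched from `lister_db` (here it is simply passed in as a list). `run_multi` and the `load_repo` callback are hypothetical; the real script's directory layout and behavior may differ.

```python
import configparser
import posixpath


def run_multi(text, repo_names, load_repo):
    """List or load each repository named in repo_names.

    repo_names stands in for the rows read from lister_db (hypothetical);
    load_repo is a hypothetical callback that loads one repository path.
    Returns the computed repository paths.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    main = parser["main"]
    basepath = main["repo_basepath"]
    dry_run = main.getboolean("dry_run", fallback=False)

    paths = []
    for name in repo_names:
        # Assumed layout: <repo_basepath>/<owner>/<name>
        path = posixpath.join(basepath, name)
        paths.append(path)
        if dry_run:
            print(path)  # only list the repository
        else:
            load_repo(path)
    return paths
```

Passing the loader as a callback keeps the listing logic testable without a database or a storage server.
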