The Software Heritage Git Loader is a tool and a library that walks a local
Git repository and injects into the SWH dataset all the objects it contains
(such as file contents, directories, revisions and releases) that were not
already known.
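
One way to picture how "not already known" can be decided is a minimal sketch that assumes file contents are deduplicated by their Git object ID. It computes a blob's SHA-1 the way Git does (the header `blob <size>` and a NUL byte, then the raw bytes) and keeps only blobs whose ID is absent from `known_ids`. Both helpers and the in-memory `known_ids` set are hypothetical stand-ins: the loader itself reads objects through pygit2 and talks to swh.storage, and this README does not say how it checks for existing objects.

```python
import hashlib
from typing import Iterable, Iterator, Set, Tuple


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 hex digest Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>" plus a NUL byte, then the raw bytes.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def unknown_contents(
    contents: Iterable[bytes], known_ids: Set[str]
) -> Iterator[Tuple[str, bytes]]:
    """Yield (blob_id, data) for each content whose ID is not yet known.

    known_ids is a hypothetical in-memory stand-in for the objects the
    archive already holds.
    """
    seen = set(known_ids)  # copy, so the caller's set is left untouched
    for data in contents:
        blob_id = git_blob_id(data)
        if blob_id not in seen:
            seen.add(blob_id)  # skip later duplicates within the same walk
            yield blob_id, data


if __name__ == "__main__":
    known = {git_blob_id(b"")}  # pretend the empty file is already archived
    for blob_id, data in unknown_contents([b"", b"hello\n"], known):
        print(blob_id, data)
    # prints: ce013625030ba8dba906f756967f9e9ca394464a b'hello\n'
```

Copying `known_ids` into a local set also means a blob that appears many times in one repository is yielded only once.
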

License
=======

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

See the top-level LICENSE file for the full text of the GNU General Public
License along with this program.

Dependencies
============

Runtime
-------

- python3
- python3-pygit2
- python3-swh.core
- python3-swh.storage

Test
----

- python3-nose

Requirements
============

- implementation language: Python 3
- coding guidelines: conform to PEP 8
- Git access: via libgit2/pygit2

Configuration
=============

bin/swh-loader-git takes one argument: a configuration file in .ini format.
The configuration file contains the following directives:

```
[main]
# The storage class to use: one of remote_storage, local_storage
storage_class = remote_storage
# Arguments passed to the storage class
# For remote_storage: URI of the storage server
storage_args = http://localhost:5000/
# For local_storage: database connection string and root of the
# storage, comma separated
# storage_args = dbname=softwareheritage-dev, /tmp/swh/storage
# The path to the repository to load
repo_path = /tmp/git_repo
# The URL of the origin for the repo
origin_url = https://github.com/hylang/hy
# The ID of the authority that dated the validity of the repo
authority = 1
# The validity date of the refs in the given repo, in Postgres
# timestamptz format
validity = 2015-01-01 00:00:00+00
# Whether to send the given types of objects
send_contents = True
send_directories = True
send_revisions = True
send_releases = True
send_occurrences = True
# The size of the packets sent to storage for each kind of object
content_packet_size = 100000
directory_packet_size = 25000
revision_packet_size = 100000
release_packet_size = 100000
occurrence_packet_size = 100000
```
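
To show how these directives map to typed values, here is a minimal sketch using Python's standard `configparser`. `parse_loader_config` and `packets` are hypothetical helpers written for this illustration, not the loader's own configuration code; the split of `storage_args` on the comma follows the comment in the sample above, and `packets` illustrates one plausible reading of the `*_packet_size` directives (cutting each kind of object into batches of at most that many items).

```python
import configparser
from typing import Any, Dict, List

# Object kinds as they appear in the send_* and *_packet_size directives.
KINDS = {
    "content": "contents",
    "directory": "directories",
    "revision": "revisions",
    "release": "releases",
    "occurrence": "occurrences",
}


def parse_loader_config(text: str) -> Dict[str, Any]:
    """Turn the [main] section of a loader .ini file into typed values.

    Hypothetical helper mirroring the documented directives.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]

    storage_class = main.get("storage_class")
    raw_args = main.get("storage_args")
    if storage_class == "local_storage":
        # "db connection string, storage root" -> two separate arguments
        storage_args = [arg.strip() for arg in raw_args.split(",")]
    else:
        # remote_storage takes a single URI
        storage_args = [raw_args]

    return {
        "storage_class": storage_class,
        "storage_args": storage_args,
        "repo_path": main.get("repo_path"),
        "origin_url": main.get("origin_url"),
        "authority": main.getint("authority"),
        "validity": main.get("validity"),  # kept as a timestamptz string
        "send": {k: main.getboolean("send_" + p) for k, p in KINDS.items()},
        "packet_size": {k: main.getint(k + "_packet_size") for k in KINDS},
    }


def packets(objects: List[Any], size: int) -> List[List[Any]]:
    """Split objects into consecutive packets of at most `size` items."""
    return [objects[i:i + size] for i in range(0, len(objects), size)]
```

With the sample values above, `conf["packet_size"]["directory"]` is 25000, so `packets(directories, 25000)` would yield the directory batches one at a time.
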

bin/swh-loader-git-multi takes the same argument (a configuration file with
the directives above) and accepts these additional directives:

```
[main]
# Database connection string of the lister-github database
lister_db = dbname=lister-github
# Base path of the GitHub repositories
repo_basepath = /srv/storage/space/data/github
# Whether to run the mass loading or just list the repos
dry_run = False
```
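
The effect of `dry_run` can be sketched as follows. This is an illustration under stated assumptions, not the multi-loader's code: it assumes, hypothetically, that repository names arrive as "owner/name" strings (for instance from the lister-github database) and that each repository sits at `repo_basepath/owner/name`. Neither the database query nor that directory layout is described in this README, so `repo_names`, `repo_path_for` and the `load_repo` callback are placeholders.

```python
import configparser
import os
from typing import Callable, Dict, Iterable, List


def parse_multi_options(text: str) -> Dict[str, object]:
    """Read the extra [main] directives used by swh-loader-git-multi."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]
    return {
        "lister_db": main.get("lister_db"),
        "repo_basepath": main.get("repo_basepath"),
        "dry_run": main.getboolean("dry_run", fallback=False),
    }


def repo_path_for(basepath: str, full_name: str) -> str:
    """Hypothetical layout: one repository per "owner/name" under basepath."""
    return os.path.join(basepath, *full_name.split("/"))


def run(options: Dict[str, object], repo_names: Iterable[str],
        load_repo: Callable[[str], None]) -> List[str]:
    """Return the repository paths; load each one unless dry_run is set."""
    paths = []
    for name in repo_names:
        path = repo_path_for(str(options["repo_basepath"]), name)
        if not options["dry_run"]:
            load_repo(path)  # hypothetical callback wrapping the single-repo loader
        paths.append(path)
    return paths
```

With `dry_run = True` the function only lists the paths it would load; with `False` it also hands each path to `load_repo`.
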