README

The Software Heritage Git Loader is both a tool and a library. It walks a
local Git repository and injects into the Software Heritage (SWH) dataset
all contained files that were not previously known.
License
=======
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See the top-level LICENSE file for the full text of the GNU General Public
License.
Dependencies
============
Runtime
-------
- python3
- python3-dulwich
- python3-retrying
- python3-swh.core
- python3-swh.model
- python3-swh.storage
- python3-swh.scheduler
Test
----
- python3-nose
Requirements
============
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via dulwich
Configuration
=============
You can run the loader or the updater directly by calling python3 -m swh.loader.git.{loader,updater}.
Each tool expects a configuration file in .ini format at ~/.config/swh/loader/git-{loader,updater}.ini.
The configuration file contains the following directives:
```
[main]
# The storage class to use: one of remote_storage, local_storage
storage_class = remote_storage
# Arguments passed to the storage class
# For remote_storage: URI of the storage server
storage_args = http://localhost:5002/
# For local_storage: database connection string and root of the
# storage, comma separated
# storage_args = dbname=softwareheritage-dev, /tmp/swh/storage
# Whether to send the given types of objects
send_contents = True
send_directories = True
send_revisions = True
send_releases = True
send_occurrences = True
# The size of the packets sent to storage for each kind of object
content_packet_size = 100000
content_packet_size_bytes = 1073741824
directory_packet_size = 25000
revision_packet_size = 100000
release_packet_size = 100000
occurrence_packet_size = 100000
```
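
To illustrate how these directives might be consumed, here is a minimal sketch using Python's standard `configparser` module. It is not the loader's actual configuration code, which this README does not show. The names `parse_loader_config`, `load_loader_config` and `iter_packets` are hypothetical, and the batching rule (flush a packet when either the object count or the cumulative byte size would exceed its limit) is an assumption about what the `*_packet_size` and `content_packet_size_bytes` directives mean.

```python
import configparser
from pathlib import Path

# Sample text mirroring a subset of the directives listed above.
SAMPLE_CONFIG = """\
[main]
storage_class = remote_storage
storage_args = http://localhost:5002/
send_contents = True
content_packet_size = 100000
content_packet_size_bytes = 1073741824
directory_packet_size = 25000
"""


def parse_loader_config(text):
    """Parse .ini text and return the [main] section with typed values.

    Hypothetical helper: send_* keys become booleans, *packet_size*
    keys become integers, everything else stays a string.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]
    config = {}
    for key in main:
        if key.startswith("send_"):
            config[key] = main.getboolean(key)
        elif "packet_size" in key:
            config[key] = main.getint(key)
        else:
            config[key] = main[key]
    return config


def load_loader_config(tool="loader"):
    """Read ~/.config/swh/loader/git-{tool}.ini (tool: loader or updater)."""
    path = Path.home() / ".config" / "swh" / "loader" / f"git-{tool}.ini"
    return parse_loader_config(path.read_text())


def iter_packets(objects, max_count, max_bytes=None, size_of=len):
    """Group objects into packets (assumed semantics, see lead-in).

    A packet is flushed once it holds max_count objects or, when
    max_bytes is given, once adding the next object would push its
    cumulative size past max_bytes. An object larger than max_bytes
    is sent alone.
    """
    packet, packet_bytes = [], 0
    for obj in objects:
        obj_bytes = size_of(obj) if max_bytes is not None else 0
        too_many = len(packet) >= max_count
        too_big = (
            max_bytes is not None
            and bool(packet)
            and packet_bytes + obj_bytes > max_bytes
        )
        if too_many or too_big:
            yield packet
            packet, packet_bytes = [], 0
        packet.append(obj)
        packet_bytes += obj_bytes
    if packet:
        yield packet


if __name__ == "__main__":
    config = parse_loader_config(SAMPLE_CONFIG)
    print(config["storage_class"], config["storage_args"])
    blobs = [b"aa", b"bbb", b"c", b"dddd"]
    for packet in iter_packets(blobs, max_count=2, max_bytes=4):
        print(packet)
```

Under these assumptions, `load_loader_config("updater")` would read the updater's file, and the resulting `content_packet_size` and `content_packet_size_bytes` values could be passed to `iter_packets` as `max_count` and `max_bytes`.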
