README

The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
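
The idea of injecting only files "that weren't known before" can be illustrated with a minimal sketch. It is not the loader's implementation: the loader reads repositories through pygit2 and keeps its cache in Postgres, whereas this sketch uses only `hashlib` and an in-memory set. The function names `git_blob_id` and `select_unknown` are hypothetical. The one thing taken from Git itself is the way a blob identifier is computed (SHA-1 over a `blob <size>\0` header followed by the content).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 identifier Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def select_unknown(blobs, known_ids):
    """Yield (blob_id, data) for each blob whose id is not known yet.

    `known_ids` stands in for the Postgres-backed cache (an assumption of
    this sketch); duplicates inside `blobs` are reported only once.
    """
    seen = set(known_ids)
    for data in blobs:
        blob_id = git_blob_id(data)
        if blob_id not in seen:
            seen.add(blob_id)
            yield blob_id, data


# The empty blob is already known, "hello\n" appears twice in the input.
known = {git_blob_id(b"")}
new_blobs = list(select_unknown([b"", b"hello\n", b"hello\n"], known))
```

Here `new_blobs` holds a single entry for `b"hello\n"`: the empty blob is skipped because it is already known, and the second copy of `hello` is skipped as a duplicate.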
License
=======
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
============
Runtime
-------
- python3
- python3-psycopg2
- python3-pygit2
Test
----
- python3-nose
Requirements
============
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via libgit2/pygit2
- cache: implemented as Postgres tables
Configuration
=============
swh-git-loader depends on a few tools. Each of them reads its own
configuration file, described below.
swh-db-manager
--------------
This tool is now solely in charge of database cleanup.
Create a configuration file in **\~/.config/db-manager.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db
db_url = dbname=swhgitloader
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
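
As an illustration of how such a file can be consumed, here is a minimal sketch that parses the sample above with Python's standard `configparser`. The helper name `read_main_section` is hypothetical and the tool's own parsing code may differ. The sketch only shows that `db_url` comes out as a plain connection string of the kind `psycopg2.connect` accepts.

```python
import configparser

# Same content as the sample db-manager.ini shown above.
SAMPLE = """
[main]
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db
db_url = dbname=swhgitloader
"""


def read_main_section(text: str) -> dict:
    """Parse ini text and return the [main] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["main"])


conf = read_main_section(SAMPLE)
# conf["db_url"] is the string one would hand to psycopg2.connect(...).
print(conf["db_url"])
```

Only the first `=` on a line separates key from value, so the `dbname=swhgitloader` part is kept intact as the value of `db_url`.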
swh-git-loader
--------------
Create a configuration file in **\~/.config/swh/git-loader.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = /tmp/swh-git-loader/log
# url access to api's backend
backend_url = http://localhost:5000
```
Note:
- [DB url DSL](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
- the configuration file can be changed in the CLI with the flag `-c <config-filepath>` or `--config-file <config-filepath>`
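
For readers curious how such a flag is typically wired, the following is a sketch using `argparse`, not the actual code of `bin/swh-git-loader`. The default path is the one named in this section. Treating `load` as a subcommand with a `repo_path` argument is an assumption based on the usage shown under Run, and `build_parser` is a hypothetical helper.

```python
import argparse
import os

# Default location, as documented in this section.
DEFAULT_CONFIG = "~/.config/swh/git-loader.ini"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a -c/--config-file override (illustrative only)."""
    parser = argparse.ArgumentParser(prog="swh-git-loader")
    parser.add_argument(
        "-c", "--config-file",
        default=DEFAULT_CONFIG,
        help="path to the configuration file",
    )
    # Assumption: 'load' is modelled as a subcommand taking a repository path.
    actions = parser.add_subparsers(dest="action")
    load = actions.add_parser("load")
    load.add_argument("repo_path")
    return parser


args = build_parser().parse_args(
    ["-c", "/tmp/custom.ini", "load", "/path/to/git/repo"]
)
# Expand '~' so the default path also resolves to a real location.
config_path = os.path.expanduser(args.config_file)
```

When `-c` is omitted, `args.config_file` falls back to `DEFAULT_CONFIG`.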
swh-backend
-----------
Backend API.
Create a configuration file in **\~/.config/swh/back.ini**:
``` {.ini}
[main]
# where to store blob on disk
content_storage_dir = /tmp/swh-git-loader/content-storage
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db: dbname=<dbname> (host=<host> port=<port> user=<user> password=<password>)
db_url = dbname=swhgitloader
# activate the compression for each vcs stored object
# storage_compression = true
# compute folder's depth on disk aa/bb/cc/dd
# folder_depth = 2
# Debugger (for dev only)
debug = true
# server port to listen to requests
port = 6000
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
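
The `folder_depth` comment (`aa/bb/cc/dd`) suggests that stored objects are spread over nested two-character directories. The sketch below shows one way such a path could be derived from a hex digest. It is an assumption, not the backend's documented behaviour: the README does not say which digest is used or what the file is named, and `content_path` is a hypothetical helper.

```python
import os


def content_path(storage_dir: str, hex_digest: str, folder_depth: int = 2) -> str:
    """Build a sharded path with one 2-character directory per depth level.

    Assumption of this sketch: the file itself is named after the full digest.
    """
    if len(hex_digest) < 2 * folder_depth:
        raise ValueError("digest too short for the requested folder depth")
    shards = [hex_digest[2 * i:2 * i + 2] for i in range(folder_depth)]
    return os.path.join(storage_dir, *shards, hex_digest)


path = content_path(
    "/tmp/swh-git-loader/content-storage",
    "aabbccdd00112233",
    folder_depth=2,
)
```

With `folder_depth = 2` the file lands under `aa/bb/` inside `content_storage_dir`. A depth of 4 would give `aa/bb/cc/dd/`, matching the layout hinted at in the configuration comment.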
Run
===
Environment initialization
--------------------------
``` {.bash}
export PYTHONPATH="$(pwd):$PYTHONPATH"
```
Help
----
``` {.bash}
bin/swh-git-loader --help
bin/swh-db-manager --help
```
Parse a repository from a clean slate
-------------------------------------
Clean and initialize the model, then parse the Git repository:
``` {.bash}
bin/swh-db-manager cleandb
bin/swh-git-loader load /path/to/git/repo
```
As a shortcut, the same steps are available through make:
``` {.bash}
make cleandb clean-and-run REPO_PATH=/path/to/git/repo
```
Parse an existing repository
----------------------------
``` {.bash}
bin/swh-git-loader load /path/to/git/repo
```
Clean data
----------
``` {.bash}
bin/swh-db-manager cleandb
```
As a shortcut, the same step is available through make:
``` {.bash}
make cleandb
```
Init data
---------
``` {.bash}
make drop-db create-db
```
