README

The Software Heritage Git Loader is a tool and a library that walks a
local Git repository and injects into the Software Heritage (SWH) dataset
every file it contains that was not already known.
License
=======
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
============
Runtime
-------
- python3
- python3-psycopg2
- python3-pygit2
Test
----
- python3-nose
Requirements
============
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via libgit2/pygit2
- cache: implemented as Postgres tables
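
The description at the top says that only files that were not known before are injected. The following is a minimal Python sketch of that idea, not the loader's actual code. The helper names `git_blob_id` and `select_unknown` are hypothetical, and an in-memory set stands in for the Postgres-backed cache. The identifier is computed the way Git hashes a blob: SHA-1 over a `blob <size>\0` header followed by the content.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the Git object id of a blob holding `data`."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def select_unknown(blobs, known_ids):
    """Return {blob_id: data} for blobs whose id is not in `known_ids`.

    `known_ids` is an assumption: a plain set standing in for the cache.
    Duplicates within `blobs` are returned only once.
    """
    new = {}
    for data in blobs:
        blob_id = git_blob_id(data)
        if blob_id not in known_ids and blob_id not in new:
            new[blob_id] = data
    return new
```

For example, `select_unknown([b"", b"a"], {git_blob_id(b"")})` returns only the entry for `b"a"`, because the empty blob is already known.
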
Configuration
=============
swh-git-loader depends on several tools. Their configuration files are
described below.
swh-db-manager
--------------
This tool is now solely in charge of database cleanup.
Create a configuration file at **\~/.config/db-manager.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db
db_url = dbname=swhgitloader
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the `db_url` connection string.
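
As an illustration, a file like the one above can be read with Python's standard `configparser`. This is a sketch under assumptions, not the tool's own code: the function name `read_db_manager_config` is hypothetical, and it only assumes the `[main]` section and the two keys shown above.

```python
import configparser
import os


def read_db_manager_config(path="~/.config/db-manager.ini"):
    """Read the [main] section of a db-manager style .ini file."""
    parser = configparser.ConfigParser()
    # expanduser turns the leading "~" into the user's home directory.
    with open(os.path.expanduser(path)) as handle:
        parser.read_file(handle)
    main = parser["main"]
    # Only the first "=" separates key and value, so
    # "db_url = dbname=swhgitloader" yields "dbname=swhgitloader".
    return {"log_dir": main["log_dir"], "db_url": main["db_url"]}
```

The resulting `db_url` value is a connection string of the kind accepted by `psycopg2.connect(dsn)`.
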
swh-git-loader
--------------
Create a configuration file in **\~/.config/swh/git-loader.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = /tmp/swh-git-loader/log
# how to access the backend (remote or local)
backend-type = remote
# backend-type remote: url access to api rest's backend
# backend-type local: path to the backend's .ini configuration file (cf. back.ini)
backend = http://localhost:5000
```
Note:
- [DB URL
DSL](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
- The configuration file can be changed on the CLI with the flag
`-c <config-filepath>` or `--config-file <config-filepath>`.
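
The interplay between the `-c`/`--config-file` flag and the `backend-type` setting can be sketched in Python as follows. This is an illustrative sketch only: the function names, the program name, and the returned tuple are assumptions, not the loader's real interface.

```python
import argparse
import configparser
import os

# Default location taken from the section above.
DEFAULT_CONFIG = "~/.config/swh/git-loader.ini"


def parse_args(argv=None):
    """Parse the hypothetical command line, accepting -c / --config-file."""
    parser = argparse.ArgumentParser(prog="swh-git-loader-sketch")
    parser.add_argument("-c", "--config-file", default=DEFAULT_CONFIG,
                        help="path to the configuration file")
    return parser.parse_args(argv)


def read_main_section(path):
    """Return the [main] section of the .ini file as a plain dict."""
    parser = configparser.ConfigParser()
    with open(os.path.expanduser(path)) as handle:
        parser.read_file(handle)
    return dict(parser["main"])


def resolve_backend(conf):
    """Return (kind, target) according to the backend-type setting."""
    kind = conf["backend-type"]
    target = conf["backend"]
    if kind == "remote":
        return kind, target  # URL of the backend's REST API
    if kind == "local":
        return kind, os.path.expanduser(target)  # path to the backend .ini
    raise ValueError("unknown backend-type: %r" % kind)
```

With the sample file above, `resolve_backend(read_main_section(parse_args([]).config_file))` would yield `("remote", "http://localhost:5000")`.
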
swh-backend
-----------
The backend API.
Create a configuration file at **\~/.config/swh/back.ini**:
``` {.ini}
[main]
# Where to store blobs on disk
content_storage_dir = /tmp/swh-git-loader/content-storage
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db: dbname=<dbname> (host=<host> port=<port> user=<user> password=<pass>)
db_url = dbname=swhgitloader
# activate the compression for each vcs stored object
# storage_compression = true
# compute folder's depth on disk aa/bb/cc/dd
# folder_depth = 2
# Debugger (for dev only)
debug = true
# server port to listen to requests
port = 6000
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the `db_url` connection string.
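
The `folder_depth` comment above hints at a sharded layout such as `aa/bb/cc/dd`. The following Python sketch shows one way such a layout could be derived from a content hash. It assumes that each directory level uses two hexadecimal characters of the hash and that the full hash is the file name. The function `content_path` is hypothetical and does not come from the backend's code.

```python
import os


def content_path(storage_dir, hex_digest, folder_depth=2):
    """Build <storage_dir>/aa/bb/.../<hex_digest> with `folder_depth` levels.

    Assumption: each level is named after two consecutive hex characters
    taken from the start of the digest.
    """
    if len(hex_digest) < 2 * folder_depth:
        raise ValueError("digest too short for the requested depth")
    levels = [hex_digest[2 * i:2 * i + 2] for i in range(folder_depth)]
    return os.path.join(storage_dir, *levels, hex_digest)
```

Spreading blobs over nested directories like this keeps any single directory from holding too many entries.
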
Run
===
Environment initialization
--------------------------
``` {.bash}
export PYTHONPATH=`pwd`:$PYTHONPATH
```
Backend
-------
### With initialization
This depends on the swh-sql repository, so initialize the database first:
``` {.bash}
cd /path/to/swh-sql && make clean initdb DBNAME=softwareheritage-dev
```
Alternatively, use the Makefile:
``` {.bash}
make drop-db create-db run-back FOLLOW_LOG=-f
```
### Without initialization
Run the backend:
``` {.bash}
./bin/swh-backend -v
```
With the Makefile:
``` {.bash}
make run-back FOLLOW_LOG=-f
```
Help
----
``` {.bash}
bin/swh-git-loader --help
bin/swh-db-manager --help
```
Parse a repository from a clean slate
-------------------------------------
Clean and initialize the model, then parse the Git repository:
``` {.bash}
bin/swh-db-manager cleandb
bin/swh-git-loader load /path/to/git/repo
```
For convenience, with the Makefile:
``` {.bash}
time make cleandb run REPO_PATH=~/work/inria/repo/swh-git-cloner
```
Parse an existing repository
----------------------------
``` {.bash}
bin/swh-git-loader load /path/to/git/repo
```
Clean data
----------
This truncates the relevant tables in the schema:
``` {.bash}
bin/swh-db-manager cleandb
```
For convenience, with the Makefile:
``` {.bash}
make cleandb
```
Init data
---------
``` {.bash}
make drop-db create-db
```
