The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
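The "weren't known before" check can be pictured with a small sketch. This is not the loader's code: it assumes contents are identified by their Git blob id (SHA-1 over a `blob <size>\0` header followed by the data), and `select_unknown` and `known_ids` are hypothetical names standing in for a lookup against the dataset.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the id Git assigns to a blob holding `data`."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def select_unknown(contents, known_ids):
    """Hypothetical helper: keep only contents whose id is not in `known_ids`."""
    unknown = {}
    for data in contents:
        blob_id = git_blob_id(data)
        if blob_id not in known_ids:
            # A dict keyed by id also deduplicates identical contents.
            unknown[blob_id] = data
    return unknown
```

Keying the result by blob id means a file that appears many times in the repository would be injected only once.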
License
=======
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
============
Runtime
-------
- python3
- python3-psycopg2
- python3-pygit2
Test
----
- python3-nose
Requirements
============
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via libgit2/pygit2
- cache: implemented as Postgres tables
Configuration
=============
swh-git-loader depends on several tools. Their configuration files are
described below.
swh-db-manager
--------------
This tool is now solely in charge of database cleanup.
Create a configuration file in **\~/.config/db-manager.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db
db_url = dbname=swhgitloader
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
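As an illustration of how such a file could be consumed, here is a minimal sketch using the standard `configparser` module. The function name `read_db_manager_config` is hypothetical and is not taken from swh-db-manager itself; only the `[main]` section and its two keys come from the example above.

```python
import configparser
import os


def read_db_manager_config(path="~/.config/db-manager.ini"):
    """Read the [main] section and return (log_dir, db_url)."""
    parser = configparser.ConfigParser()
    with open(os.path.expanduser(path)) as f:
        parser.read_file(f)
    main = parser["main"]
    return main["log_dir"], main["db_url"]


# The db_url value is a connection string, so it can be passed as-is
# to psycopg2 (not done here, to keep the sketch free of dependencies):
#   import psycopg2
#   conn = psycopg2.connect(db_url)
```

Note that `configparser` splits each line on the first `=`, so `db_url = dbname=swhgitloader` yields the value `dbname=swhgitloader` unchanged.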
swh-git-loader
--------------
Create a configuration file in **\~/.config/swh/git-loader.ini**:
``` {.ini}
[main]
# Where to store the logs
log_dir = /tmp/swh-git-loader/log
# how to access the backend (remote or local)
backend-type = remote
# backend-type remote: URL of the backend's REST API
# backend-type local: path to the backend's .ini configuration file (cf. back.ini file)
backend = http://localhost:5000
```
Note:
- [DB url DSL](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
- the configuration file can be changed in the CLI with the flag
  `-c <config-filepath>` or `--config-file <config-filepath>`
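The interplay between the `-c`/`--config-file` flag and the `backend-type` setting can be sketched as follows. This is an assumption-laden illustration rather than the actual CLI: `parse_args` and `backend_settings` are hypothetical names, and the real tool's subcommands (such as `load`) are left out.

```python
import argparse
import configparser

DEFAULT_CONFIG = "~/.config/swh/git-loader.ini"


def parse_args(argv=None):
    """Parse only the configuration-file flag described above."""
    parser = argparse.ArgumentParser(prog="swh-git-loader")
    parser.add_argument("-c", "--config-file", default=DEFAULT_CONFIG,
                        help="path to the configuration file")
    return parser.parse_args(argv)


def backend_settings(config_text):
    """Return (backend_type, backend) from the text of a git-loader.ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    main = parser["main"]
    backend_type = main["backend-type"]
    if backend_type not in ("remote", "local"):
        raise ValueError("backend-type must be 'remote' or 'local'")
    # For 'remote' this is a URL; for 'local' it is a path to back.ini.
    return backend_type, main["backend"]
```

Validating `backend-type` early keeps a typo in the file from being mistaken for a URL or a path later on.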
swh-backend
-----------
Backend API.
Create a configuration file in **\~/.config/swh/back.ini**:
``` {.ini}
[main]
# where to store blob on disk
content_storage_dir = /tmp/swh-git-loader/content-storage
# Where to store the logs
log_dir = swh-git-loader/log
# url access to db: dbname=<dbname> (host=<host> port=<port> user=<user> password=<password>)
db_url = dbname=swhgitloader
# activate the compression for each vcs stored object
# storage_compression = true
# depth of the folder hierarchy on disk (aa/bb/cc/dd)
# folder_depth = 2
# Debugger (for dev only)
debug = true
# server port to listen to requests
port = 6000
```
See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
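The `folder_depth` setting and its `aa/bb/cc/dd` hint suggest that stored blobs are spread over nested directories named after leading pairs of characters of their identifier. The sketch below shows one plausible reading of that layout. The function `blob_path` is hypothetical, and the choice of the full identifier as the file name is an assumption, not something the backend is confirmed to do.

```python
import os


def blob_path(storage_dir, hex_id, folder_depth=2):
    """Hypothetical layout: one directory level per leading pair of hex digits."""
    if len(hex_id) < 2 * folder_depth:
        raise ValueError("identifier too short for the requested depth")
    levels = [hex_id[2 * i:2 * i + 2] for i in range(folder_depth)]
    return os.path.join(storage_dir, *levels, hex_id)
```

Under these assumptions, a depth of 2 places an identifier starting with `aabb` below `<content_storage_dir>/aa/bb/`, which keeps any single directory from accumulating too many entries.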
Run
===
Environment initialization
--------------------------
``` {.bash}
export PYTHONPATH=`pwd`:$PYTHONPATH
```
Backend
-------
### With initialization
This depends on the swh-sql repository, so initialize the database first:
``` {.bash}
cd /path/to/swh-sql && make clean initdb DBNAME=softwareheritage-dev
```
Or, using the Makefile:
``` {.bash}
make drop-db create-db run-back FOLLOW_LOG=-f
```
### Without initialization
Run the backend directly:
``` {.bash}
./bin/swh-backend -v
```
With the Makefile:
``` {.bash}
make run-back FOLLOW_LOG=-f
```
Help
----
``` {.bash}
bin/swh-git-loader --help
bin/swh-db-manager --help
```
Parse a repository from a clean slate
-------------------------------------
Clean and initialize the model, then parse the Git repository:
``` {.bash}
bin/swh-db-manager cleandb
bin/swh-git-loader load /path/to/git/repo
```
Or, using the Makefile:
``` {.bash}
time make cleandb run REPO_PATH=~/work/inria/repo/swh-git-cloner
```
Parse an existing repository
----------------------------
``` {.bash}
bin/swh-git-loader load /path/to/git/repo
```
Clean data
----------
This truncates the relevant tables in the schema:
``` {.bash}
bin/swh-db-manager cleandb
```
Or, using the Makefile:
``` {.bash}
make cleandb
```
Init data
---------
``` {.bash}
make drop-db create-db
```