diff --git a/README b/README
index 7776619..fd46b79 100644
--- a/README
+++ b/README
@@ -1,229 +1,229 @@
 The Software Heritage Git Loader is a tool and a library to walk a local Git repository and inject into the SWH dataset all contained files that weren't known before.

 License
 =======

 This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

 This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

 See top-level LICENSE file for the full text of the GNU General Public License along with this program.

 Dependencies
 ============

 Runtime
 -------

 - python3
 - python3-pygit2
 - python3-psycopg2
 - python3-flask
 - python3-requests
 - python3-retrying

 Test
 ----

 - python3-nose

 Requirements
 ============

 - implementation language, Python3
 - coding guidelines: conform to PEP8
 - Git access: via libgit2/pygit
 - cache: implemented as Postgres tables

 Configuration
 =============

 swh-loader-git depends on some tools, here are the configuration files for those:

 swh-db-manager
 --------------

 This is solely a db cleanup tool (which will fade away)

 Create a configuration file in **~/.config/db-manager.ini**

-```ini
+```
 [main]
 # Where to store the logs
 log_dir = swh-loader-git/log
 # url access to db
 db_url = dbname=swhgitloader
 ```

 See for the db url's schema.

 swh-loader-git
 --------------

 The loader, which declines in 2 forms:
 - one client which parses and loads directly to swh's backend (db + storage).
 - one part client which parses the repository and loads to swh's remote server:

 ## local

 Create a configuration file in **~/.config/swh/loader-git.ini**:

-```ini
+```
 [main]
 # Where to store the logs
 log_dir = /tmp/swh-loader-git/log
 # how to access the backend (remote or local)
 backend-type = local
 # backend-type local: configuration file to backend file .ini (cf. back.ini file)
 backend = ~/.config/swh/back.ini
 ```

 Note: See swh-backend's configuration file.

 ## remote

 Create a configuration file in **~/.config/swh/loader-git.ini**:

-```ini
+```
 [main]
 # Where to store the logs
 log_dir = /tmp/swh-loader-git/log
 # how to access the backend (remote or local)
 backend-type = remote
 # backend-type remote: url access to api rest's backend
 backend = http://localhost:5000
 ```

 Note:
 - [DB url DSL](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
 - the configuration file can be changed in the CLI with the flag `-c ` or `--config-file `

 swh-backend
 -----------

 Backend api. This

 Create a configuration file in **~/.config/swh/back.ini**:

-```ini
+```
 [main]
 # where to store blob on disk
 content_storage_dir = /tmp/swh-loader-git/content-storage
 # Where to store the logs
 log_dir = swh-loader-git/log
 # url access to db: dbname= (host= port= user= password=)
 db_url = dbname=swhgitloader
 # compute folder's depth on disk aa/bb/cc/dd
 # folder_depth = 2
 # To open to the world, 0.0.0.0
 #host = 127.0.0.1
 # Debugger (for dev only)
 debug = true
 # server port to listen to requests
 port = 6000
 ```

 See for the db url's schema

 Run
 ===

 Environment initialization
 --------------------------

 The PYTHONPATH must be set adequately.
 The tools depend on:
 - swh-environment/swh-core
 - swh-environment/swh-storage

 Note: see swh-environment/pythonpath.sh

 Help
 ----

-```bash
+```
 bin/swh-backend --help
 bin/swh-loader-git --help
 bin/swh-db-manager --help
 ```

 Backend
 -------

 The backend server depends on the object storage and the db.
 The db depends on the actual sql schema defined in swh-environment/swh-storage/sql/*.sql.

 ### With initialization

 The Makefile.local usually is a good place to start with:

-```bash
+```
 make create-db (DB=softwareheritage-dev)
 ```

 Note:
 - This delegates the creation to the `swh-storage/sql/` Makefile.
 - Between parentheses, the optional and default values. Override to use according to your needs.

 ### without initialization

 Running the backend.

-```bash
+```
 ./bin/swh-backend -v (--config ~/.config/swh/back.ini)
 ```

 Note: Between parentheses, the optional and default values. Override to use according to your needs.

 With makefile:

-```bash
+```
 make run-back FOLLOW_LOG=-f
 ```

 Parse a repository:
 -------------------

 Parse and load the repository /path/to/git/repo:

-```bash
+```
 bin/swh-loader-git -c (~/.config/swh/git-loader.ini) load /path/to/git/repo
 ```

 Scratch data
 ------------

-```bash
+```
 make drop-db create-db
 ```

 Note: It only deals with the db and not with the fs.
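
Reviewer note: the `[main]` sections touched by this diff are plain INI, so they can be sanity-checked with Python's standard `configparser`. The sketch below is only an illustration, not the loader's actual parsing code. The helper name `read_loader_config` and the returned dictionary layout are hypothetical; the keys (`log_dir`, `backend-type`, `backend`) and the two allowed backend types come from the README text above.

```python
import configparser

# Sample taken from the "remote" loader-git.ini shown in the README.
SAMPLE = """\
[main]
# Where to store the logs
log_dir = /tmp/swh-loader-git/log
# how to access the backend (remote or local)
backend-type = remote
# backend-type remote: url access to api rest's backend
backend = http://localhost:5000
"""


def read_loader_config(text):
    """Parse a loader-git.ini style string (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]

    backend_type = main["backend-type"]
    # The README documents exactly two backend types.
    if backend_type not in ("local", "remote"):
        raise ValueError("unknown backend-type: %r" % backend_type)

    return {
        "log_dir": main["log_dir"],
        "backend_type": backend_type,
        # For 'local' this is a path to back.ini, for 'remote' a URL.
        "backend": main["backend"],
    }


if __name__ == "__main__":
    print(read_loader_config(SAMPLE))
```

Full-line `#` comments are skipped by `configparser` by default, which matches how the sample files in the README are written.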