#+title: TODO
#+author: swh team

* DONE First implementation
  CLOSED: [2015-07-22 Wed 12:20]
- [X] Push on remote git repository
- [X] All git objects must be written in storage (at the moment only blobs)
- [X] Improve performance
- [X] Serialize blob's data and not blob's size
- [X] Logging in Python? How to see the logs?
- [X] Replace sqlalchemy dao layer with psycopg2
- [X] Improve sgloader cli interface
- [X] Serialize sha256 as bytes
- [X] Update README.org
- [X] Switch dao layer (from sqlalchemy to psycopg2)
- [X] Serialize sha1 as bytes
- [X] Use sha1 instead of sha256 for file cache
- [X] Improve architecture
- [X] Use PostgreSQL's bytea column for sha1
- [X] Improve git object dispatch (look up on repo object only if necessary)
- [X] Add functional test which adds new commits
- [X] Store git objects on disk too
- [X] Make the compression for the file storage optional
- [X] Expose the flag in the swh-git-loader's configuration
- [X] Make the compression for the git object storage optional
- [X] Expose option flag for blob compression
- [X] Add folder computation with depth as parameter
- [X] Expose option flag for folder depth
- [X] Test coverage for at least primitive functions [2/2]
  - [X] swh.file
  - [X] swh.hash
- [X] Add git-sha1 function in swh.hash module
- [X] Separate the git repository parsing from the persistence (using backend api)
- [X] Enforce a retry policy on disk writes
- [X] Use blob's git sha1 as key on disk
- [X] Enforce a retry policy on http client requests
- [X] Share the http connection throughout the git repository parsing

* swh implementation
- [X] One content storage (two were used: one for content, one for revision/directory)
- [ ] Improve api backend to use the `real` schema
- [ ] Adapt loader to talk to the api backend the dummy way (~json)
- [ ] Improve protocol communication between loader and api backend (drop json)
- [ ] Clean up dead code (db-manager is no longer useful since db init/drop is in the swh-sql repo)

* Global enhancement
- [ ] Drop json as api backend communication and use a simpler message protocol `` for example
- [ ] Use future computations?

* Discussion
** How to stream blob's data
Returned as raw data.
** Structure logs
This way they can be analyzed by other mechanisms.
** Rules
**** Don't lose data
Multiple workers. Same disks and db.
**** Transaction
Unit of transaction: check whether a commit exists; if not, write it on disk and in the db. If one disk write fails, fail the whole transaction.
**** ?
**** Profiling
Look into the cursor implementation details.
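**** Transaction sketch
A minimal sketch of the Transaction rule above, under stated assumptions: the standard library's =sqlite3= stands in for the PostgreSQL database, each directory in =disks= stands in for one storage disk, and the =commits= table and =store_commit= helper are hypothetical names, not part of the loader. The commit is looked up first; if absent, it is inserted in the db and written to every disk. Any disk failure rolls the db insert back and removes the copies already written.

#+begin_src python
import contextlib
import os
import sqlite3
import tempfile


def store_commit(conn, disks, sha1_hex, data):
    """Store one commit in the db and on every disk, as one unit.

    Hypothetical helper: sqlite3 stands in for PostgreSQL and each entry
    of `disks` is a directory standing in for one storage disk.
    Returns True if the commit was written, False if it already existed.
    """
    written = []
    try:
        # `with conn` commits on success and rolls back on any exception.
        with conn:
            row = conn.execute(
                "SELECT 1 FROM commits WHERE sha1 = ?", (sha1_hex,)
            ).fetchone()
            if row is not None:
                return False  # already stored: nothing to write
            conn.execute("INSERT INTO commits (sha1) VALUES (?)", (sha1_hex,))
            for disk in disks:
                path = os.path.join(disk, sha1_hex)
                with open(path, "wb") as f:
                    written.append(path)  # record before writing, for cleanup
                    f.write(data)
    except OSError:
        # One disk failed: the db insert has been rolled back; also remove
        # the copies already written so no disk keeps a partial commit.
        for path in written:
            with contextlib.suppress(OSError):
                os.remove(path)
        raise
    return True


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    disks = [os.path.join(root, name) for name in ("disk1", "disk2")]
    for disk in disks:
        os.mkdir(disk)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (sha1 TEXT PRIMARY KEY)")
    print(store_commit(conn, disks, "ab" * 20, b"commit data"))  # True
    print(store_commit(conn, disks, "ab" * 20, b"commit data"))  # False
#+end_src

With multiple workers (the Don't lose data rule), the SELECT-then-INSERT check alone can race; declaring =sha1= as the PRIMARY KEY makes the database reject a duplicate insert from a concurrent worker.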