#+title: TODO
#+author: swh team
* DONE First implementation
CLOSED: [2015-07-22 Wed 12:20]
- [X] Push on remote git repository
- [X] All git objects must be written in storage (at the moment only blobs)
- [X] Improve performance
- [X] Serialize blob's data and not blob's size.
- [X] Logging in python? How to see the log?
- [X] Replace sqlalchemy dao layer with psycopg2
- [X] Improve sgloader cli interface
- [X] Serialize sha256 as bytes
- [X] Update README.org
- [X] Switch dao layer (from sqlalchemy to psycopg2)
- [X] Serialize sha1 as bytes
- [X] Use sha1 instead of sha256 for file cache
- [X] Improve architecture
- [X] Use postgresql's bytea column for sha1
- [X] Improve git object dispatch (look up on repo object only if necessary)
- [X] Add functional test which adds new commits
- [X] Store git object on disk too
- [X] Make the compression for the file storage optional
- [X] Expose the flag to the swh-git-loader's configuration
- [X] Make the compression for the git object storage optional
- [X] Expose option flag for blob compression
- [X] Add computation folder with depth as parameter
- [X] Expose option flag for folder depth
- [X] Test coverage for at least the primitive functions [2/2]
- [X] swh.file
- [X] swh.hash
- [X] Add git-sha1 function in swh.hash module
- [X] Separate the git repository parsing from the persistence (using backend api)
- [X] Enforce retrying disk writing policy
- [X] Use blob's git sha1 as key on disk
- [X] Enforce retrying policy on http client requests
- [X] Share http connection throughout the git repository parsing
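Several of the items above revolve around the blob's git sha1 used as the on-disk key and the depth-parameterised computation folder. A minimal sketch of both, assuming git's standard blob hashing (sha1 over a `blob <size>\0` header plus the content); `storage_path` is an illustrative helper, not the actual swh.hash/swh.file API:

```python
import hashlib
import os


def git_blob_sha1(data: bytes) -> str:
    """Compute the git sha1 of a blob: sha1 over 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def storage_path(root: str, hexsha: str, depth: int = 2) -> str:
    """Shard the hex key into `depth` two-character directory levels,
    e.g. depth=2 maps 'ce01...' to <root>/ce/01/ce01..."""
    parts = [hexsha[2 * i:2 * i + 2] for i in range(depth)]
    return os.path.join(root, *parts, hexsha)
```

The two-character sharding keeps any single directory from holding millions of entries; the depth flag exposed in the configuration controls how many levels are used.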
* swh implementation
- [X] One content storage (2 were used, one for content, one for revision/directory)
- [ ] Improve api backend to use the `real` schema
- [ ] Adapt loader to speak to the api backend the dummy way (~json)
- [ ] Improve protocol communication between loader and api backend (drop json)
- [ ] Clean up dead code (db-manager no longer useful since db init/drop in swh-sql repo)
* Global enhancement
- [ ] Drop json for api backend communication and use a simpler message protocol, for example `<size><content-msg>`
- [ ] Use future computations?
* Discussion
** How to stream blob data
Blob data is returned as raw bytes.
** Structured logging
Structure the logs so they can serve as input for analysis by other mechanisms.
** Rules
*** Don't lose data
Multiple workers share the same disks and db.
*** Transaction
Define the unit of transaction: check whether a commit exists; if not, write it on disk and in the db. If one disk write fails, fail the whole transaction.
*** ?
*** Profiling
Look into the cursor implementation details.