#+title: TODO
#+author: swh team
* DONE First implementation
CLOSED: [2015-07-22 Wed 12:20]
- [X] Push on remote git repository
- [X] All git objects must be written in storage (at the moment only blobs)
- [X] Improve performance
- [X] Serialize blob's data and not blob's size.
- [X] Logging in python? How to see the log?
- [X] Replace sqlalchemy dao layer with psycopg2
- [X] Improve sgloader cli interface
- [X] Serialize sha256 as bytes
- [X] Update README.org
- [X] Switch dao layer (from sqlalchemy to psycopg2)
- [X] Serialize sha1 as bytes
- [X] Use sha1 instead of sha256 for file cache
- [X] Improve architecture
- [X] Use postgresql's bytea column for sha1
- [X] Improve git object dispatch (look up on repo object only if necessary)
- [X] Add functional test which adds new commits
- [X] Store git object on disk too
- [X] Make the compression for the file storage optional
- [X] Expose the flag to the swh-git-loader's configuration
- [X] Make the compression for the git object storage optional
- [X] Expose option flag for blob compression
- [X] Add folder computation with depth as a parameter
- [X] Expose option flag for folder depth
- [X] Test coverage for at least the primitive functions [2/2]
- [X] swh.file
- [X] swh.hash
- [X] Add git-sha1 function in swh.hash module
- [X] Separate the git repository parsing from the persistence (using backend api)
- [X] Enforce retrying disk writing policy
- [X] Use blob's git sha1 as key on disk
- [X] Enforce retrying policy on http client requests
- [X] Share http connection throughout the git repository parsing
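Several of the items above (git-sha1 function, blob's git sha1 as key on disk, folder depth, optional compression) fit together as in the sketch below. The git blob hash is the standard one, sha1 over the header "blob <size>\0" followed by the content. Everything else is an assumption made for illustration and not the loader's actual code: the function names `git_blob_sha1`, `folder_for` and `write_blob`, the two-hex-characters-per-level folder layout, and zlib as the compressor.

```python
import hashlib
import os
import zlib


def git_blob_sha1(data: bytes) -> str:
    """Hex sha1 that git assigns to a blob: sha1(b"blob <size>\\0" + data)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def folder_for(root: str, hex_key: str, depth: int) -> str:
    """Nest `depth` directories, each named after two hex chars of the key.

    The two-characters-per-level layout is an assumption of this sketch.
    """
    parts = [hex_key[2 * i:2 * i + 2] for i in range(depth)]
    return os.path.join(root, *parts)


def write_blob(root: str, data: bytes, depth: int = 2, compress: bool = True) -> str:
    """Store a blob under its git sha1 and return the path written."""
    key = git_blob_sha1(data)
    folder = folder_for(root, key, depth)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, key)
    # Compression is optional, mirroring the configuration flag listed above.
    payload = zlib.compress(data) if compress else data
    with open(path, "wb") as f:
        f.write(payload)
    return path
```

With `depth=2`, a blob whose key starts with `e69d` would land under `<root>/e6/9d/<key>`.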
* swh implementation
- [X] One content storage (2 were used, one for content, one for revision/directory)
- [ ] Improve api backend to use the `real` schema
- [ ] Adapt loader to speak the api backend the dummy way (~json)
- [ ] Improve protocol communication between loader and api backend (drop json)
- [ ] Clean up dead code (db-manager is no longer useful since db init/drop now lives in the swh-sql repo)
* Global enhancement
- [ ] Drop json as api backend communication and use a simpler message protocol `<size><content-msg>` for example
- [ ] Use future computations?
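The `<size><content-msg>` framing mentioned in this list could look like the sketch below. The note does not say how the size is encoded, so the 4-byte big-endian unsigned length header is an assumption, as are the names `encode_msg` and `decode_msgs`.

```python
import struct

# Assumption: the size is a 4-byte big-endian unsigned integer.
HEADER = struct.Struct("!I")


def encode_msg(content: bytes) -> bytes:
    """Prefix the content with its length."""
    return HEADER.pack(len(content)) + content


def decode_msgs(buffer: bytes):
    """Split a buffer into complete messages.

    Returns (messages, remainder); the remainder holds any trailing
    partial message, to be completed by the next read.
    """
    msgs = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (size,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + size
        if end > len(buffer):
            break  # body not fully received yet
        msgs.append(buffer[start:end])
        offset = end
    return msgs, buffer[offset:]
```

A receiver would append each chunk read from the socket to the remainder and call `decode_msgs` again, which keeps the parsing independent of how the transport splits the data.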
* Discussion
** How to stream blob's data
Returned as raw data
** Structure log
This way the logs could be analyzed by other mechanisms.
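One possible way to get structured logs with Python's standard `logging` module is sketched below. The choice of one JSON object per line, the `JsonFormatter` class, and the `fields` attribute name are all assumptions; the note above does not settle on a format.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (assumed format)."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)
```

A handler configured with `handler.setFormatter(JsonFormatter())` would then emit one parseable line per call such as `logger.info("blob stored", extra={"fields": {"size": 42}})`.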
** Rules
**** Don't lose data
Multiple workers.
Same disks and db.
**** Transaction
Unit of transaction.
Check whether a commit exists; if not, write it on disk and in the db.
If one disk fails, fail the transaction.
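The rule above can be sketched as a single function: check, write to every disk, then record in the db, and undo everything if any disk write fails. To keep the sketch self-contained it uses `sqlite3` as a stand-in for the PostgreSQL database; the `commits` table, the `store_commit` name and the one-file-per-commit layout are assumptions for illustration.

```python
import os
import sqlite3


def store_commit(conn, disk_roots, sha1_hex, data):
    """Write a commit to every disk and to the db, or to none of them.

    Returns True if the commit was written, False if it already existed.
    """
    cur = conn.execute("SELECT 1 FROM commits WHERE sha1 = ?", (sha1_hex,))
    if cur.fetchone() is not None:
        return False

    attempted = []
    try:
        for root in disk_roots:
            path = os.path.join(root, sha1_hex)
            attempted.append(path)
            with open(path, "wb") as f:
                f.write(data)
        conn.execute("INSERT INTO commits (sha1) VALUES (?)", (sha1_hex,))
        conn.commit()
    except Exception:
        # One disk (or the db) failed: fail the whole transaction.
        conn.rollback()
        for path in attempted:
            if os.path.exists(path):
                os.remove(path)
        raise
    return True
```

With multiple workers sharing the same disks and db, two workers could both pass the existence check; a unique constraint on `sha1` would make the second insert fail and trigger the same cleanup path.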
**** ?
**** Profiling
Look into the cursor implementation details.