
Scale out object storage design
Started, Work in Progress, Normal, Public

Description

[Parent task for all related tasks]

Current status

An object storage design has been described and is under discussion. Benchmarks are being written and run (T3149) to verify that it is efficient, in both space and speed, for the intended use cases.

Design

Discussions

Related Objects

Status | Assigned | Task
Open | douardda |
Work in Progress | dachary |
Work in Progress | dachary |
Resolved | dachary |
Resolved | dachary |
Work in Progress | dachary |
Wontfix | dachary |
Invalid | dachary |
Invalid | dachary |
Resolved | dachary |
Resolved | dachary |
Resolved | dachary |
Wontfix | dachary |
Invalid | dachary |
Wontfix | dachary |
Work in Progress | dachary |
Work in Progress | dachary |
Invalid | dachary |
Work in Progress | dachary |
Open | dachary |
Resolved | dachary |
Resolved | dachary |
Resolved | dachary |
Work in Progress | dachary |
Work in Progress | dachary |
Resolved | dachary |
Work in Progress | dachary |

Event Timeline

dachary updated the task description.

For the record, this is the half-baked benchmark script for the proposed design that I worked on today. To be continued!


dachary updated the task description.

I have found some interesting pointers on the management of small files in HDFS (I came across them while looking for something unrelated). Is this something you have already identified and excluded from the scope because of some blockers?

Very interesting to see how this problem was presented & solved in the Hadoop ecosystem, thanks for the links.

  • HAR (Hadoop Archives) were designed to reduce the HDFS space amplification for small objects (there are no specifics on how much amplification that is exactly). This 2009 article gives a hint: it says that every HDFS file requires 150 bytes in RAM and states that "Certainly a billion files is not feasible." That explains why HAR was introduced.
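
To make the "billion files" remark concrete, here is a back-of-the-envelope sketch of the arithmetic. It only multiplies the 150 bytes per file quoted above by a file count; the function names are made up for illustration and the figure is the article's rule of thumb, not something measured here.

```python
# Rough estimate of the RAM needed just to track files in HDFS,
# using the 150 bytes per file figure quoted from the 2009 article.
# Assumption: the cost grows linearly with the number of files.
BYTES_PER_FILE = 150


def metadata_ram_bytes(file_count: int, bytes_per_file: int = BYTES_PER_FILE) -> int:
    """Return the estimated RAM (in bytes) needed to track file_count files."""
    return file_count * bytes_per_file


def format_gb(n_bytes: int) -> str:
    """Format a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return f"{n_bytes / 1e9:.1f} GB"


if __name__ == "__main__":
    for count in (10_000_000, 1_000_000_000):
        print(f"{count:>13,} files: {format_gb(metadata_ram_bytes(count))}")
```

With this estimate, a billion files would need about 150 GB of RAM for metadata alone, which is consistent with the article calling that scale not feasible.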
dachary updated the task description.
rdicosmo moved this task from Backlog to Done on the Roadmap 2021 board.
rdicosmo moved this task from Done to Work in progress on the Roadmap 2021 board.
rdicosmo added a subscriber: rdicosmo.

Thanks for helping with the labelling @rdicosmo 👍

dachary updated the task description.