Software Heritage

Scale out object storage design
Status: Closed, Migrated (edits locked)

Related Objects

34 subtasks, all with status Migrated, assigned to gitlab-migration.

Event Timeline


I have found some interesting pointers regarding the management of small files in HDFS (I came across them while looking for something unrelated). Is this something you have already identified and excluded from the scope because of some blocker?

Very interesting to see how this problem was presented and solved in the Hadoop ecosystem; thanks for the links.

  • HAR (Hadoop Archives) were designed to reduce HDFS space amplification for small objects (no specifics on exactly how much amplification there is). This 2009 article gives a hint: every file, directory and block in HDFS is held in the NameNode's RAM as an object of roughly 150 bytes, and the article states that "Certainly a billion files is not feasible.", which explains why HAR was introduced.
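The 150-byte rule of thumb translates directly into a NameNode memory estimate. The sketch below is a back-of-the-envelope calculation under stated assumptions, not a Hadoop tool: it assumes each small file costs one file object plus one block object (every small file fits in a single block), ignores directories, and uses a hypothetical helper `namenode_heap_bytes`.

```python
# Back-of-the-envelope estimate of HDFS NameNode RAM used by file metadata.
# Assumption: each file and each block is a NameNode object of ~150 bytes
# (the 2009 rule of thumb); directories are ignored.

BYTES_PER_NAMESPACE_OBJECT = 150  # rule-of-thumb size of one NameNode object


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Hypothetical helper: approximate NameNode RAM, in bytes, for num_files."""
    # One object for the file itself plus one object per block it uses.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_NAMESPACE_OBJECT


if __name__ == "__main__":
    for n in (10_000_000, 1_000_000_000):
        gb = namenode_heap_bytes(n) / 1e9
        print(f"{n:>13,} files -> ~{gb:,.0f} GB of NameNode RAM")
```

Under these assumptions, 10 million single-block files need about 3 GB of NameNode RAM, and a billion files need about 300 GB by the same arithmetic, which makes it clear why packing small files into archives was attractive.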
dachary updated the task description.
rdicosmo moved this task from Backlog to Done on the Roadmap 2021 board.
rdicosmo moved this task from Done to Work in progress on the Roadmap 2021 board.
rdicosmo added a subscriber: rdicosmo.

Thanks for helping with the labelling @rdicosmo 👍

dachary updated the task description.
dachary updated the task description.
dachary updated the task description.
dachary updated the task description.
dachary changed the status of subtask T3249: Deleting and erasing an object from Work in Progress to Open. (Aug 29 2021, 1:05 PM)
bchauvet raised the priority of this task from Normal to High. (Mar 25 2022, 5:28 PM)
gitlab-migration changed the status of subtask T3050: Using libcephsqlite to store objects from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3051: Using EOS to store objects from Resolved to Migrated.
gitlab-migration changed the status of subtask T3057: Using seaweedfs to store objects from Resolved to Migrated.
gitlab-migration changed the status of subtask T3064: Using ambry to store objects from Invalid to Migrated.
gitlab-migration changed the status of subtask T3065: Using git to store objects from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3066: Using RocksDB SST as a file format from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3068: Using Sorted String Tables as a file format from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3088: Object storage index with redis from Invalid to Migrated.
gitlab-migration changed the status of subtask T3106: TCO of a production ready Ambry vs implementing the design from Resolved to Migrated.
gitlab-migration changed the status of subtask T3107: Scale out object storage explorations from Resolved to Migrated.
gitlab-migration changed the status of subtask T3108: Grid5000 for benchmarking from Resolved to Migrated.
gitlab-migration changed the status of subtask T3149: Benchmark software for the object storage from Resolved to Migrated.
gitlab-migration changed the status of subtask T3186: Ceph Sepia lab for performance testing from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3210: Ceph Quincy CDS & immutable objects from Resolved to Migrated.
gitlab-migration changed the status of subtask T3421: Running the benchmarks: July 16th, 2 days from Resolved to Migrated.
gitlab-migration changed the status of subtask T3422: Running the benchmarks: August 6th, 2021, 9 days from Resolved to Migrated.