
Using ambry to store objects
Closed, Invalid (Public)

Description

Ambry is a distributed object store that supports storage of trillions of small immutable objects (50K-100K) as well as billions of large objects. It was specifically designed to store and serve media objects in web companies. However, it can be used as a general purpose storage system to store DB backups, search indexes or business reports.

A 2014 paper describes the design of ambry. The 2016 announcement published when the code was released as Free Software also contains a detailed description of the architecture.

Blog entries related to storage and ambry are available on the LinkedIn engineering web site.

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 20 2021, 1:38 PM
dachary triaged this task as Normal priority.
dachary created this task.
dachary created this object in space S1 Public.

The article "Open sourcing DataHub: LinkedIn’s metadata search and discovery platform" explains how developers work on DataHub and the relationship between the code internal to LinkedIn and what is published as Free Software. It is not about ambry, and the ambry team may work in a completely different way. A similar article about ambry is dated 2016:

Open Source is the way to go

We are very happy to announce that we have open sourced Ambry (https://github.com/linkedin/ambry). We believe this system will be immensely useful to the community to support real-time upload and serving of media content. We have detailed documentation available at https://github.com/linkedin/ambry/wiki and we hope to constantly improve them. Your feedback will be really useful to make this project better. If you are interested in contributing please look at https://github.com/linkedin/ambry/wiki/Development%20Guidelines. Please reach out to us if you have any issues using Ambry or questions w.r.t contributing to the codebase at ambrydev@googlegroups.com.

The last entry in the mailing list is dated 2017.

Read-only partitions are stored in Sorted String Table format.
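To make the Sorted String Table idea concrete, here is a minimal in-memory sketch. It is not ambry's on-disk format: the class name `SortedStringTable` and its methods are hypothetical, and it only assumes the general principle that keys are sorted once when the partition becomes read-only and are then looked up by binary search.

```python
import bisect


class SortedStringTable:
    """Hypothetical, immutable in-memory stand-in for a Sorted String Table."""

    def __init__(self, items):
        # Sort once at build time; the table is never modified afterwards.
        pairs = sorted(dict(items).items())
        self._keys = [key for key, _ in pairs]
        self._values = [value for _, value in pairs]

    def get(self, key):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, start, end):
        # All (key, value) pairs with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Because the table never changes after it is built, any number of readers can share it without locking.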

Ambry has been a great source of inspiration and the best fit for the Software Heritage use case. Including the partition UUID in the object identifier takes advantage of the immutability of the objects: a reader can find the partition from the identifier alone, which allows reads to scale out.
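The role of the partition UUID can be sketched as follows. This is an illustration under assumptions, not the ambry blob ID layout or the Software Heritage design: the helper names (`make_object_id`, `split_object_id`, `Reader`) and the choice of a 16-byte UUID prefix are hypothetical.

```python
import uuid


def make_object_id(partition, key):
    """Prefix the object key with the 16-byte UUID of its partition (assumed layout)."""
    return partition.bytes + key


def split_object_id(object_id):
    """Recover (partition UUID, key) from the ID alone, with no lookup service."""
    return uuid.UUID(bytes=object_id[:16]), object_id[16:]


class Reader:
    """Hypothetical reader; any number of them can run side by side."""

    def __init__(self, partitions):
        # partitions maps a uuid.UUID to a read-only mapping of key -> bytes.
        self.partitions = partitions

    def read(self, object_id):
        partition, key = split_object_id(object_id)
        return self.partitions[partition][key]
```

Since objects are immutable, an object never moves to another partition and its identifier stays valid forever, so readers need no shared index to locate it.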

Regarding the software itself:

  • needs packaging
  • the operations documentation needs to be written
  • the user community has been silent for years and should be revived
  • contributions from outside LinkedIn are not commonplace

It seems to be less work to implement the proposed design (T3054) on top of Ceph instead.