Using ambry to store objects
Closed, Migrated

Assigned To

Authored By

	dachary
	Feb 20 2021, 1:38 PM

Description

ambry is a distributed object store that supports storage of trillions of small immutable objects (50K-100K) as well as billions of large objects. It was specifically designed to store and serve media objects in web companies. However, it can be used as a general purpose storage system to store DB backups, search indexes or business reports.

A 2014 paper describes the design of ambry. The 2016 announcement published when the code was released as Free Software also contains a detailed description of the architecture.

Blog entries related to storage and ambry are available on the LinkedIn engineering web site.

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T3116 Roll out at least one operational mirror
Migrated	gitlab-migration	T3054 Scale out object storage design
Migrated	gitlab-migration	T3064 Using ambry to store objects

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 20 2021, 1:38 PM

dachary triaged this task as Normal priority.

dachary created this task.

dachary created this object in space S1 Public.

dachary added a parent task: T3054: Scale out object storage design.

dachary mentioned this in T3054: Scale out object storage design.

dachary updated the task description. Feb 21 2021, 11:59 AM

The article "Open sourcing DataHub: LinkedIn’s metadata search and discovery platform" explains how developers work on DataHub and the relationship between the code internal to LinkedIn and what is published as Free Software. It is not about ambry, and the ambry team may well work in a completely different way. A similar article about ambry is dated 2016:

Open Source is the way to go

We are very happy to announce that we have open sourced Ambry (https://github.com/linkedin/ambry). We believe this system will be immensely useful to the community to support real-time upload and serving of media content. We have detailed documentation available at https://github.com/linkedin/ambry/wiki and we hope to constantly improve them. Your feedback will be really useful to make this project better. If you are interested in contributing please look at https://github.com/linkedin/ambry/wiki/Development%20Guidelines. Please reach out to us if you have any issues using Ambry or questions w.r.t contributing to the codebase at ambrydev@googlegroups.com.

The last entry in the mailing list is dated 2017.

dachary updated the task description. Feb 21 2021, 12:13 PM

Readonly partitions are stored in Sorted String Table format.
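
To illustrate the idea of a Sorted String Table, here is a minimal sketch in Python. It is not ambry's on-disk format: the record layout (two big-endian 32-bit lengths followed by key and value) and the names write_sstable and lookup are assumptions made for this example. It only shows the general principle: entries are written once in key order, and a sorted index allows lookups by binary search.

```python
import bisect
import io
import struct


def write_sstable(items):
    """Serialize a dict of bytes -> bytes into one immutable byte string.

    Returns (data, index), where index is a list of (key, offset)
    pairs sorted by key. The layout is hypothetical, not ambry's.
    """
    buf = io.BytesIO()
    index = []
    for key in sorted(items):
        value = items[key]
        index.append((key, buf.tell()))
        # Record header: key length and value length, 4 bytes each.
        buf.write(struct.pack(">II", len(key), len(value)))
        buf.write(key)
        buf.write(value)
    return buf.getvalue(), index


def lookup(data, index, key):
    """Return the value stored for key, or None if it is absent."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)  # binary search on sorted keys
    if pos == len(keys) or keys[pos] != key:
        return None
    offset = index[pos][1]
    klen, vlen = struct.unpack_from(">II", data, offset)
    start = offset + 8 + klen
    return data[start:start + vlen]
```

Because the table is never modified after it is written, it suits a read-only partition: the index can be kept in memory and the data read without locking.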

Ambry has been a great source of inspiration and is the best fit for the Software Heritage use case. Including the partition UUID in the object takes advantage of the immutability of the objects and allows all readers to have a scale-out object storage.
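
The following Python sketch shows one way the partition UUID could be carried with an object so that any reader can find the right partition without a central lookup. It is an illustration under assumptions, not ambry's identifier format: the 16-byte UUID prefix, the SHA-256 suffix and the names make_object_id, partition_of and Reader are all hypothetical.

```python
import hashlib
import uuid


def make_object_id(partition: uuid.UUID, content: bytes) -> bytes:
    """Build an identifier: 16 bytes of partition UUID + 32 bytes of SHA-256.

    Hypothetical layout, chosen only for this example.
    """
    return partition.bytes + hashlib.sha256(content).digest()


def partition_of(object_id: bytes) -> uuid.UUID:
    """Recover the partition UUID from the identifier prefix."""
    return uuid.UUID(bytes=object_id[:16])


class Reader:
    """A reader holding a mapping of partition UUID -> {object id: content}."""

    def __init__(self, partitions):
        self.partitions = partitions

    def get(self, object_id: bytes) -> bytes:
        # The identifier alone says which partition holds the object.
        return self.partitions[partition_of(object_id)][object_id]
```

Since objects are immutable, an identifier never has to be re-pointed to another location once a partition is sealed, so readers can be added independently of each other.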

Regarding the software itself:

needs packaging
the operation documentation needs to be written
the user community has been silent for years and should be revived
contributions from outside LinkedIn are not commonplace

It seems to be less work to implement the design proposed in T3054 on top of Ceph instead.

dachary closed this task as Invalid. Feb 22 2021, 12:25 AM

dachary mentioned this in T3107: Scale out object storage explorations. Mar 9 2021, 1:09 PM

This task has been migrated to GitLab.
