Improve the seaweedfs backend
Closed, Migrated

Description

The current implementation of the seaweedfs backend is pretty simple and naive.

It uses the Filer API to interact with the seaweedfs cluster.
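As a rough illustration of what going through the Filer API means: the Filer exposes files over HTTP, so an object is read, written or deleted by issuing a GET, POST or DELETE request on a URL made of the Filer address and a file path. The sketch below only builds such a URL. The `filer_url` helper, the base URL and the `/swh/` root directory are assumptions made for the example, not the actual backend code.

```python
from urllib.parse import quote


def filer_url(base_url: str, root: str, obj_id_hex: str) -> str:
    """Build the Filer URL under which an object would be stored.

    Hypothetical helper: `root` is an assumed directory on the Filer and
    `obj_id_hex` is the hex-encoded object ID used as the file name.
    """
    # Normalise slashes so that the parts are joined with exactly one "/".
    root = root.strip("/")
    path = f"{root}/{obj_id_hex}" if root else obj_id_hex
    # quote() percent-encodes unsafe characters but keeps "/" as-is.
    return f"{base_url.rstrip('/')}/{quote(path)}"


if __name__ == "__main__":
    print(filer_url("http://localhost:8888", "/swh/",
                    "34973274ccef6ab4dfaaf86599792fa9c3fe4689"))
```

With a flat layout like this one, every object lands in the same directory, which is why the directory size handling of the Filer's index backend matters.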

By default, the Filer server uses a leveldb index to store the path<->object ID relations, which has good properties but is not distributed.
However, there is a replication mechanism available.

Nonetheless, a better production setup would probably use a redis backend, since it is one of the two backends that support the Super Large Directories feature.

See the list of supported backends for more details.

The redis backend used to have a limitation on the size of a directory, which would be a problem for our use case, so a possible solution was to use the pathslicer in the seaweedfs backend.

However, this limitation seems to have been lifted with the redis3 backend.
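For reference, the idea behind path slicing can be sketched as follows: slices of the hex object ID are used as nested directory names, so that no single directory has to hold every object. This is a minimal sketch, not the pathslicer code itself; the `parse_slicing` and `sliced_path` names and the `"0:2/2:4"` spec syntax are assumptions made for the example.

```python
from typing import List


def parse_slicing(slicing: str) -> List[slice]:
    """Turn a spec such as "0:2/2:4" into a list of slice objects.

    The spec syntax is an assumption for this sketch: "/"-separated
    "start:stop" character ranges applied to the hex object ID.
    """
    bounds = []
    for part in slicing.strip("/").split("/"):
        start, _, stop = part.partition(":")
        bounds.append(slice(int(start) if start else None,
                            int(stop) if stop else None))
    return bounds


def sliced_path(obj_id_hex: str, slicing: str = "0:2/2:4") -> str:
    """Return the nested path for an object: one directory per slice,
    followed by the full object ID as the file name."""
    dirs = [obj_id_hex[s] for s in parse_slicing(slicing)]
    return "/".join(dirs + [obj_id_hex])


if __name__ == "__main__":
    print(sliced_path("34973274ccef6ab4dfaaf86599792fa9c3fe4689"))
```

With two slices of two hex characters each, objects are spread over 256 × 256 = 65536 leaf directories, which divides the number of entries per directory accordingly.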

This task keeps track of the needed improvements to the seaweedfs objstorage backend:

  • use the pathslicer to mitigate possible directory size limitations of some Filer backends (D6442 and D6492); this is probably not needed any more (at least for now)
  • better tests (current tests bypass too many layers of the objstorage<->Filer communication) (part of D6517)
  • add support for multiple Filer servers

Using the pathslicer is probably not needed, so let's not do it for the moment.
The best approach for multiple Filer servers is simply to use a load balancer in front of the Filers, so let's not add code for that.

Event Timeline

douardda triaged this task as Normal priority. Oct 18 2021, 3:33 PM
douardda created this task.
douardda claimed this task.
douardda updated the task description.

Let's consider this done, since 2 of the 3 bullets have been canceled (for now).