
Performance tuning of zfs infrastructure
Closed, Migrated

Description

Our ZFS deployment currently uses only rotational storage.

This is (very) suboptimal: we see noticeable load issues and poor performance, for instance when running I/O-intensive tasks on the annex. The most glaring symptom is the NFS server becoming slow or unresponsive, which triggers Icinga alerts on the reverse proxy.

This task will survey the current best practices for using faster storage in the performance-sensitive parts of ZFS (SLOG, L2ARC, ...) and come up with a few quotes to implement them.

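To make the moving pieces concrete, here is a minimal sketch of how dedicated log (SLOG) and cache (L2ARC) devices get attached to an existing pool; the pool name "tank" and the device paths are placeholders, not our actual layout:

  # Attach a dedicated SLOG; mirroring it is commonly recommended for
  # pools that care about sync writes:
  zpool add tank log mirror /dev/disk/by-id/nvme-slog-0 /dev/disk/by-id/nvme-slog-1

  # Attach an L2ARC cache device; cache vdevs cannot be mirrored, as
  # their contents are disposable and rebuilt from the pool:
  zpool add tank cache /dev/disk/by-id/nvme-cache-0
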
Candidates for actual faster storage:

  • Intel Optane storage
  • PCIe NVMe storage
  • PCIe M.2 adapter + M.2 NVMe disks
  • SAS SSDs

References

About ZFS in general

About the ZFS Intent Log (ZIL), Separate/Secondary Intent Log (SLOG), and zfs sync flags

About SSDs with power loss protection

Event Timeline

olasd triaged this task as High priority. Aug 20 2019, 11:53 AM
olasd created this task.
olasd changed the task status from Open to Work in Progress. Aug 20 2019, 12:30 PM
olasd updated the task description.
olasd updated the task description.

Have you made up your mind on the storage model you want for this? Can we purchase a few of them any time soon?

Interesting feedback in this post: https://www.ixsystems.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/page-10#post-524412

This person is using their ZFS pool as small-file storage with mostly static files, which is quite similar to what we'll be doing.

  • their SLOG is an Intel Optane drive (P4801X)
  • they use a metadata-only L2ARC on a consumer-grade Samsung 960 Evo NVMe SSD (the corresponding setting is sketched below)

Both of these drives use the M.2 form factor, for which we currently have no slots on uffizi.
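
For reference, the metadata-only L2ARC setup described in that post boils down to a single dataset property; "tank" is a placeholder pool name, not ours:

  # Cache only metadata (not file data) in the L2ARC for this pool:
  zfs set secondarycache=metadata tank

  # Both caching properties default to "all"; primarycache controls the
  # in-RAM ARC the same way:
  zfs get primarycache,secondarycache tank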

I think we should go ahead with:

  • one HHHL (PCIe card) Intel Optane P4800X (the smallest, 375 GB version will be sufficient); should be around 1200 EUR;
  • one PCIe card with M.2 slots (Asus Hyper M.2 x16 or equivalent); I've found prices between 100 and 400 EUR;
  • two consumer-grade NVMe disks for the L2ARC cache (1 or 2 TB each; e.g. Samsung 970 Evo); the 1 TB drives retail for 200 EUR each.

I'll try to fish for a quote with known suppliers.

In T1958#36628, @olasd wrote:
  • one HHHL (PCIe card) Intel Optane P4800X (the smallest, 375 GB version will be sufficient); should be around 1200 EUR;
  • one PCIe card with M.2 slots (Asus Hyper M.2 x16 or equivalent); I've found prices between 100 and 400 EUR;
  • two consumer-grade NVMe disks for the L2ARC cache (1 or 2 TB each; e.g. Samsung 970 Evo); the 1 TB drives retail for 200 EUR each.

I'll try to fish for a quote with known suppliers.

The last two items (even together) can be bought directly, with reimbursement claimed later; I can do it, if you need to speed things up.
The first item, unfortunately, is above the maximum cost limit for that (800 €).

In T1958#36629, @zack wrote:

The last two items (even together) can be bought directly, with reimbursement claimed later; I can do it, if you need to speed things up.
The first item, unfortunately, is above the maximum cost limit for that (800 €).

I've obtained a quote for the Optane card, which Roberto has approved and which I've sent to Vanessa for purchasing. I've also forwarded the references for the M.2 disks and adapter card (totalling around 700 €) to Roberto; we'll handle that purchase together on Monday.

The aforementioned cards have been received and installed, and ZFS has been set up to use them.
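
For the record, a quick way to check that the new devices are wired into the pool and actually taking traffic (pool name again a placeholder):

  # The log and cache vdevs should appear in the pool layout:
  zpool status tank

  # Per-vdev bandwidth/IOPS, refreshed every 5 seconds, shows whether
  # the SLOG and L2ARC are being used:
  zpool iostat -v tank 5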

In the meantime, most of the data in /srv/storage/space has been moved to ZFS.

The primary objstorage on uffizi has been rebased onto a ZFS dataset and the workers have been restarted; the legacy XFS objstorage has been remounted read-only.
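
A rough sketch of what this migration looks like at the command level; the dataset, mountpoint, and legacy path are illustrative placeholders, not the actual uffizi layout:

  # Create a dataset for the objstorage on the tuned pool:
  zfs create -o mountpoint=/srv/softwareheritage/objects tank/objstorage

  # Remount the legacy XFS volume read-only once the data has moved:
  mount -o remount,ro /srv/legacy/objects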

I've added a "data rates" panel to the storage backends Grafana dashboard (https://grafana.softwareheritage.org/d/8ywqc76mk/storage-backend-statistics), which will allow us to compare before/after performance.

I think the bulk of the task here is done. We can open another task if we notice any specific pain point later.

In T1958#38009, @olasd wrote:

I think the bulk of the task here is done. We can open another task if we notice any specific pain point later.

Sure.

What's the bottom line in terms of performance comparison between the new ZFS-based deployment and the previous one on old disk arrays?