
Feb 16 2021

dachary added a comment to T3054: Scale out object storage design.

For the record, stats from January 2021

Feb 16 2021, 10:15 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

Description Default value of bluestore compression min blob size for rotational media.
Type Unsigned Integer
Required No
Default 128K

Feb 16 2021, 10:07 PM · Object storage
olasd added a comment to T3054: Scale out object storage design.

Here's the output of the following query, which computes exact aggregates for objects smaller than the size boundaries of the original quartiles:

Feb 16 2021, 9:46 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
zack updated subscribers of T3054: Scale out object storage design.

Thanks for this summary/status, very useful. Regarding goals, I think we also want a read goal about time to first byte, a performance metric that is particularly bad in the current filesystem-based object storage. Not sure what would be a reasonable goal though. Poke @olasd: any idea about a good target for this?

Feb 16 2021, 7:52 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

With a 4KB min alloc and a 4+2 erasure coded pool, objects smaller than 16KB still require 16KB, plus 8KB for parity. T3054 suggests that 75% of objects are smaller than 16KB. Since the space amplification makes even the smallest object 16KB big, that's a total of 16KB * 7.5B = 120TB, i.e. 120TB / 750TB = 16% of the total. Without the space amplification these objects only use ~5% of the total space, so the space amplification costs about 10% of the total uncompressed storage.
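
The arithmetic in this comment can be replayed with a short script. This is only a sketch: the figures (4KB allocation unit, 4+2 profile, 7.5B small objects, 750TB total, ~5% without amplification) are the ones quoted in the comment, while the decimal units and all names in the code are assumptions made for illustration.

```python
# Figures come from the comment; decimal units (1KB = 1000 bytes) are an assumption.
KB = 1000
TB = 1000**4

MIN_ALLOC = 4 * KB               # bluestore_min_alloc_size after the change
K, M = 4, 2                      # erasure coding profile: 4 data + 2 parity chunks
SMALL_OBJECTS = 7_500_000_000    # objects smaller than 16KB, as quoted
TOTAL_STORAGE = 750 * TB         # total uncompressed storage, as quoted
SHARE_WITHOUT_AMPLIFICATION = 0.05  # "~5%" from the comment, not recomputed here


def min_footprint(min_alloc, k, m):
    """Smallest footprint of one object: one allocation unit per chunk."""
    return k * min_alloc, m * min_alloc


data, parity = min_footprint(MIN_ALLOC, K, M)
amplified = SMALL_OBJECTS * data       # data part only, as in the comment
share = amplified / TOTAL_STORAGE      # fraction of the total storage
cost = share - SHARE_WITHOUT_AMPLIFICATION

print(f"per object: {data // KB}KB data + {parity // KB}KB parity")
print(f"small objects: {amplified // TB}TB, {share:.0%} of the total")
print(f"cost of amplification: about {cost:.0%}")
```

The difference comes out at 11%, which the comment rounds to 10%.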

Feb 16 2021, 7:52 PM · Object storage
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 7:27 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 7:15 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 7:03 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 7:01 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 6:55 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary updated the task description for T3054: Scale out object storage design.
Feb 16 2021, 6:53 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a task to D398: [WIP] "packing" object storage design documentation: T3054: Scale out object storage design.
Feb 16 2021, 6:46 PM · Object storage
dachary added a revision to T3054: Scale out object storage design: D398: [WIP] "packing" object storage design documentation.
Feb 16 2021, 6:46 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a parent task for T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K: T3054: Scale out object storage design.
Feb 16 2021, 6:42 PM · Object storage
dachary added a parent task for T3048: Using a custom Sorted String Table format: T3054: Scale out object storage design.
Feb 16 2021, 6:42 PM · Object storage
dachary added subtasks for T3054: Scale out object storage design: T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K, T3049: Benchmarking an RBD based object container, T3048: Using a custom Sorted String Table format.
Feb 16 2021, 6:42 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a parent task for T3049: Benchmarking an RBD based object container: T3054: Scale out object storage design.
Feb 16 2021, 6:42 PM · Object storage
dachary changed the status of T3054: Scale out object storage design from Open to Work in Progress.
Feb 16 2021, 6:41 PM · Roadmap 2022, Object storage (RedHat collaboration), Roadmap 2021, meta-task
dachary added a project to D398: [WIP] "packing" object storage design documentation: Object storage.
Feb 16 2021, 3:22 PM · Object storage
vlorentz renamed T3049: Benchmarking an RBD based object container from Bencharmking an RBD based object container to Benchmarking an RBD based object container.
Feb 16 2021, 9:42 AM · Object storage
dachary added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

Josh Durgin gave some more pointers to relevant pull requests:

Feb 16 2021, 9:41 AM · Object storage
dachary added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

Root cause analysis for space overhead with erasure coded pools.

Feb 16 2021, 12:13 AM · Object storage

Feb 15 2021

dachary updated the task description for T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.
Feb 15 2021, 11:44 PM · Object storage
dachary updated the task description for T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.
Feb 15 2021, 11:43 PM · Object storage
dachary changed the status of T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K from Open to Work in Progress.
Feb 15 2021, 11:42 PM · Object storage
dachary added a comment to T3014: Using an RBD image to store artifacts.

There is one concern that was not addressed: the metadata does not scale out, it is a single RocksDB database.

Feb 15 2021, 10:49 PM · Object storage
dachary closed T3051: Using EOS to store objects as Resolved.
Feb 15 2021, 10:14 PM · Object storage
dachary added a comment to T3051: Using EOS to store objects.

At first glance EOS is an entire system that addresses all the needs of the researchers at CERN. It includes an object storage with data and metadata separated, which is likely what the Software Heritage object storage will look like as well. However, this part is not standalone, although it is a great source of inspiration:

Feb 15 2021, 10:14 PM · Object storage
dachary added a comment to T3051: Using EOS to store objects.

The Scalla software suite provides two fundamental building blocks: an xrootd server for low latency high bandwidth data access and an olbd server for building scalable xrootd clusters. This paper describes the architecture, how low latency is achieved, and the scaling opportunities the software allows. Actual performance measurements are presented and discussed. Scalla offers a readily deployable framework in which to construct large fault-tolerant high performance data access configurations using commodity hardware with a minimum amount of administrative overhead.

Feb 15 2021, 10:10 PM · Object storage
dachary closed T3050: Using libcephsqlite to store objects as Resolved.
Feb 15 2021, 9:41 PM · Object storage
dachary added a comment to T3050: Using libcephsqlite to store objects.

There is a hard limit on the sqlite database (~280TB) so it would not work, even if perfectly optimized.
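
The ~280TB figure matches the ceiling that SQLite itself documents for recent versions: maximum page count multiplied by maximum page size. The sketch below only replays that multiplication. The 750TB value is the total quoted in the T3052 comment, used here as an assumed point of comparison, and nothing in it is specific to libcephsqlite.

```python
# Upper bounds from SQLite's documented limits (recent versions).
MAX_PAGE_COUNT = 4_294_967_294   # maximum number of pages in a database file
MAX_PAGE_SIZE = 65_536           # maximum page size, in bytes

TB = 1000**4
max_db_bytes = MAX_PAGE_COUNT * MAX_PAGE_SIZE
max_db_tb = max_db_bytes / TB

ARCHIVE_TB = 750  # assumption: total uncompressed storage quoted in T3052

print(f"SQLite ceiling: ~{max_db_tb:.0f}TB, archive: {ARCHIVE_TB}TB")
print("fits in one database:", ARCHIVE_TB <= max_db_tb)
```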

Feb 15 2021, 9:41 PM · Object storage
dachary changed the status of T3051: Using EOS to store objects from Open to Work in Progress.
Feb 15 2021, 9:36 PM · Object storage
dachary updated the task description for T3050: Using libcephsqlite to store objects.
Feb 15 2021, 9:05 PM · Object storage
dachary added a comment to T3050: Using libcephsqlite to store objects.

A mail was sent to Patrick Donnelly to ask for his opinion on the matter.

Feb 15 2021, 9:03 PM · Object storage
dachary triaged T3050: Using libcephsqlite to store objects as Normal priority.
Feb 15 2021, 9:03 PM · Object storage
dachary changed the status of T3050: Using libcephsqlite to store objects from Open to Work in Progress.
Feb 15 2021, 9:02 PM · Object storage
dachary closed T3014: Using an RBD image to store artifacts as Resolved.
Feb 15 2021, 9:00 PM · Object storage
dachary closed T3014: Using an RBD image to store artifacts, a subtask of T3049: Benchmarking an RBD based object container, as Resolved.
Feb 15 2021, 9:00 PM · Object storage
dachary added a comment to T3014: Using an RBD image to store artifacts.

This preliminary exploration is complete; the work moves on to benchmarking to discover blockers.

Feb 15 2021, 8:59 PM · Object storage
dachary added a parent task for T3014: Using an RBD image to store artifacts: T3049: Benchmarking an RBD based object container.
Feb 15 2021, 8:58 PM · Object storage
dachary added a subtask for T3049: Benchmarking an RBD based object container: T3014: Using an RBD image to store artifacts.
Feb 15 2021, 8:58 PM · Object storage
dachary updated the task description for T3049: Benchmarking an RBD based object container.
Feb 15 2021, 8:51 PM · Object storage
dachary updated the task description for T3048: Using a custom Sorted String Table format.
Feb 15 2021, 8:09 PM · Object storage
dachary added a comment to T3048: Using a custom Sorted String Table format.

Updated the description, even simpler.

Feb 15 2021, 8:09 PM · Object storage
dachary updated the task description for T3049: Benchmarking an RBD based object container.
Feb 15 2021, 8:08 PM · Object storage
dachary added a comment to T3049: Benchmarking an RBD based object container.

Thanks for the comment. Let's keep just the SWHID then.

Feb 15 2021, 8:07 PM · Object storage
zack updated the task description for T3049: Benchmarking an RBD based object container.
Feb 15 2021, 7:54 PM · Object storage
zack added a comment to T3048: Using a custom Sorted String Table format.

followed sequence of:

Size of SHA256, SWHID, Content
SHA256
SWHID
Content
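
One possible reading of the layout listed above is a record made of three sizes followed by the three fields. The sketch below illustrates that reading only: the width and byte order of the size fields, the textual SWHID encoding, and every function name are assumptions, not the format defined in the task.

```python
import hashlib
import struct

# Assumption: each size is an unsigned 64-bit little-endian integer.
HEADER = struct.Struct("<QQQ")


def swhid_for_content(content: bytes) -> bytes:
    """SWHID of a content object: 'swh:1:cnt:' plus its git blob SHA1."""
    sha1_git = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    return b"swh:1:cnt:" + sha1_git.encode("ascii")


def pack_record(content: bytes) -> bytes:
    """Serialize one object as sizes, SHA256, SWHID, content."""
    sha256 = hashlib.sha256(content).digest()
    swhid = swhid_for_content(content)
    header = HEADER.pack(len(sha256), len(swhid), len(content))
    return header + sha256 + swhid + content


def iter_records(buf: bytes):
    """Yield (sha256, swhid, content) tuples from a sequence of records."""
    offset = 0
    while offset < len(buf):
        sizes = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        fields = []
        for size in sizes:
            fields.append(buf[offset:offset + size])
            offset += size
        yield tuple(fields)


image = pack_record(b"hello\n") + pack_record(b"")
for sha256, swhid, content in iter_records(image):
    print(swhid.decode(), len(content))
```

Because every record carries its own sizes, a reader can skip from one record to the next without decoding the content.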

Feb 15 2021, 7:53 PM · Object storage
zack added a comment to T3049: Benchmarking an RBD based object container.

The object storage is a collection of RBD images containing a sequence of objects (SHA256 + SWHID + content).

Feb 15 2021, 7:50 PM · Object storage
dachary updated the task description for T3049: Benchmarking an RBD based object container.
Feb 15 2021, 6:19 PM · Object storage
dachary changed the status of T3049: Benchmarking an RBD based object container from Open to Work in Progress.
Feb 15 2021, 6:16 PM · Object storage
dachary updated the task description for T3048: Using a custom Sorted String Table format.
Feb 15 2021, 6:04 PM · Object storage
dachary renamed T3048: Using a custom Sorted String Table format from Using custom format for 1TB archive to Using a custom format for 1TB archive.
Feb 15 2021, 5:43 PM · Object storage
dachary added a parent task for T3046: Using xz-file-format for 1TB archive: T3048: Using a custom Sorted String Table format.
Feb 15 2021, 5:43 PM · Object storage
dachary added subtasks for T3048: Using a custom Sorted String Table format: T3046: Using xz-file-format for 1TB archive, T3045: Using pixz for 1TB archives.
Feb 15 2021, 5:43 PM · Object storage
dachary added a parent task for T3045: Using pixz for 1TB archives: T3048: Using a custom Sorted String Table format.
Feb 15 2021, 5:43 PM · Object storage
dachary changed the status of T3048: Using a custom Sorted String Table format from Open to Work in Progress.
Feb 15 2021, 5:41 PM · Object storage
dachary closed T3046: Using xz-file-format for 1TB archive as Resolved.
Feb 15 2021, 5:37 PM · Object storage
dachary added a comment to T3046: Using xz-file-format for 1TB archive.

Although simple and close to what is needed, Xz is not an exact match: the index would need to be maintained.

Feb 15 2021, 5:37 PM · Object storage
dachary added a comment to T3046: Using xz-file-format for 1TB archive.

Xz format inadequate for long-term archiving

Feb 15 2021, 5:35 PM · Object storage
dachary changed the status of T3014: Using an RBD image to store artifacts from Open to Work in Progress.
Feb 15 2021, 2:13 PM · Object storage
dachary added a comment to T3046: Using xz-file-format for 1TB archive.

The zstd format is tightly associated with the compression algorithm and is therefore more complex. It can however be a sequence of independently compressed content and could be used for the same purpose as xz.

Feb 15 2021, 10:42 AM · Object storage
dachary added a comment to T3046: Using xz-file-format for 1TB archive.

The 7z format is more complex because it knows about files, directories etc. It is not just a compressed data format.

Feb 15 2021, 9:56 AM · Object storage
dachary changed the status of T3046: Using xz-file-format for 1TB archive from Open to Work in Progress.
Feb 15 2021, 9:36 AM · Object storage
dachary closed T3045: Using pixz for 1TB archives as Resolved.
Feb 15 2021, 9:28 AM · Object storage
dachary added a comment to T3045: Using pixz for 1TB archives.

There are two blockers:

Feb 15 2021, 9:28 AM · Object storage
dachary added a comment to T3045: Using pixz for 1TB archives.

When extracting a single file (-x file), the in-memory index is walked sequentially looking for the file.

Feb 15 2021, 9:20 AM · Object storage
dachary updated the task description for T3045: Using pixz for 1TB archives.
Feb 15 2021, 9:02 AM · Object storage
dachary added a comment to T3045: Using pixz for 1TB archives.

XZ file format https://tukaani.org/xz/format.html & https://tukaani.org/xz/xz-file-format.txt

Feb 15 2021, 8:54 AM · Object storage
dachary added a comment to T3045: Using pixz for 1TB archives.

The index is located at the end of the file.
The content of the archive is compressed as successive blocks of a given size.
The index is compressed as a single block of unlimited size.

Feb 15 2021, 8:54 AM · Object storage
dachary triaged T3045: Using pixz for 1TB archives as Normal priority.
Feb 15 2021, 8:49 AM · Object storage

Feb 14 2021

dachary added a comment to T3014: Using an RBD image to store artifacts.
Feb 14 2021, 7:11 PM · Object storage
dachary added a comment to T2216: Packing object storage.

About Ceph RGW and the lack of packing https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AEMW6O7WVJFMUIX7QGI2KM7HKDSTNIYT/

Feb 14 2021, 6:24 PM · Object storage, Roadmap 2020
dachary added a comment to T3014: Using an RBD image to store artifacts.

https://github.com/vasi/pixz is a candidate for the 1TB archive content

Feb 14 2021, 5:29 PM · Object storage
dachary added a watcher for Object storage: dachary.
Feb 14 2021, 10:20 AM
dachary added a comment to T3014: Using an RBD image to store artifacts.

For the record, yesterday's IRC log:

Feb 14 2021, 9:27 AM · Object storage

Feb 13 2021

dachary added a comment to T3014: Using an RBD image to store artifacts.

For the record, today's IRC log:

Feb 13 2021, 12:21 AM · Object storage

Feb 6 2021

dachary added a comment to T3014: Using an RBD image to store artifacts.

Benchmarking S3 in Ceph with COSBench could be interesting (the video is not yet available). In the past COSBench was difficult to use but maybe it improved. This is off-topic, but I don't know where else to write it down at the moment.

Feb 6 2021, 2:03 PM · Object storage

Feb 4 2021

dachary updated the task description for T3014: Using an RBD image to store artifacts.
Feb 4 2021, 12:38 PM · Object storage

Feb 3 2021

zack triaged T3014: Using an RBD image to store artifacts as Normal priority.
Feb 3 2021, 3:23 PM · Object storage

Feb 2 2021

dachary updated the task description for T3014: Using an RBD image to store artifacts.
Feb 2 2021, 9:36 PM · Object storage

Feb 1 2021

dachary updated the task description for T3014: Using an RBD image to store artifacts.
Feb 1 2021, 9:56 PM · Object storage
dachary added a comment to T3014: Using an RBD image to store artifacts.

A trivial test case (attached) shows that an RBD image backed by a k=4,m=2 erasure coded pool (RAID6 equivalent) can store 4GB of data using 6GB of disk. The metadata overhead is small. It would be great if someone could repeat the test to make sure I did not accidentally obtain these results.
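
The 6GB figure is what the erasure coding profile predicts when metadata is ignored: raw usage is the logical size multiplied by (k+m)/k. A minimal sketch of that calculation follows. The function name is hypothetical, and the three-way replication line is added only as a point of comparison, not something measured in the test case.

```python
GB = 1000**3


def raw_usage(logical_bytes: int, k: int, m: int) -> int:
    """Raw bytes needed by a k+m erasure coded pool, ignoring metadata."""
    return logical_bytes * (k + m) // k


ec_raw = raw_usage(4 * GB, k=4, m=2)
replicated_raw = 3 * 4 * GB  # three-way replication, for comparison only

print(f"k=4,m=2: {ec_raw // GB}GB, 3x replication: {replicated_raw // GB}GB")
```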

Feb 1 2021, 9:16 PM · Object storage
dachary created T3014: Using an RBD image to store artifacts.
Feb 1 2021, 9:10 PM · Object storage

Jan 4 2021

tenma lowered the priority of T2743: Investigate why logging is inconsistent from Normal to Low.
Jan 4 2021, 5:00 PM · Vault, Object storage, Core & foundations

Nov 3 2020

ardumont moved T1533: Make sure api server uses explicit configurations from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:04 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
tenma updated the task description for T2743: Investigate why logging is inconsistent.
Nov 3 2020, 10:45 AM · Vault, Object storage, Core & foundations
tenma updated the task description for T2743: Investigate why logging is inconsistent.
Nov 3 2020, 10:30 AM · Vault, Object storage, Core & foundations
tenma updated the task description for T2743: Investigate why logging is inconsistent.
Nov 3 2020, 10:23 AM · Vault, Object storage, Core & foundations

Oct 30 2020

tenma updated the task description for T2743: Investigate why logging is inconsistent.
Oct 30 2020, 2:47 PM · Vault, Object storage, Core & foundations
tenma updated the task description for T2743: Investigate why logging is inconsistent.
Oct 30 2020, 2:46 PM · Vault, Object storage, Core & foundations

Oct 29 2020

tenma triaged T2743: Investigate why logging is inconsistent as Normal priority.
Oct 29 2020, 6:19 PM · Vault, Object storage, Core & foundations

Oct 26 2020

douardda added a comment to T1577: Compare/benchmark objstorage backends.

See also T2706

Oct 26 2020, 12:30 PM · Object storage

Oct 16 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Same as before but with 1M (fresh) sha1s:

Oct 16 2020, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Since the results on uffizi above suffered from a few caveats, I ran a few more tests:

  • a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
  • a second dataset has been created (with the order by sha256 part to spread the sha1s)
  • but the results are a mix of hot and cold cache tests
Oct 16 2020, 11:59 AM · Object storage, Mirror

Oct 15 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Some results:

Oct 15 2020, 1:02 PM · Object storage, Mirror
zack added projects to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3): Mirror, Object storage.
Oct 15 2020, 12:44 PM · Object storage, Mirror

Sep 22 2020

olasd closed T1043: handle the uffizi content store being full as Resolved.

This was, in fact, solved by adding more storage.

Sep 22 2020, 4:43 PM · Object storage
olasd closed T1048: Clean striped object storages from objects they should not be containing, a subtask of T1043: handle the uffizi content store being full, as Resolved.
Sep 22 2020, 4:42 PM · Object storage
olasd closed T1048: Clean striped object storages from objects they should not be containing, a subtask of T1044: Write all contents synchronously to azure, as Resolved.
Sep 22 2020, 4:42 PM · Object storage