Using xz-file-format for 1TB archive
Closed, Migrated; Edits Locked

Assigned To

Authored By

	dachary
	Feb 15 2021, 9:36 AM

Description

How practical would it be to use xz-file-format to create a 1TB archive of artifacts with a 3KB median size and an 80KB average size?

Each artifact would be an individually compressed block.
An uncompressed, sorted index with fixed-size records (SHA256 => block number) is stored in the last block.

The index does not fit in memory and needs sorting.
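
The layout described above can be sketched in Python. This is only an illustration under stated assumptions, not the design that was evaluated: the standard `lzma` module does not expose xz blocks, so each artifact is written as its own xz stream instead, and the index record stores a byte offset and length rather than a block number. The names `RECORD`, `build_archive` and `lookup` are hypothetical. The index is sorted in memory here for brevity; at the scale discussed in this task an external sort would be needed. The lookup itself only seeks in the index file and never loads it whole.

```python
import hashlib
import io
import lzma
import struct

# Hypothetical fixed-size index record: SHA256 digest, byte offset, compressed length.
RECORD = struct.Struct(">32sQQ")


def build_archive(artifacts):
    """Compress each artifact independently and return (data, index) as bytes."""
    data = io.BytesIO()
    records = []
    for content in artifacts:
        digest = hashlib.sha256(content).digest()
        # Assumption: one xz stream per artifact stands in for one xz block.
        compressed = lzma.compress(content, format=lzma.FORMAT_XZ)
        records.append(RECORD.pack(digest, data.tell(), len(compressed)))
        data.write(compressed)
    # Packed records start with the digest, so a bytewise sort orders them by SHA256.
    # In-memory sort is for the sketch only.
    records.sort()
    return data.getvalue(), b"".join(records)


def lookup(index_file, data_file, digest):
    """Binary search the on-disk index by seeking; return the artifact or None."""
    index_file.seek(0, io.SEEK_END)
    lo, hi = 0, index_file.tell() // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        index_file.seek(mid * RECORD.size)
        key, offset, length = RECORD.unpack(index_file.read(RECORD.size))
        if key == digest:
            data_file.seek(offset)
            return lzma.decompress(data_file.read(length))
        if key < digest:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because every record has the same size, record number `n` lives at byte `n * RECORD.size`, which is what makes the seek-based binary search possible.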

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T3116 Roll out at least one operational mirror
Migrated	gitlab-migration	T3054 Scale out object storage design
Migrated	gitlab-migration	T3048 Using a custom Sorted String Table format
Migrated	gitlab-migration	T3046 Using xz-file-format for 1TB archive

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 15 2021, 9:36 AM

dachary triaged this task as Normal priority.

dachary created this task.

dachary created this object in space S1 Public.

https://py7zr.readthedocs.io/en/latest/archive_format.html

The 7z format is more complex because it knows about files, directories, etc. It is not just a compressed data format.

https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md

The zstd format is tightly associated with the compression algorithm and is therefore more complex. A zstd file can, however, be a sequence of independently compressed frames and could be used for the same purpose as xz.

https://www.nongnu.org/lzip/xz_inadequate.html

"Xz format inadequate for long-term archiving"

Although simple and close to what is needed, Xz is not an exact match: the index would need to be maintained.

dachary closed this task as Resolved. Feb 15 2021, 5:37 PM

dachary added a parent task: T3048: Using a custom Sorted String Table format. Feb 15 2021, 5:43 PM

dachary mentioned this in T3054: Scale out object storage design. Feb 21 2021, 12:22 PM

dachary mentioned this in T3107: Scale out object storage explorations. Mar 9 2021, 1:09 PM

This task has been migrated to GitLab.
