Cassandra storage: Reduce the size of the "secondary lookup tables" for contents
Closed, Migrated, Edits Locked
Assigned To

Authored By

	olasd
	Mar 6 2020, 5:19 PM

Description

Currently, all the per-hash lookup tables for contents store the full quadruplet of hashes.

Looking up a content by a given hash requires fetching one row from the lookup table, then fetching the matching row from the main content metadata table.

Instead of storing all hashes in the "index" tables, we could store only the token of the row in the main table, then re-check that the hash does indeed match when retrieving it.

This would save a substantial amount of storage space, and the performance should be fairly similar:

Before:

(client) Look up hash quadruplet in lookup table
(client) Look up full metadata by quadruplet
- (cassandra driver) Compute partition token for returned hash quadruplet
- (cassandra driver) Query the server for the data
- (cassandra driver) Recheck that the quadruplet matches
- (cassandra driver) Return full metadata

After:

(client) Look up partition token in lookup table
(client) Look up rows by partition token
- (cassandra driver) Query the server for the data
(client) Filter for matching quadruplets
(client) Return metadata
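
The "After" flow can be sketched with a toy in-memory model. This is only an illustration, not the actual storage code: ToyStorage, compute_hashes and fake_token are hypothetical names, the hash quadruplet is assumed to be sha1, sha1_git, sha256 and blake2s256, and fake_token is a stand-in for Cassandra's token(partition_key), which the real cluster computes with its configured partitioner.

```python
import hashlib
from collections import defaultdict

# Assumed hash quadruplet; the task text only says "quadruplet of hashes".
ALGOS = ("sha1", "sha1_git", "sha256", "blake2s256")


def compute_hashes(data: bytes) -> dict:
    """Compute the four hashes of a content."""
    git_blob = b"blob %d\0" % len(data) + data  # git-style blob header
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_blob).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
    }


def fake_token(hashes: dict) -> int:
    """Stand-in for token(partition_key): a signed 64-bit integer.

    Not the real partitioner; it only mimics the shape of the value.
    """
    key = b"".join(hashes[algo] for algo in ALGOS)
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big", signed=True)


class ToyStorage:
    def __init__(self):
        # Main table: token -> rows (several rows may share one token).
        self.content = defaultdict(list)
        # One lookup table per algorithm: hash value -> set of tokens.
        self.index = {algo: defaultdict(set) for algo in ALGOS}

    def add(self, data: bytes, metadata: dict) -> None:
        hashes = compute_hashes(data)
        token = fake_token(hashes)
        self.content[token].append({**hashes, **metadata})
        for algo in ALGOS:
            # Each index row holds one hash and one token, not four hashes.
            self.index[algo][hashes[algo]].add(token)

    def find_by_hash(self, algo: str, value: bytes) -> list:
        rows = []
        # (client) Look up partition token(s) in the lookup table.
        for token in self.index[algo].get(value, ()):
            # (client) Look up rows by partition token.
            for row in self.content.get(token, ()):
                # (client) Filter: keep only rows whose hash really matches.
                if row[algo] == value:
                    rows.append(row)
        return rows


storage = ToyStorage()
storage.add(b"hello", {"length": 5})
wanted = compute_hashes(b"hello")["sha256"]
print(len(storage.find_by_hash("sha256", wanted)))
```

The client-side filter is what keeps this correct: a token is only 64 bits wide, so two different hash quadruplets can map to the same token, and the rows returned for a token have to be checked against the hash that was actually requested.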

Revisions and Commits

rDSTO Storage manager
	Closed	D2796 Store the value of token(partition_key) in content_by_* table, instead of three hashes.

Related Objects

		Status	Assigned	Task
		Migrated	gitlab-migration	T2304 Cassandra storage: Reduce the size of the "secondary lookup tables" for contents
		Migrated	gitlab-migration	T2498 Re-create the Cassandra cluster using on-premise servers