
Investigate how to upgrade the schema of the Cassandra storage
Closed, Migrated

Description

With Cassandra, we cannot just "ALTER TABLE" to make arbitrary changes like in postgresql, so we need a solution for schema upgrades if we want to use it in production.

Event Timeline

vlorentz triaged this task as Normal priority. Sep 15 2020, 1:56 PM
vlorentz created this task.

What we can do, however:

  • remove non-PK columns
  • add a non-PK column, with all values set to null
  • add/remove tables

This, possibly combined with a short Python script to fill in the data, should be enough for simple migrations, including adding/removing index tables.
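For illustration, here is a minimal sketch of such a simple migration, using the DataStax Python driver; the contact point, keyspace, table and column names, and the derivation of the new values are all hypothetical:

```
# Hypothetical sketch: add a nullable non-PK column and backfill it with
# a short script. Uses the DataStax cassandra-driver.
from cassandra.cluster import Cluster


def derive_extra_metadata(message):
    # placeholder for whatever the real migration would compute
    return None if message is None else message[:80]


cluster = Cluster(["cassandra-seed"])  # hypothetical contact point
session = cluster.connect("swh")       # hypothetical keyspace

# Adding a non-PK column is cheap; existing rows just get null for it.
session.execute("ALTER TABLE revision ADD extra_metadata text")

# Short backfill pass over the table (the driver pages the full scan).
update = session.prepare("UPDATE revision SET extra_metadata = ? WHERE id = ?")
for row in session.execute("SELECT id, message FROM revision"):
    session.execute(update, (derive_extra_metadata(row.message), row.id))
```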

For the harder cases that involve changes to the PK, we could do something like this:

  • create a new table with a new name (e.g. revision_v[n+1]; like we do in swh-search, except Cassandra does not support aliases)
  • start an extra storage backend that reads from that table instead of the old one (e.g. revision_v[n]), and also reads from all the other tables as usual
  • have a multiplexing storage proxy (like we have for the objstorage) that queries this new backend (which reads from v[n+1]), and falls back to the old backend (which reads from v[n])
  • run a script that reads from v[n], adds to v[n+1], then optionally removes from v[n] (I'm leaning toward not removing, for better performance and to allow rollbacks)
  • when the script is done, drop v[n] and the old backend
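
A rough sketch of what the read path of such a multiplexing proxy could look like; the class and method names below are hypothetical and only loosely follow the swh-storage getter style:

```
# Hypothetical read-path sketch for the multiplexing proxy: query the
# backend reading from revision_v[n+1] first, and fall back to the backend
# reading from revision_v[n] for objects that aren't migrated yet.
class MultiplexedStorage:
    def __init__(self, new_storage, old_storage):
        self.new = new_storage  # reads from revision_v[n+1]
        self.old = old_storage  # reads from revision_v[n]

    def revision_get(self, revision_ids):
        results = dict(zip(revision_ids, self.new.revision_get(revision_ids)))
        # fall back to the old backend for ids the new one doesn't have yet
        missing = [i for i, rev in results.items() if rev is None]
        if missing:
            for i, rev in zip(missing, self.old.revision_get(missing)):
                if rev is not None:
                    results[i] = rev
        return [results[i] for i in revision_ids]
```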

The issue with this is the interaction between multiple versions of the code: we need to either keep an old version of swh-storage deployed, or have code in swh-storage that can deal with both versions, for as long as the data from v[n] isn't completely migrated.

This can accumulate if we want to push a new update/migration while the previous one isn't completely migrated.

Thoughts?

> For the harder cases that involve changes to the PK, we could do something like this:
>
> • create a new table with a new name (e.g. revision_v[n+1]; like we do in swh-search, except Cassandra does not support aliases)
> • start an extra storage backend that reads from that table instead of the old one (e.g. revision_v[n]), and also reads from all the other tables as usual
> • have a multiplexing storage proxy (like we have for the objstorage) that queries this new backend (which reads from v[n+1]), and falls back to the old backend (which reads from v[n])

And, I guess, somehow pushes writes to both backends?

> • run a script that reads from v[n], adds to v[n+1], then optionally removes from v[n] (I'm leaning toward not removing, for better performance and to allow rollbacks)

(because if you want to support rollbacks, you need new writes to happen in both tables)

> • when the script is done, drop v[n] and the old backend

> The issue with this is the interaction between multiple versions of the code: we need to either keep an old version of swh-storage deployed, or have code in swh-storage that can deal with both versions, for as long as the data from v[n] isn't completely migrated.

I think we will need the multiplexed storage proxy to select which writes go to which backend, but in general the approach sounds okay.
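
For illustration, a rough sketch of that write routing, here simply duplicating writes to both backends so the old table stays complete and a rollback to v[n] remains possible; the class and method names are hypothetical:

```
# Hypothetical write-path sketch: duplicate writes so v[n] stays complete
# (keeping a rollback possible) while the migration script fills v[n+1].
class DualWriteStorage:
    def __init__(self, new_storage, old_storage):
        self.new = new_storage  # writes to revision_v[n+1]
        self.old = old_storage  # writes to revision_v[n]

    def revision_add(self, revisions):
        self.old.revision_add(revisions)        # keep v[n] authoritative
        return self.new.revision_add(revisions)
```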

> This can accumulate if we want to push a new update/migration while the previous one isn't completely migrated.

The biggest migration of this kind we'll have to handle is SWHIDs v2, and I think this will be a long-running, multi-step process (because we'll have to compute new ids for all objects, in layers); I don't see us performing many such large migrations in parallel any time soon.

> Thoughts?

I think having multiple concurrent versions of the same module running in parallel (rather than having layers and layers of compatibility code) is the most compelling option.

Unfortunately, that's really not something that our current deployment scenarios support well:

  • all packages are installed system-wide in a shared site-packages directory
  • all packages are installed with debian packages, which don't support the installation of multiple versions concurrently
  • our debian archive only supports one version per package anyway

But I think we really need to move towards being able to better control deployment versions, and sooner rather than later. This probably means generally migrating to virtualenv-based deployments, for which I see a ton of upsides (most notably, if we do it properly, we'll be getting better reproducibility of environments between tests, staging and production), and a few downsides (managing the up-to-dateness of dependencies in a set of virtualenvs is still an open problem, AFAICT).

When we have that, it should be easy enough to have a venv with the legacy code running as a fallback on which the multiplexer can direct requests if needed.

Doesn't this deserve a state-of-the-art kind of thing? Are there documentation materials on the subject? How do other (big) Cassandra users handle this?

Maybe also ask questions on one or two mailing lists?

> Doesn't this deserve a state-of-the-art kind of thing?

As far as I can tell, the state-of-the-art is "having a bunch of .cql files (or hand-written Java classes), and a script to remember which ones were already run" (like what we do in postgresql with dbversion); all the existing tools I found follow this pattern.
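
For reference, a minimal sketch of that pattern (numbered .cql files applied in order, with a table recording what was already run); the file layout, keyspace and table names are hypothetical:

```
# Hypothetical sketch of the usual pattern: numbered .cql migration files
# plus a table recording which ones were already applied (analogous to
# the dbversion table we use with postgresql).
import glob
import os

from cassandra.cluster import Cluster

session = Cluster(["cassandra-seed"]).connect("swh")  # hypothetical

session.execute(
    "CREATE TABLE IF NOT EXISTS schema_version (version int PRIMARY KEY)"
)
applied = {row.version for row in session.execute("SELECT version FROM schema_version")}

# e.g. migrations/0002_add_index_table.cql; naive split on ';' for the sketch
for path in sorted(glob.glob("migrations/*.cql")):
    version = int(os.path.basename(path).split("_")[0])
    if version in applied:
        continue
    with open(path) as f:
        for statement in f.read().split(";"):
            if statement.strip():
                session.execute(statement)
    session.execute("INSERT INTO schema_version (version) VALUES (%s)", (version,))
```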

> Are there documentation materials on the subject? How do other (big) Cassandra users handle this?

All the discussions and documentation I could find on the subject either blissfully ignore that CQL isn't omnipotent, or assume you get the primary key right on the first try, or expect you to write your own custom code.

And it's not exactly surprising, given the restrictions Cassandra has.

> Maybe also ask questions on one or two mailing lists?

Someone did, here: https://community.datastax.com/questions/4201/mange-schema-change.html

I just discussed the multiplexer-based migration process I described above with ardumont/olasd/vsellier.

It's actually quite tricky to preserve consistency, because it breaks the transactionality of writes:

  1. if the multiplexer writes only to the "new" backend, then no rollback is possible (as mentioned by olasd)
  2. if the multiplexer writes to both backends, but can read from both, then an interrupted write will still show a hit when reading, so (depending on write order) either of the backends will silently be missing some objects
  3. if the multiplexer writes to both backends serially, but reads only from one of them (the last it writes to), then we will be missing objects after either the switchover or the rollback

Therefore, we discussed another approach:

  1. deploy a new backend; loaders keep writing only to the old one
  2. set up a journal replayer that fills the new backend from kafka
  3. wait for the replayer to be close to the end of the log (may take from a few days to maybe a month, let's wait for benchmarks)
  4. when the replayer is close to the end, stop the loaders, and wait for the replayer to reach the end
  5. stop the replayer
  6. switch over to the new backend

This has the extra advantage of exercising the replayer on *all* objects before committing to the new schema (so we can catch bugs and inefficiencies).
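
A minimal sketch of what such a replayer amounts to, using a plain Kafka consumer; the topic name, consumer group, deserialization and the new backend's API are placeholders rather than what swh-journal actually provides:

```
# Hypothetical replayer sketch: consume a journal topic from the beginning
# and write every object into the new backend, while loaders keep writing
# only to the old one.
import msgpack
from confluent_kafka import Consumer


def replay_revisions(new_storage, bootstrap_servers="kafka:9092"):
    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "cassandra-schema-migration",  # hypothetical group
        "auto.offset.reset": "earliest",           # start from the beginning of the log
    })
    consumer.subscribe(["swh.journal.objects.revision"])  # hypothetical topic name
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            revision = msgpack.unpackb(msg.value(), raw=False)
            new_storage.revision_add([revision])  # write to the new schema only
    finally:
        consumer.close()
```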