Page MenuHomeSoftware Heritage

cassandra: Add 'allow_overwrite' option, to allow updating objects
ClosedPublic

Authored by vlorentz on Apr 22 2021, 8:28 PM.

Details

Summary

as part of a migration.

Also write a first test that simulates how a simple migration would go.

This is the easy part of T2602.

Diff Detail

Unit TestsFailed

TimeTest
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_get_latest
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f38445dd828> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f38440f2908>
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_get_latest_none
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f38444d96d8> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f38441dca90>
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_id_get_from_revision
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f38444d72b0> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f38444d7f28>
1,700 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_large
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f3868c7d358> @settings(suppress_health_check=function_scoped_fixture_check)
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_resolve_alias
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f3850b7b2b0> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f3850b7bc18>
View Full Test Results (320 Failed · 493 Passed · 23 Skipped)

Event Timeline

Build has FAILED

Patch application report for D5582 (id=19943)

Rebasing onto eb8c147a54...

Current branch diff-target is up to date.
Changes applied before test
commit 5eddebc9accf868dc5974608755fc72f01dc9317
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1265/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1265/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 22 2021, 8:33 PM
Harbormaster failed remote builds in B20948: Diff 19943!

Build has FAILED

Patch application report for D5582 (id=19944)

Rebasing onto eb8c147a54...

Current branch diff-target is up to date.
Changes applied before test
commit 9ed28808477549ce75ad9dd7214ef03c88c690e5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1266/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1266/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 22 2021, 8:40 PM
Harbormaster failed remote builds in B20949: Diff 19944!

Build is green

Patch application report for D5582 (id=19951)

Rebasing onto eb8c147a54...

Current branch diff-target is up to date.
Changes applied before test
commit 879b240c123dc47afffc6bcb3840be32d6bf1eab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1267/ for more details.

douardda added inline comments.
swh/storage/cassandra/storage.py
189–210

why is the whole collision detection block is now run only if check_missing is True?

swh/storage/cassandra/storage.py
189–210

Because, if an object already exists with the exact same hashes and it was not filtered out initially, then self._content_get_from_hash(algo, content.get_hash(algo)) will return it and it would be detected as a collision.

I could filter the results of self._content_get_from_hash, but I don't think it's worth it, as _check_missing is pretty niche anyway.

swh/storage/cassandra/storage.py
189–210

And collision detection in Cassandra isn't guaranteed anyway, because two colliding objects may be inserted at the same time and Cassandra is only eventually consistent.

I'm not fond of the 'check_missing' name for the argument, but would prefer 'allow_overwrite'.

Then, is the added test is nice, most of the _add methods are not tested when the config option is false. I'd like to see (stupid) simple tests for those.

This revision now requires changes to proceed.Apr 29 2021, 9:53 AM

rename 'check_missing' to 'not allow_overwrite'

Build has FAILED

Patch application report for D5582 (id=20126)

Rebasing onto 98804f9e12...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add 'check_missing' option, to allow updating objects
Changes applied before test
commit bc0cd4eb0e765002790266b97f25e58b3946eb18
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1279/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1279/console

Build has FAILED

Patch application report for D5582 (id=20129)

Rebasing onto 98804f9e12...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add 'check_missing' option, to allow updating objects
Changes applied before test
commit d3ea8339115170edbdebe48c6586a6380d8725f3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1281/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1281/console

Build has FAILED

Patch application report for D5582 (id=20138)

Rebasing onto 98804f9e12...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add 'check_missing' option, to allow updating objects
Changes applied before test
commit 185084abe76d6847189fc45b7ddd4529a1dc815d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1283/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1283/console

Build has FAILED

Patch application report for D5582 (id=20142)

Rebasing onto 98804f9e12...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add 'check_missing' option, to allow updating objects
Changes applied before test
commit da0768cbff30e00f23d07d56d0461b9997ec5927
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1286/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1286/console

Build is green

Patch application report for D5582 (id=20144)

Rebasing onto 98804f9e12...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add 'check_missing' option, to allow updating objects
Changes applied before test
commit c5354b94fd069306df5daf6d09cc85ac82d415a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1287/ for more details.

vlorentz retitled this revision from cassandra: Add 'check_missing' option, to allow updating objects to cassandra: Add 'allow_overwrite' option, to allow updating objects.Apr 30 2021, 2:25 PM
This revision is now accepted and ready to land.May 3 2021, 2:14 PM

Build is green

Patch application report for D5582 (id=20230)

Rebasing onto 92d551a4a5...

Current branch diff-target is up to date.
Changes applied before test
commit f23346144dd3032df0664b3a49b8fe254aed217f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 22 20:27:33 2021 +0200

    cassandra: Add 'check_missing' option, to allow updating objects
    
    as part of a migration.
    
    Also write a first test that simulates how a simple migration would go.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1297/ for more details.