
cassandra.cql: Use static dataclasses instead of generating namedtuples on the fly.
Closed, Public

Authored by vlorentz on Aug 10 2020, 9:35 PM.

Details

Summary

Before this commit, python-cassandra used the default row factory,
which creates an anonymous named tuple type on each query; this made
it impossible to type CqlRunner properly.

This commit replaces the row factory with dict_factory, which returns
plain dicts, and converts them to well-defined dataclasses.
Additionally, this stops leaking python-cassandra internals to
cassandra.storage.
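
As a rough illustration of the dict-to-dataclass conversion described above, here is a minimal sketch. The `ContentRow` class, its fields, and the `rows_to_dataclasses` helper are hypothetical names made up for this example, not the ones used in the diff; the rows are hand-written dicts standing in for what the driver returns once `dict_factory` is set as the session's row factory.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Type, TypeVar

T = TypeVar("T")


@dataclass
class ContentRow:
    """Hypothetical row type; the field names are assumptions for this sketch."""

    sha1: bytes
    length: int
    status: str


def rows_to_dataclasses(cls: Type[T], rows: Iterable[Dict[str, Any]]) -> List[T]:
    """Convert dict rows (one dict per row, keyed by column name) to dataclasses."""
    return [cls(**row) for row in rows]


# With the real driver, dict rows would come from a session configured like:
#   from cassandra.query import dict_factory
#   session.row_factory = dict_factory
# Here hand-written dicts stand in for the query result.
fake_rows = [{"sha1": b"\x01" * 20, "length": 42, "status": "visible"}]
contents = rows_to_dataclasses(ContentRow, fake_rows)
```

Because `ContentRow` is a statically defined class, a type checker can verify attribute access such as `contents[0].length`, which is not possible with namedtuple types generated at query time.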

This also has some great side effects:

  • methods of CqlRunner are now consistent with each other (e.g. the _add_one methods used to take a mix of objects, dictionaries, and individual values as arguments)
  • it will allow me to deduplicate more code in further commits (I already deduplicated the insertion methods to use self._add_one, as was intended when this class was first written)
  • CqlRunner no longer needs to define lists of column names; they are automatically detected from the dataclasses
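
The last point can be sketched as follows, assuming the column names are read with `dataclasses.fields`. The `OriginRow` class, the `insert_statement` helper, and the `origin` table name are hypothetical and only illustrate the idea; the actual `_add_one` in the diff may be structured differently.

```python
import dataclasses
from typing import Any, List, Tuple


@dataclasses.dataclass
class OriginRow:
    """Hypothetical row type; the field names are assumptions for this sketch."""

    sha1: bytes
    url: str


def insert_statement(table: str, row: Any) -> Tuple[str, List[Any]]:
    """Build an INSERT query and its values from any dataclass instance.

    Column names come from the dataclass fields, so no separate
    list of column names has to be maintained.
    """
    names = [field.name for field in dataclasses.fields(row)]
    columns = ", ".join(names)
    # "?" placeholders, as used by prepared statements
    placeholders = ", ".join("?" for _ in names)
    query = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    values = [getattr(row, name) for name in names]
    return query, values


query, values = insert_statement(
    "origin", OriginRow(sha1=b"\x00" * 20, url="https://example.org/repo")
)
```

A single generic helper of this kind is what makes it possible for every insertion method to take the same kind of argument, a row object, instead of a mix of dicts and positional values.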

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D3756 (id=13220)

Rebasing onto 7d332f5967...

Current branch diff-target is up to date.
Changes applied before test
commit 3d3ac1605129097740c24ccc48a8519a1b9b78a3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 10 21:33:00 2020 +0200

    cassandra.cql: Use static dataclasses instead of generating namedtuples on the fly.
    

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/720/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/720/console

fix test (type change committed too early)

Build is green

Patch application report for D3756 (id=13224)

Rebasing onto 7d332f5967...

Current branch diff-target is up to date.
Changes applied before test
commit 319de05d5fbebbebb47532209490a2f8380f5343
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 10 21:33:00 2020 +0200

    cassandra.cql: Use static dataclasses instead of generating namedtuples on the fly.
    

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/724/ for more details.

anlambert added a subscriber: anlambert.

The diff was huge; it took me some time to read it all. Anyway, looks good to me, nice code rework.

This revision is now accepted and ready to land. Aug 11 2020, 1:21 PM

cool stuff ;)

swh/storage/cassandra/model.py, line 19

What's UDT?

swh/storage/cassandra/model.py, line 19

User-Defined Type