Start adding a Table abstraction.
Abandoned, Public

Authored by vlorentz on Nov 19 2018, 1:33 PM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Summary

This Diff introduces a new Table class that acts as an abstraction over Python data structures and exposes only the features a regular database would provide.
For instance, it prevents modifying the database's content by mutating an object that is already stored in it.
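
To illustrate the idea, here is a minimal sketch; it is not the code from this Diff. It assumes rows are immutable records and uses the standard library's frozen dataclasses as a stand-in for 'attrs'. The `Content` row type, its fields, and the `insert`/`get` method names are hypothetical, as is the choice to raise `KeyError` on a duplicate key.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Content:
    """Hypothetical row type; the field names are illustrative only."""
    sha1: bytes
    length: int


class Table:
    """In-memory table keyed on a single column (assumed design)."""

    def __init__(self, row_type, key):
        self._row_type = row_type  # class that every row must be an instance of
        self._key = key            # name of the attribute used as primary key
        self._rows = {}

    def insert(self, row):
        if not isinstance(row, self._row_type):
            raise TypeError("expected a %s" % self._row_type.__name__)
        # Reject mutable containers so a stored row cannot change afterwards.
        for field in dataclasses.fields(row):
            if isinstance(getattr(row, field.name), (list, dict)):
                raise TypeError("column %r holds a list or dict" % field.name)
        key = getattr(row, self._key)
        if key in self._rows:
            raise KeyError("duplicate key: %r" % (key,))
        self._rows[key] = row

    def get(self, key):
        # Returning the stored object is safe because rows are frozen.
        return self._rows[key]

    def __len__(self):
        return len(self._rows)


table = Table(Content, key="sha1")
table.insert(Content(sha1=b"\x01", length=3))
row = table.get(b"\x01")
try:
    row.length = 42  # attempt to change the table's content through the row
except dataclasses.FrozenInstanceError:
    print("stored rows cannot be modified in place")
```

With plain dicts, the assignment above would silently change the stored data; here, updating a row has to go through the Table itself.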

This is what I mentioned in last week's diff's comments (https://forge.softwareheritage.org/D645#inline-3404).

Note that this Diff only uses the Table for simple "tables". Snapshots, origins, and origin visits are trickier (origin keys span two columns), and I have not fully figured out how to deal with them in a clean way.

Diff Detail

Repository
rDSTO Storage manager
Branch
in-mem-tables
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2542
Build 3157: tox-on-jenkins (Jenkins)
Build 3156: arc lint + arc unit

Event Timeline

  • Remove the freezing stuff, just don't allow lists and dicts.
  • Use 'attrs' instead of namedtuples.
  • Remove the doc update, that belongs in another diff.
douardda added a subscriber: douardda.

Before going further in this direction, I'd like us to have a discussion so that:

  • we agree to introduce the usage of a lib like attr,
  • and we agree to use it everywhere we need to describe models in the code.

I do not want to have 3 or 4 different ways of defining such models in our code base.

Also:

  • adding 200+ loc to an already rather big Python module should be avoided: create a separate module for this model logic; we could even argue that the Table class should live in its own module,
  • having no tests at all for either this new data model or the Table class is not reasonable.

This revision now requires changes to proceed.
Dec 3 2018, 11:43 AM

Before going further in this direction, I'd like us to have a discussion so that:

  • we agree to introduce the usage of a lib like attr,
  • and we agree to use it everywhere we need to describe models in the code.

I do not want to have 3 or 4 different ways of defining such models in our code base.

Agreed

  • adding 200+ loc to an already rather big Python module should be avoided: create a separate module for this model logic; we could even argue that the Table class should live in its own module,

I propose to have it here only temporarily, and move it to swh.model once it's stabilized.

  • having no tests at all for either this new data model or the Table class is not reasonable.

Existing tests already check that it works on valid data. I'll add tests for invalid data.