Page MenuHomeSoftware Heritage

storage: open content_update endpoint to permit update on content rows
ClosedPublic

Authored by ardumont on Mar 2 2017, 10:59 AM.

Details

Summary

Permits to batch update content values (with or without optional new columns).

Limited to contents in the table content. The contents in
skipped_content are not dealt with as we cannot recompute hashes on
those since we don't have them in the objstorage.

Related T692

Note:

  • There are limits to the current implementation as per the todo description in the code (swh.storage.content_update).
  • I have a problem when running the tests for now. Individually, as a set of 2 tests, the new tests run fine. When running all the tests, the one with the new columns hang and i don't know why yet (working on it). This does not block the review though.

Diff Detail

Repository
rDSTO Storage manager
Branch
add-archiver-content-updater+refactoring
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 715
Build 961: Software Heritage Python tests
Build 960: arc lint + arc unit

Event Timeline

I didn't dive deep into the docstring, but at least I gave a stab at the SQL query.

sql/upgrades/101.sql
16

I don't see the point of the two arguments: columns_update has to be a subset of columns_select, and the extra columns in the select argument aren't being used?

31–37

Please use either array_to_string or string concatenation consistently rather than one and the other.

39–45

The function is called update, the main query should be an update.

40

Is this distinct really needed?

swh/storage/storage.py
143

This is inconsistent with the actual function being called

sql/upgrades/101.sql
16

I don't see the point of the two arguments: columns_update has to be a subset of columns_select

I wanted to untangle the complexity in computation of what columns to select and what columns to update.
Debugging those stored procedure is a pain.
Plus, i kept having the wrong result when trying otherwise...

and the extra columns in the select argument aren't being used?

Yes, they are not.
Only the columns_update are to be updated.
The others are effectively dropped.

According to my understanding of the need in T692 (which is the driving need for that api for now) and the 'upsert' documentation.

31–37

Sure. Thanks.

39–45

Right.

40

mmm, i guess not. I'll check for this.

swh/storage/storage.py
143

I'm not sure i get it.

Do you mean it would be inconsistent to call content_update on inexisting content? I would agree with that.

Or maybe,it's the 'for now' which should be dropped altogether?

I didn't dive deep into the docstring, but at least I gave a stab at the SQL query.

Well, thanks already :D

  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Move altering schema tests in their own class

fixup some bad sql comments commit

  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Move altering schema tests in their own class
swh/storage/storage.py
143

I think the inconsistence you meant was that prior the latest development, the initial sql update function used a copy schema to 'update'. Which indeed could have triggered some new rows if fed with new contents.
I was aware of it, thus the comment and the todo below.

Now, the implementation to update is really an update.
So this should no longer be necessary.
I'll remove this and the todo below.

Rebased to master

  • storage: open content_update endpoint
  • storage.content_update: Provide the select and update columns as parameters
  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Move altering schema tests in their own class
  • doc: Update docstring on content_update

There are limits to the current implementation as per the todo description in the code (swh.storage.content_update).

Done.

I have a problem when running the tests for now. Individually, as a set of 2 tests, the new tests run fine. When running all the tests, the one with the new columns hang and i don't know why yet (working on it). This does not block the review though.

Fixed.

This revision was automatically updated to reflect the committed changes.