Page MenuHomeSoftware Heritage

sql: Update origin trigger to provide full origin information
ClosedPublic

Authored by ardumont on Nov 21 2018, 5:28 PM.

Details

Summary

This updates the origin creation trigger to provide the full object
information instead of its identifier.

In effect, the object provided in the kafka topic does not need to be
reified. This means another diff is coming on the journal part to deal
with this.

Diff Detail

Repository
rDSTO Storage manager
Branch
update-origin-trigger
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2513
Build 3109: tox-on-jenkinsJenkins
Build 3108: arc lint + arc unit

Event Timeline

ardumont created this revision.Nov 21 2018, 5:28 PM
douardda accepted this revision.Nov 22 2018, 9:42 AM
douardda added a subscriber: douardda.

LGTM however I'm not sure I get the point of this diff. I need to have a look at D693 I guess

This revision is now accepted and ready to land.Nov 22 2018, 9:42 AM
ardumont marked an inline comment as done.EditedNov 22 2018, 10:20 AM

however I'm not sure I get the point of this diff. I need to have a look at D693 I guess

It's because the main part working on the trigger is not apparent in the diff.
The storage listener [1] is in charge of reacting to those trigger events (in that diff, the origin one).
It pushes in the kafka topic those objects (here the plain origin and not its identifier as per other objects).

In the other diff, following the same logic for all objects, you'll see the actual journal publisher [2].
It's in charge of:

  • subscribing to those topics
  • reifying the objects that are not complete from the storage
  • pushing to full object topics (for the current and future journal clients)

In the case of the origin (that diff), there is no reifying as the object is full already, so it's just a passthrough.

For the choice, that would interest you [3].

As far as i remember, the intermediary topics are here because we could have holes in the trigger events.
So we need another module in charge of filling the gaps (swh.journal.checker)

[1] https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/listener.py$48

[2] https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/publisher.py

[3] https://forge.softwareheritage.org/T424#8781

sql/upgrades/129.sql
16

Wondering if we need to keep the id. I think not.

vlorentz added inline comments.
sql/upgrades/129.sql
16

Let's remove it. It won't (or shouldn't be) part of the output of pgsql.

ardumont updated this revision to Diff 2181.Nov 22 2018, 11:31 AM

Remove origin id from the trigger

ardumont marked an inline comment as done.Nov 22 2018, 11:31 AM
olasd added a comment.Nov 22 2018, 2:18 PM

As far as i remember, the intermediary topics are here because we could have holes in the trigger events.

The intermediary topics are here because pg_notify is blocking and limited to text, which all prevents us from doing the expensive operations needed to properly send full objects with it.

So we need another module in charge of filling the gaps (swh.journal.checker)

This is just a nice side effect : it makes the checker just a diff of two lists of ids.

This revision was automatically updated to reflect the committed changes.