Page MenuHomeSoftware Heritage

lister: Add checks on expected scheduler's output tasks
AbandonedPublic

Authored by ardumont on Nov 8 2019, 11:12 AM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Summary

The functionality need is to ensure we have the required task types within the
scheduler. It must be done early in order to prevent the lister to do
computations (and hitting external apis) and then fail to insert the computed
tasks if the scheduler is not correctly setup-ed.

Let's make it fail as soon as possible.

I'm not sure where to implement the check function call though.
Within the:

  • lister's constructor
  • lister's task (the one creating the lister)
  • somewhere else?
Test Plan

tox

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 8895
Build 12986: tox-on-jenkinsJenkins
Build 12985: arc lint + arc unit

Event Timeline

vlorentz added inline comments.
swh/lister/debian/lister.py
57–58

The constructor is the the right place for documenting the "output"; it should be either in the class docstring's or in a method's.

It's also unclear what you call an output (is it a return value of one of the methods?)

swh/lister/debian/lister.py
57–58

The constructor is the the right place for documenting the "output"; it should be either in the class docstring's or in a method's.

(I guess you meant not the right place)
I started in the class's docstring so i can put it back there.

It's also unclear what you call an output (is it a return value of one of the methods?)

Well, it's the lister's goal. It output tasks for loaders.

But it's not clearly said anywhere.

It's not a return value per say, it's written to the scheduler's db.
We can "somehow" have a glimpse when we look at the def task_dict(self, origin_type, origin_url, **kwargs): method implementations.
But it's not that apparent.

About the when/where:

irc discussion

This points toward the constructor as a reasonable location.
P558: @olasd and me.

pros:

  • should be simple enough to adapt the current diff with this

cons:

  • kind of side-effect within the constructor (side-effect as in "discuss with the scheduler api")
  • will that fail early enough?

Oral discussion

This points toward orchestrating within the scheduler (improving the current registering/initialization mechanism).
(@douardda and me)

pros:

  • make it fail even earlier, prior to the lister task registration (since there is no point in registering a task which cannot work)
  • no "side-effect"

cons:

  • most probably more work
  • is it possible?

This needs more thinking.