
implement listers as plugins
Needs RevisionPublic

Authored by douardda on May 22 2019, 2:49 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

This diff is mainly here to discuss how to implement such a thing
and what kind of 'plugin system' could be provided, especially for features
like listers, loaders and, more generally, scheduler-managed workers.

Depends on D1503.

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 5885
Build 8064: tox-on-jenkins (Jenkins)
Build 8063: arc lint + arc unit

Event Timeline

douardda created this revision. May 22 2019, 2:49 PM
vlorentz added a subscriber: vlorentz. Edited May 22 2019, 2:55 PM

Would it be possible to deduplicate the register code by putting all these functions in a single file directly in swh/lister/?

You can also replace the model import with Lister.MODEL, which saves one import.

anlambert added inline comments.
setup.py
59–66

We should build that list dynamically instead of hardcoding it. This will make adding new listers easier.

Iterating on the sub-directories of the swh/lister folder could do the trick (core and tests must be
excluded though).

Another solution: import every submodule of swh.lister and check for the presence of a register
function to determine whether it is a plugin.
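
A minimal sketch of the second suggestion, assuming each lister lives in a sub-package that exposes a callable named register. The function name discover_listers and its parameters are hypothetical and not part of the diff.

```python
import importlib
import pkgutil


def discover_listers(package_name, excluded=("core", "tests"), hook="register"):
    """Return the names of sub-packages of `package_name` exposing `hook`."""
    package = importlib.import_module(package_name)
    found = []
    for module_info in pkgutil.iter_modules(package.__path__):
        # Only sub-packages are candidates; skip the excluded ones.
        if not module_info.ispkg or module_info.name in excluded:
            continue
        # Every candidate gets imported here, whether it is used or not.
        module = importlib.import_module(f"{package_name}.{module_info.name}")
        if callable(getattr(module, hook, None)):
            found.append(module_info.name)
    return sorted(found)
```

With the real package this would be called as discover_listers("swh.lister"). Note that it only finds listers located under that one package.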

swh/lister/npm/__init__.py
11

For npm, there are two models to initialize: swh.lister.npm.models.NpmModel and swh.lister.npm.models.NpmVisitModel

vlorentz added inline comments. May 27 2019, 2:23 PM
swh/lister/cli.py
64–67

this looks a lot like the code to generate SUPPORTED_LISTERS.

douardda added inline comments. Jun 4 2019, 5:26 PM
setup.py
59–66

No, you cannot. That's the whole point of this idea: being able to declare plugins without having to load every possible Python package or (recursively?) look for them in "some well known places".

I did not want to force listers to be 'installed' in a 'swh.lister' namespace (in the sense of PEP 420).

With this entry-point-based method, a lister can live anywhere and does not need to lie within our swh namespace, and it is effectively loaded only if needed.
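
As a rough illustration of the "loaded only if needed" part, here is a sketch using importlib.metadata from the standard library. The entry point is built by hand and targets json:dumps purely so the snippet runs without any lister installed. The name "lister.demo" and the setup.py line in the comment are assumptions, not taken from the diff.

```python
from importlib.metadata import EntryPoint

# Hypothetical declaration, as it might appear in a third-party setup.py:
#   entry_points={"swh.workers": ["lister.mylister = mypkg.mylister:register"]}
# The hand-built equivalent below points at a stdlib callable instead.
entry_point = EntryPoint(
    name="lister.demo",
    value="json:dumps",
    group="swh.workers",
)

# Creating or enumerating entry points does not import their targets;
# the target module is imported only when load() is called.
hook = entry_point.load()
```

The package providing the lister can be named anything, since only the entry-point group and name are looked up.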

I did implement the main swh.lister as plugins here mainly to show how it can be done. These default/basic listers could come with the main swh.lister package (preloaded) without using the plugin mechanism. This is debatable.

TBH, I'm far from convinced this 'register' function is fine as it is in this diff (neither the data structure returned by the function nor the function name; the latter can be anything, though, since it is set entirely in the entry point declaration).

swh/lister/cli.py
64–67

I know... Not sure yet if it is a good idea to avoid it.

swh/lister/npm/__init__.py
11

that's typically why I'm not convinced by the 'API' of the plugin loading mechanism here. The true initialization work is in fact done in the 'init' hook, which is a simple function and thus can initialize as many databases/tables as one wants.
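
To make the point concrete, a hedged sketch of what such a hook could look like for npm. The two classes are stand-ins for the real models, and the table names, the create_table callback and the "init" key are all hypothetical, not the API from the diff.

```python
class NpmModel:
    """Stand-in for swh.lister.npm.models.NpmModel."""
    __tablename__ = "npm_repo"  # hypothetical table name


class NpmVisitModel:
    """Stand-in for swh.lister.npm.models.NpmVisitModel."""
    __tablename__ = "npm_visit"  # hypothetical table name


def init(create_table):
    """Hypothetical 'init' hook: a plain function, free to set up several tables."""
    for model in (NpmModel, NpmVisitModel):
        create_table(model.__tablename__)


def register():
    """Hypothetical plugin descriptor; the key name is an assumption."""
    return {"init": init}


# Usage: the caller provides whatever actually creates the tables.
created = []
register()["init"](created.append)
```

Because the hook is an ordinary function, the descriptor does not have to list every model for the loader to initialize them.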

vlorentz added inline comments. Wed, Jun 26, 12:01 PM
swh/lister/cli.py
64–67

you could do:

import pkg_resources

# Map each lister name to its entry point, keeping only 'lister.*' entries.
LISTERS = {
    entry_point.name.split('.', 1)[1]: entry_point
    for entry_point in pkg_resources.iter_entry_points('swh.workers')
    if entry_point.name.split('.', 1)[0] == 'lister'
}
SUPPORTED_LISTERS = list(LISTERS)

@douardda ping about this.

Also, we added some new listers, so this needs a rebase ;)

swh/lister/cli.py
64–67

Yeah, I like @vlorentz's proposed improvement.

vlorentz requested changes to this revision. Thu, Jul 18, 11:18 AM
This revision now requires changes to proceed. Thu, Jul 18, 11:18 AM