This diff is mainly here for discussion on how to implement such a thing
and what kind of 'plugin system' could be provided, especially for features
like listers, loaders and, more generally, scheduler-managed workers.
Depends on D1503.
vlorentz | |
ardumont |
Reviewers |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/136/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/136/console
Would it be possible to deduplicate the register code by putting all these functions in a single file directly in swh/lister/?
setup.py | ||
---|---|---|
58–65 | We should build that list dynamically instead of hardcoding it. This would make it easier to add new listers. Iterating over the sub-directories of the swh/lister folder could do the trick (core and tests must be Other solution: import every submodule from swh.lister and check for the presence of the register | |
swh/lister/npm/__init__.py | ||
12 | For npm, there are two models to initialize: swh.lister.npm.models.NpmModel and swh.lister.npm.models.NpmVisitModel |
swh/lister/cli.py | ||
---|---|---|
83–86 | This looks a lot like the code to generate SUPPORTED_LISTERS. |
setup.py | ||
---|---|---|
58–65 | No, you cannot; that's the whole point of this idea: being able to declare plugins without having to load every possible Python package or (recursively?) look for them in "some well known places". I did not want to force listers to be 'installed' in a 'swh.lister' namespace (in the sense of PEP 420). Using this method based on entry points, a lister can be anywhere and does not need to lie within our swh namespace, and it is effectively loaded only if needed. I implemented the main swh.lister listers as plugins here mainly to show how it can be done. These default/basic listers could come with the main swh.lister package (preloaded) without using the plugin mechanism. This is debatable. TBH, I'm far from convinced this 'register' function is fine as it is in this diff (neither the data structure returned by the function nor the function name; however, the latter can be anything, since it's fully in the entry point declaration). | |
swh/lister/cli.py | ||
83–86 | I know... Not sure yet if it is a good idea to avoid it. | |
swh/lister/npm/__init__.py | ||
12 | That's typically why I'm not convinced by the 'API' of the plugin loading mechanism here. The true initialization work is in fact done in the 'init' hook, which is a simple function and thus can initialize as many databases/tables as one wants. |
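To make the entry-point argument in the setup.py reply above concrete, here is a minimal sketch of lazy loading. The declaration shown in the comment is a guess at what a lister package might write (the 'swh.workers' group and 'lister.' name prefix come from this thread; the 'swh.lister.npm:register' target is an assumption). To stay runnable without swh installed, the example points at a standard-library function instead.

```python
from importlib.metadata import EntryPoint

# Hypothetical declaration a lister package could put in its setup.py:
#   entry_points={"swh.workers": ["lister.npm = swh.lister.npm:register"]}
# Here a stdlib target ("json:loads") stands in for the register function.
ep = EntryPoint(name="lister.demo", value="json:loads", group="swh.workers")

# Building the entry point does not import its target;
# load() resolves the "module:attribute" string on demand.
register = ep.load()
print(register("[1, 2]"))
```

The package providing the plugin can live anywhere on the Python path; only the entry-point string ties it to the group.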
swh/lister/cli.py | ||
---|---|---|
83–86 | you could do: LISTERS = {entry_point.name.split('.', 1)[1]: entry_point for entry_point in pkg_resources.iter_entry_points('swh.workers') if entry_point.name.split('.', 1)[0] == 'lister'} SUPPORTED_LISTERS = list(LISTERS) |
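The suggestion above can be sketched as a small helper that works on any iterable of entry points, so it can be tried without installed plugins. This is only an illustration: the helper name `index_listers` and the sample entry points are made up for the example, and `str.partition` is used instead of `split` so that a name without a dot is skipped rather than raising an IndexError.

```python
from importlib.metadata import EntryPoint


def index_listers(entry_points):
    """Map lister name -> entry point, keeping only 'lister.<name>' entries."""
    listers = {}
    for ep in entry_points:
        kind, _, name = ep.name.partition(".")
        if kind == "lister" and name:
            listers[name] = ep
    return listers


# Hypothetical entry points, as they might appear in the 'swh.workers' group.
sample = [
    EntryPoint(name="lister.npm", value="swh.lister.npm:register", group="swh.workers"),
    EntryPoint(name="lister.github", value="swh.lister.github:register", group="swh.workers"),
    EntryPoint(name="loader.git", value="swh.loader.git:register", group="swh.workers"),
]

LISTERS = index_listers(sample)
SUPPORTED_LISTERS = list(LISTERS)
print(SUPPORTED_LISTERS)
```

On a real installation the iterable would come from `pkg_resources.iter_entry_points('swh.workers')`, as in the comment above, or from `importlib.metadata.entry_points(group='swh.workers')` on Python 3.10 and later.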
rebase + changes in the plugin "API"
now the register function is expected to return a dict with:
If 'init' is not present, the default implementation creates the tables
corresponding to the models declared in 'models'.
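A rough sketch of the fallback described above, under stated assumptions: only the 'models' and 'init' keys are taken from this thread, the 'init' hook is assumed to take no arguments, and `create_tables`, `initialize` and the placeholder model classes are hypothetical stand-ins (no real database is touched).

```python
class NpmModel:
    """Placeholder for a real model class."""


class NpmVisitModel:
    """Placeholder for a real model class."""


def create_tables(models):
    """Stand-in for the default behaviour: one table per declared model."""
    return [f"created table for {model.__name__}" for model in models]


def initialize(plugin):
    """Call the plugin's own 'init' hook if present, else use the default."""
    init = plugin.get("init")
    if init is not None:
        return init()
    return create_tables(plugin.get("models", []))


def register():
    # Declarative form: no 'init', so the default creates both tables.
    return {"models": [NpmModel, NpmVisitModel]}


def register_with_init():
    # Custom form: the 'init' hook takes over the initialization.
    return {"models": [NpmModel], "init": lambda: ["custom init ran"]}


print(initialize(register()))
print(initialize(register_with_init()))
```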
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/308/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/308/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/320/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/320/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/333/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/333/console
This sounds good.
It untangles some parts, like the npm initialization (multiple tables to init). And makes it more declarative, /me likes.
The CI is unhappy though.
I'm not sure why... oh might be because of the needed changes in the scheduler part... (the diff this diff depends on or something).
rebase + better ci msg + improve config handling in cli.py
+ add the --config-file option at the swh lister (group) level (also move
the --db-url there).
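As an illustration of moving --config-file and --db-url to the group level, here is a minimal sketch using the standard library's argparse rather than whatever CLI framework cli.py actually uses. The 'run' subcommand, its 'lister' argument and the sample values are hypothetical; only the two option names come from this update.

```python
import argparse


def build_parser():
    # Options declared on the top-level parser are shared by all subcommands.
    parser = argparse.ArgumentParser(prog="swh lister")
    parser.add_argument("--config-file", default=None, help="path to a configuration file")
    parser.add_argument("--db-url", default=None, help="database URL")

    subcommands = parser.add_subparsers(dest="command")
    run = subcommands.add_parser("run")  # hypothetical subcommand
    run.add_argument("lister")
    return parser


args = build_parser().parse_args(
    ["--config-file", "lister.yml", "--db-url", "postgresql:///lister", "run", "npm"]
)
print(args.config_file, args.db_url, args.command, args.lister)
```

With this layout the shared options must appear before the subcommand name on the command line.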
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/336/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/336/console
Build is green
See https://jenkins.softwareheritage.org/job/DLS/job/tox/337/ for more details.