
Use swh.scheduler instead of celery in the orchestrator.
Closed, Public

Authored by vlorentz on Oct 25 2018, 7:09 PM.

Details

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I did not check the rest yet...
Will do tomorrow.

swh/indexer/orchestrator.py
62

Use a configuration like this instead [1].
It's shown as YAML here, but it should be a Python dict ;)

scheduler:
  cls: remote
  args:
    url: http://localhost:5008  # [2]

[1] https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/data/defaults.yaml$1285-1287

[2] The default port for a scheduler instance is set at https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/data/defaults.yaml$1136

84

With that configuration, this line becomes simpler:

self.scheduler = get_scheduler(**self.config['scheduler'])

It is also consistent with what we do elsewhere.
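
A minimal sketch of the suggestion above: the YAML snippet written as a Python dict, then unpacked into the factory call. Only the `cls`, `args` and `url` keys come from the comment. `DEFAULT_CONFIG` is a hypothetical name, and `get_scheduler` below is a local stand-in that just records its arguments, not the real swh.scheduler function.

```python
# Python-dict equivalent of the YAML snippet from the review comment.
# DEFAULT_CONFIG is a hypothetical name used for illustration only.
DEFAULT_CONFIG = {
    'scheduler': {
        'cls': 'remote',
        'args': {
            'url': 'http://localhost:5008',
        },
    },
}


def get_scheduler(cls, args=None):
    """Stand-in for the real factory (assumption).

    It only records what it was called with, so the unpacking can be
    shown without a running scheduler.
    """
    return {'cls': cls, 'args': dict(args or {})}


# The 'scheduler' section unpacks into the keyword arguments cls and args.
scheduler = get_scheduler(**DEFAULT_CONFIG['scheduler'])
```

Because the config section already has the shape of the call's keyword arguments, the orchestrator does not need to pick out individual keys itself.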

  • Better default config for the scheduler.
  • Rebase
  • Fix rebase.
vlorentz marked an inline comment as done.
  • Prettier prepare_scheduler.
  • No need to make TestOrchestrator inherit from CeleryTestFixture.
swh/indexer/orchestrator.py
65

The default should target a developer machine, so that it works out of the box.
~> Use port 5008

It's also consistent with other modules.
(and if the other modules don't do that, it's wrong ;)

  • Use swh.scheduler.utils.create_task_dict.
This revision is now accepted and ready to land. Oct 26 2018, 6:49 PM
swh/indexer/orchestrator.py
70

We should make that 'indexer_mimetype' now, 'indexer_language', etc...
That'd be the task_type.

swh/indexer/orchestrator.py
70

Use 'indexer_*' to convey that it's indexer-related information.
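
To illustrate the naming proposed in the two comments above, here is a rough sketch. Both helpers are hypothetical: `task_type_for` only applies the 'indexer_' prefix, and `make_task` is a simplified stand-in for a task-dict helper whose dict layout, the 'oneshot' policy value and the placeholder argument are all assumptions.

```python
def task_type_for(indexer_name):
    """Prefix an indexer name so the task type reads as indexer-related."""
    return 'indexer_' + indexer_name


def make_task(task_type, policy, *args, **kwargs):
    """Hypothetical stand-in for a task-dict helper; the layout is assumed."""
    return {
        'type': task_type,
        'policy': policy,
        'arguments': {'args': list(args), 'kwargs': kwargs},
    }


# One task per indexer; the single positional argument is a placeholder
# for whatever batch of identifiers the indexer would receive.
tasks = [
    make_task(task_type_for(name), 'oneshot', ['sha1-placeholder'])
    for name in ('mimetype', 'language')
]
```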

The places where task_type needs to be updated are: