Page MenuHomeSoftware Heritage

swh.indexer.cli.journal_client: fix config use
ClosedPublic

Authored by vsellier on Nov 26 2020, 12:22 PM.

Details

Summary

The configuration was not read at all from the config file.

Related to T2814

Test Plan

tox

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D4599 (id=16337)

Rebasing onto 440961d06c...

Current branch diff-target is up to date.
Changes applied before test
commit 41ad150d2490962b5fde0b2473b243b069da7824
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Thu Nov 26 12:16:22 2020 +0100

    swh.indexer.cli.journal_client: fix config use
    
    The configuration was not read at all from the config file.
    
    Related to T2814

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/121/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/121/console

The diff fixes the configuration issue but it seems there is another problem with the visits :

root@journal0:~# swh indexer --config-file=/etc/softwareheritage/indexer/journal_client.yml  journal-client 
Traceback (most recent call last):
  File "/usr/bin/swh", line 11, in <module>
    load_entry_point('swh.core==0.9.1', 'console_scripts', 'swh')()
  File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 135, in main
    return swh(auto_envvar_prefix="SWH")
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/indexer/cli.py", line 281, in journal_client
    client.process(worker_fn)
  File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 265, in process
    batch_processed, at_eof = self.handle_messages(messages, worker_fn)
  File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 292, in handle_messages
    worker_fn(dict(objects))
  File "/usr/lib/python3/dist-packages/swh/indexer/journal_client.py", line 18, in process_journal_objects
    process_origin_visits(messages["origin_visit"], scheduler, task_names)
  File "/usr/lib/python3/dist-packages/swh/indexer/journal_client.py", line 25, in process_origin_visits
    visits = [visit for visit in visits if visit["status"] == "full"]
  File "/usr/lib/python3/dist-packages/swh/indexer/journal_client.py", line 25, in <listcomp>
    visits = [visit for visit in visits if visit["status"] == "full"]
KeyError: 'status'

The diff fixes the configuration issue

quite

but it seems there is another problem with the visits :
KeyError: 'status'

yes, visit no longer references the status.
So either we drop the filtering from the visit.

Or we stop the subscription from visit and start subscribing on origin-visit-status instead (which sounds more reasonable to me).

@vlorentz ^ what do you think?

Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.test_cli::test_journal_client

Yes, we need to make sure to allow the no configuration file test case when everything else is provided through the cli.

Or we stop the subscription from visit and start subscribing on origin-visit-status instead (which sounds more reasonable to me).

Definitely that. Not filtering on visit status will cause a lot of useless work downstream on the indexer workers.

Build is green

Patch application report for D4599 (id=16343)

Rebasing onto 440961d06c...

Current branch diff-target is up to date.
Changes applied before test
commit c307e162295ab4eda9b856244fda6abb0093fa31
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Thu Nov 26 12:16:22 2020 +0100

    swh.indexer.cli.journal_client: fix config use
    
    The configuration was not read at all from the config file.
    
    Related to T2814

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/122/ for more details.

Improve test coverage and change mandatory configuration validation

Build is green

Patch application report for D4599 (id=16345)

Rebasing onto 440961d06c...

Current branch diff-target is up to date.
Changes applied before test
commit d92c241980dbf1519db897ee6f4c5aaf7a321a62
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Thu Nov 26 12:16:22 2020 +0100

    swh.indexer.cli.journal_client: ensure the minimal configuration exists
    
    The minimum configuration is provided either by the --config-file
    or the --broker parameters
    
    Related to T2814

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/123/ for more details.

This revision is now accepted and ready to land.Nov 26 2020, 2:55 PM

Definitely that. Not filtering on visit status will cause a lot of useless work downstream on the indexer workers.

That's D4605 ;)