
docker: Service Startup Error
Closed, Migrated

Description

I followed this document (https://docs.softwareheritage.org/devel/getting-started.html) to run the Software Heritage platform, but some services do not start, and the logs show errors like:

swh-vault_1                     | Traceback (most recent call last):
swh-vault_1                     |   File "/srv/softwareheritage/venv/bin/swh", line 8, in <module>
swh-vault_1                     |     sys.exit(main())
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 135, in main
swh-vault_1                     |     return swh(auto_envvar_prefix="SWH")
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
swh-vault_1                     |     return self.main(*args, **kwargs)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
swh-vault_1                     |     rv = self.invoke(ctx)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
swh-vault_1                     |     return _process_result(sub_ctx.command.invoke(sub_ctx))
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
swh-vault_1                     |     return _process_result(sub_ctx.command.invoke(sub_ctx))
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
swh-vault_1                     |     return ctx.invoke(self.callback, **ctx.params)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
swh-vault_1                     |     return callback(*args, **kwargs)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
swh-vault_1                     |     return f(get_current_context(), *args, **kwargs)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/vault/cli.py", line 72, in serve
swh-vault_1                     |     app = make_app_from_configfile(config_file, debug=debug)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/vault/api/server.py", line 228, in make_app_from_configfile
swh-vault_1                     |     vault = get_local_backend(cfg)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/vault/api/server.py", line 205, in get_local_backend
swh-vault_1                     |     args = vcfg["args"]
swh-vault_1                     | KeyError: 'args'

swh-indexer-journal-client_1    | Traceback (most recent call last):
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/bin/swh", line 8, in <module>
swh-indexer-journal-client_1    |     sys.exit(main())
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 135, in main
swh-indexer-journal-client_1    |     return swh(auto_envvar_prefix="SWH")
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
swh-indexer-journal-client_1    |     return self.main(*args, **kwargs)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
swh-indexer-journal-client_1    |     rv = self.invoke(ctx)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
swh-indexer-journal-client_1    |     return _process_result(sub_ctx.command.invoke(sub_ctx))
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
swh-indexer-journal-client_1    |     return _process_result(sub_ctx.command.invoke(sub_ctx))
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
swh-indexer-journal-client_1    |     return ctx.invoke(self.callback, **ctx.params)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
swh-indexer-journal-client_1    |     return callback(*args, **kwargs)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
swh-indexer-journal-client_1    |     return f(get_current_context(), *args, **kwargs)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/indexer/cli.py", line 260, in journal_client
swh-indexer-journal-client_1    |     stop_after_objects=stop_after_objects,
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/client.py", line 38, in get_journal_client
swh-indexer-journal-client_1    |     return JournalClient(**kwargs)
swh-indexer-journal-client_1    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/client.py", line 175, in __init__
swh-indexer-journal-client_1    |     for topic in self.consumer.list_topics(timeout=10).topics.keys()
swh-indexer-journal-client_1    | cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
docker_swh-indexer-journal-client_1 exited with code 1

What could be causing this? Thanks.

Event Timeline

rendong951 created this object in space S1 Public.
olasd removed olasd as the assignee of this task.Dec 7 2020, 11:14 AM
olasd added a subscriber: olasd.
olasd removed a subscriber: olasd.
zack triaged this task as High priority.Dec 7 2020, 11:16 AM
zack added a project: Development environment.
zack updated the task description.

Thanks for opening this issue.

To progress further, it would help to clarify what the state of your
swh-environment is first.

For example, check the installed versions of the packages used by those services [1].

swh-indexer-journal-client_1

I'm not sure about that one yet.

I think it relates to the initial state of the Kafka server, which has no
topics at all yet.

I can't reproduce this one with the current version [1] [2].
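The journal client traceback ends in "Failed to get metadata: Local: Broker transport failure", i.e. the client could not fetch any metadata from the broker at all. To rule out a start-up ordering problem before looking at topics, here is a minimal Python sketch (not part of swh) that waits until a TCP port accepts connections. It assumes the broker is reachable as kafka:9092 from inside the compose network (the bootstrap address used later in this thread); the function name and timeouts are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake only proves that something is listening;
            # it does not prove that Kafka is fully started or that topics exist.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused, timeout, or name resolution failure
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, from a container on the same compose network: `wait_for_port("kafka", 9092, timeout=30)`. If this returns True while the journal client still fails, the problem is more likely the broker's state (such as the missing topics) than network reachability.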

swh-vault_1

This looks like an issue from a previous version. I can't reproduce it either.

Have you tried rebuilding your Docker environment recently? [3]

[1] vault

$ doco logs swh-indexer-journal-client | grep " swh.vault "
swh-indexer-journal-client_1    | swh.vault             0.4.0

[2] indexer

$ doco logs swh-indexer-journal-client | grep " swh.indexer "
swh-indexer-journal-client_1    | swh.indexer           0.5.0

[3]

$ docker --version
Docker version 18.09.1, build 4c52b90
$ docker build --no-cache -t swh/stack .
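To compare your versions with the ones above (swh.vault 0.4.0, swh.indexer 0.5.0) without grepping one package at a time, a small Python sketch can collect every swh.* version line from the logs. It assumes the containers print a package/version listing at startup, as in the outputs above; the function name is hypothetical.

```python
import re

# Matches version lines such as
#   "swh-indexer-journal-client_1    | swh.vault             0.4.0"
# i.e. a compose log prefix, a "|", a swh.* package name, then a version string.
VERSION_LINE = re.compile(r"\|\s*(swh\.[\w.]+)\s+(\d\S*)\s*$")


def swh_versions(log_text: str) -> dict:
    """Map each swh.* package listed in `log_text` to its version string."""
    versions = {}
    for line in log_text.splitlines():
        match = VERSION_LINE.search(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions
```

For example, feed it the output of `docker-compose logs swh-vault` (captured with `subprocess.run(..., capture_output=True, text=True).stdout`) and compare the resulting dictionary with the versions quoted here.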

Cheers,

ardumont renamed this task from Service Startup Error to docker: Service Startup Error.Dec 9 2020, 6:40 PM
ardumont added a project: Docker environment.

Thank you very much for your reply.
I followed your guidance and now all the services start, but when I use Save Code Now on my Software Heritage platform, the status is always "failed". I checked the log:
swh-loader_1 | [2020-12-18 02:49:13,495: ERROR/ForkPoolWorker-1] Loading failure, updating to partial status
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 912, in fetch_pack
swh-loader_1 | refs, server_capabilities = read_pkt_refs(proto)
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 215, in read_pkt_refs
swh-loader_1 | for pkt in proto.read_pkt_seq():
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 277, in read_pkt_seq
swh-loader_1 | pkt = self.read_pkt_line()
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 223, in read_pkt_line
swh-loader_1 | raise HangupException()
swh-loader_1 | dulwich.errors.HangupException: The remote server unexpectedly closed the connection.
swh-loader_1 |
swh-loader_1 | During handling of the above exception, another exception occurred:
swh-loader_1 |
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 318, in load
swh-loader_1 | more_data_to_fetch = self.fetch_data()
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 239, in fetch_data
swh-loader_1 | self.origin.url, self.base_snapshot, do_progress
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 174, in fetch_pack_from_origin
swh-loader_1 | progress=do_activity,
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 914, in fetch_pack
swh-loader_1 | raise _remote_error_from_stderr(stderr)
swh-loader_1 | dulwich.errors.HangupException: ssh: Could not resolve hostname https: Temporary failure in name resolution

In addition, when I log into the Kafka service and run a command to list topics, this error occurs:
Error: JMX connector server communication error: service:jmx:rmi://061b80a7989a:1099
sun.management.AgentConfigurationError: java.rmi.server.ExportException: Port already in use: 1099; nested exception is:
java.net.BindException: Address in use (Bind failed)
at sun.management.jmxremote.ConnectorBootstrap.exportMBeanServer(ConnectorBootstrap.java:800)
at sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(ConnectorBootstrap.java:468)
at sun.management.Agent.startAgent(Agent.java:262)
at sun.management.Agent.startAgent(Agent.java:452)

Caused by: java.rmi.server.ExportException: Port already in use: 1099; nested exception is:
java.net.BindException: Address in use (Bind failed)
at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:346)
at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:254)
at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:237)
at sun.management.jmxremote.ConnectorBootstrap$PermanentExporter.exportObject(ConnectorBootstrap.java:199)
at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:146)
at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:122)
at javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:404)
at sun.management.jmxremote.ConnectorBootstrap.exportMBeanServer(ConnectorBootstrap.java:796)
... 3 more

Caused by: java.net.BindException: Address in use (Bind failed)
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at java.net.ServerSocket.<init>(ServerSocket.java:237)
at java.net.ServerSocket.<init>(ServerSocket.java:128)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:45)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:345)
at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:335)
... 12 more

I appreciate your quick response.

Please wrap your stack traces, code excerpts, etc. in triple backquotes (one line before, one line after) so they are easier to read.
Thanks in advance.

swh-loader_1 | dulwich.errors.HangupException: ssh: Could not resolve hostname https: Temporary failure in name resolution

DNS problem within docker... ¯\_(ツ)_/¯

Are you able to resolve the host of whatever URL you ask to load?
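One way to answer that from inside the loader container is a short standard-library Python check: extract the host name from the origin URL and ask the container's resolver for it. This is a sketch; the helper names are hypothetical, and it assumes you can start a Python interpreter in the swh-loader container (for example with `docker-compose exec swh-loader python3`, if a `python3` executable is on its PATH).

```python
import socket
from urllib.parse import urlsplit


def origin_host(url: str) -> str:
    """Return the host name that has to be resolved to fetch `url` ("" if none)."""
    return urlsplit(url).hostname or ""


def can_resolve(host: str) -> bool:
    """True if the resolver configured in this container returns an address for `host`."""
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:  # the name could not be resolved
        return False
    return True
```

For example, `can_resolve(origin_host("https://github.com/softwareheritage/swh-scanner"))` should be True in a container with working DNS. Note that the host extracted here is "github.com", never "https".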

You can also run the following command, which is the gist of what Save Code Now does
internally (not the exact same code path, but close) [1]:

$ docker-compose exec swh-loader swh loader run git <resolvable-url>

Error: JMX connector server communication error: service:jmx:rmi://061b80a7989a:1099

I think I know that one: it's related to Kafka and its JMX environment variables.
You need to unset JMX_PORT before running the Kafka command-line tools in that container [2].

[1]

$ docker-compose exec swh-loader swh loader run git https://github.com/softwareheritage/swh-scanner
INFO:swh.loader.git.BulkLoader:Load origin 'https://github.com/softwareheritage/swh-scanner' with type 'git'
Enumerating objects: 592, done.
Counting objects: 100% (592/592), done.
Compressing objects: 100% (254/254), done.
Total 592 (delta 333), reused 567 (delta 308), pack-reused 0
INFO:swh.loader.git.BulkLoader:Listed 6 refs for repo https://github.com/softwareheritage/swh-scanner

[2]

$ docker-compose -f docker-compose.yml -f docker-compose.search.yml -f docker-compose.override.yml exec kafka bash
bash-4.4# cd /opt/kafka
bash-4.4# ./bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 --list --all-topics
Error: JMX connector server communication error: service:jmx:rmi://c67a39d6cf4b:1099
...
Caused by: java.net.BindException: Address in use (Bind failed)
...
bash-4.4# env | grep jmx
KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka -Dcom.sun.management.jmxremote.rmi.port=1099
bash-4.4# env | grep -i jmx
KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka -Dcom.sun.management.jmxremote.rmi.port=1099
JMX_PORT=1099
bash-4.4# unset JMX_PORT
bash-4.4# ./bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 --list --all-topics
bash-4.4# # no output, it's empty as hypothesized before

Cheers,

Thank you very much for your reply.
I ran your command [1]:

$ docker-compose exec swh-loader swh loader run git https://github.com/softwareheritage/swh-scanner

INFO:swh.loader.git.BulkLoader:Load origin 'https://github.com/softwareheritage/swh-scanner' with type 'git'
ERROR:swh.loader.git.BulkLoader:Loading failure, updating to partial status
Traceback (most recent call last):

File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 913, in fetch_pack
  refs, server_capabilities = read_pkt_refs(proto)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 216, in read_pkt_refs
  for pkt in proto.read_pkt_seq():
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 277, in read_pkt_seq
  pkt = self.read_pkt_line()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 223, in read_pkt_line
  raise HangupException()

dulwich.errors.HangupException: The remote server unexpectedly closed the connection.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 318, in load
  more_data_to_fetch = self.fetch_data()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 239, in fetch_data
  self.origin.url, self.base_snapshot, do_progress
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 174, in fetch_pack_from_origin
  progress=do_activity,
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 915, in fetch_pack
  raise _remote_error_from_stderr(stderr)

dulwich.errors.HangupException: ssh: Could not resolve hostname https: Temporary failure in name resolution
{'status': 'failed'}

Do you mean this is a DNS problem within Docker? Could you help me configure it?

I appreciate your quick response.

Hello,

It's not a DNS issue. The error is "Could not resolve hostname https:". It shouldn't try to resolve "https:" at all: that's a URL scheme, not a domain name.

This is probably a typo or a syntax issue in your command. Check that you didn't add an invisible whitespace character between ":" and "/" in the URL.

Also try adding quotes, e.g.:

docker-compose exec swh-loader swh loader run git "https://github.com/softwareheritage/swh-scanner"

or

docker-compose exec swh-loader sh -c "swh loader run git https://github.com/softwareheritage/swh-scanner"
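To look for invisible characters in a pasted URL, a standard-library Python sketch can list every whitespace or non-printing character and check that the URL still parses with an http(s) scheme and a host. The function name is hypothetical, and it does not reproduce how the loader or dulwich parses URLs; it only flags input that looks suspicious.

```python
import unicodedata
from urllib.parse import urlsplit


def check_origin_url(url: str) -> list:
    """List problems that could make a pasted origin URL unusable (empty list = looks fine)."""
    problems = []
    for pos, ch in enumerate(url):
        # Spaces, control characters (Cc) and format characters (Cf) such as
        # U+200B ZERO WIDTH SPACE are easy to paste by accident and hard to see.
        if ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf"):
            problems.append(
                f"whitespace or invisible character {ch!r} (U+{ord(ch):04X}) at position {pos}"
            )
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    if not parts.netloc:
        problems.append("no host name after the scheme")
    return problems
```

A clean URL such as https://github.com/softwareheritage/swh-scanner yields an empty list, while a stray space after "https:" is reported together with the missing host name.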

Thanks.

I also tried the commands from your reply; the results are the same.

All services start normally and the network works.

I also ran 'git clone https://github.com/softwareheritage/swh-scanner' in the swh-loader service, and it works.

The error only occurs when I use Save Code Now on my Software Heritage platform or run the commands above:

$ docker-compose exec swh-loader swh loader run git "https://github.com/softwareheritage/swh-scanner"
$ docker-compose exec swh-loader swh loader run git "https://github.com/softwareheritage/swh-scanner/"

INFO:swh.loader.git.BulkLoader:Load origin 'https://github.com/softwareheritage/swh-scanner/' with type 'git'
ERROR:swh.loader.git.BulkLoader:Loading failure, updating to partial status
Traceback (most recent call last):

File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 913, in fetch_pack
  refs, server_capabilities = read_pkt_refs(proto)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 216, in read_pkt_refs
  for pkt in proto.read_pkt_seq():
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 277, in read_pkt_seq
  pkt = self.read_pkt_line()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/protocol.py", line 223, in read_pkt_line
  raise HangupException()

dulwich.errors.HangupException: The remote server unexpectedly closed the connection.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 318, in load
  more_data_to_fetch = self.fetch_data()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 239, in fetch_data
  self.origin.url, self.base_snapshot, do_progress
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/git/loader.py", line 174, in fetch_pack_from_origin
  progress=do_activity,
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 915, in fetch_pack
  raise _remote_error_from_stderr(stderr)

dulwich.errors.HangupException: ssh: Could not resolve hostname https: Name or service not known
{'status': 'failed'}

vlorentz lowered the priority of this task from High to Normal.Mar 15 2021, 1:14 PM

@rendong951 Sorry, but we can't reproduce the issue on our end.

Did you manage to fix the issue and/or did it stop occurring?

If not, make sure you don't have a docker-compose.override.yml file, or any changes to your local Dockerfiles and docker-compose files.
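docker-compose merges docker-compose.override.yml on top of docker-compose.yml automatically when no -f option is given, so a leftover override file can silently change the stack. Here is a Python sketch for checking both points from the docker directory of your swh-environment checkout: it lists override files and, if the directory is inside a git checkout, any local changes. The function names are hypothetical; the git part relies only on `git status --porcelain`.

```python
import subprocess
from pathlib import Path

# File names docker-compose picks up as overrides when no -f option is given.
OVERRIDE_NAMES = ("docker-compose.override.yml", "docker-compose.override.yaml")


def override_files(docker_dir) -> list:
    """Return the override file names present in `docker_dir`."""
    return [name for name in OVERRIDE_NAMES if (Path(docker_dir) / name).is_file()]


def local_changes(docker_dir):
    """Return `git status --porcelain` lines for `docker_dir`, or None if git can't tell."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain", "--", "."],
            cwd=str(docker_dir),
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # git is not installed
        return None
    if result.returncode != 0:  # e.g. the directory is not inside a git checkout
        return None
    return result.stdout.splitlines()
```

An empty result from both functions means you are running the stock compose files; any line from `local_changes` (modified or untracked files) is worth reviewing before retrying.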