I reverse-engineered the py4j communication protocol, so the next time it hangs, we should be able to tell whether the issue is on the gateway server side or on the Python side:
- Create a named pipe
    mkfifo /tmp/test
    chmod a+w /tmp/test
    tail -F /tmp/test
- query the graph
    ss -ltp | grep java
    <get the port number>
    telnet localhost <port number>
    c
    o0
    get_handler
    s/tmp/test
    e
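
The telnet session above can also be scripted, which is handy when the hang has to be checked quickly. The following is only a minimal sketch in Python: the helper names `build_call_command` and `send_command` are hypothetical, the port is whatever `ss -ltp` reported, and it assumes the gateway accepts the same newline-separated command that was typed by hand (`c`, the target object id, the method name, `s`-prefixed string arguments, then `e`).

```python
import socket


def build_call_command(target_id, method, string_args=()):
    """Build the same call command that was typed into telnet by hand."""
    lines = ["c", target_id, method]
    # An "s" prefix marks a string argument, as in "s/tmp/test".
    lines += ["s" + arg for arg in string_args]
    lines.append("e")  # "e" ends the command
    return "\n".join(lines) + "\n"


def send_command(port, payload, host="localhost", timeout=5.0):
    """Send payload to the gateway and return what comes back up to a newline.

    Raises socket.timeout if the gateway does not answer within `timeout`.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
        data = b""
        while not data.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:  # connection closed by the gateway
                break
            data += chunk
        return data.decode("utf-8").rstrip("\n")


# Example (the port is a placeholder; use the one reported by ss):
# payload = build_call_command("o0", "get_handler", ["/tmp/test"])
# print(send_command(25333, payload))
```

If `send_command` returns a reply, the gateway server is still answering and the problem is more likely on the Python side; a timeout points at the gateway server side.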