Asked by: Marc Dätwyler Asked: 11/6/2023 Updated: 11/6/2023 Views: 39
Wildfly/JGroups DNS_Ping discovery mechanism seems to leak threads
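Q:
We are currently facing a problem with Wildfly/JGroups clustering in a Kubernetes environment. We have a varying number of Wildfly (30.0.0) nodes that need to communicate with each other and form a cluster to process ArtemisMQ JMS messages. We use dns.DNS_PING for discovery within the cluster and TCP as the JGroups transport protocol.
We use the following Wildfly CLI commands to set up the JGroups cluster: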
```
echo "Kubernetes interface and bindings"
/interface=kubernetes:add(nic=eth0)
/interface=private:add(inet-address="${jboss.bind.address.private:127.0.0.1}")
/interface=dns:add(site-local-address=true)
/socket-binding-group=standard-sockets/socket-binding=jgroups-tcp:add(interface=dns, port=7800)
/socket-binding-group=standard-sockets/socket-binding=jgroups-tcp-fd:add(interface=dns, port=57800)
/socket-binding-group=standard-sockets/socket-binding=http:write-attribute(name=interface,value=dns)
/socket-binding-group=standard-sockets/socket-binding=https:write-attribute(name=interface,value=dns)

echo "JGroups"
/extension=org.jboss.as.clustering.jgroups:add()
/subsystem=jgroups:add()
#/subsystem=jgroups:write-attribute(name=default-stack,value=tcp)

echo "TCP stack"
batch
/subsystem=jgroups/stack=tcp:add()
#/subsystem=jgroups/stack=tcp:add
/subsystem=jgroups/stack=tcp/transport=TCP:add(socket-binding=jgroups-tcp)
/subsystem=jgroups/stack=tcp/protocol=MERGE3:add
/subsystem=jgroups/stack=tcp/protocol=FD_SOCK:add(socket-binding=jgroups-tcp-fd)
/subsystem=jgroups/stack=tcp/protocol=VERIFY_SUSPECT:add
/subsystem=jgroups/stack=tcp/protocol=pbcast.NAKACK2:add
/subsystem=jgroups/stack=tcp/protocol=UNICAST3:add
/subsystem=jgroups/stack=tcp/protocol=pbcast.STABLE:add
/subsystem=jgroups/stack=tcp/protocol=pbcast.GMS:add
/subsystem=jgroups/stack=tcp/protocol=MFC:add
/subsystem=jgroups/stack=tcp/protocol=FRAG3:add
run-batch

echo "JGroups Channel"
/subsystem=jgroups/channel=ee:add(stack=tcp)
/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
#/subsystem=jgroups/channel=ee:write-attribute(name=cluster,value=kubernetes)
/subsystem=jgroups:write-attribute(name=default-channel,value=ee)

echo "DNS_PING Protocol"
/subsystem=jgroups/stack=tcp/protocol=dns.DNS_PING:add(add-index=0,properties={dns_query="_ping._tcp.avaloq-wb-sync-manager-ping.namespace001.svc.cluster.local.",dns_record_type=SRV})
```
The DNS_PING query points to a Kubernetes service that exposes the nodes we want to have in the cluster.
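To see what such an SRV query returns from inside a pod, a small standalone lookup can help. The sketch below uses the JDK's JNDI DNS provider. It is a diagnostic aid only and makes no claim about how DNS_PING resolves records internally. The class name `SrvLookupSketch` and the `host:port` output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

final class SrvLookupSketch {

    /** Parses one SRV value of the form "priority weight port target" into "target:port". */
    static String parseSrv(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an SRV record: " + record);
        }
        String target = parts[3];
        if (target.endsWith(".")) {
            target = target.substring(0, target.length() - 1); // drop the root dot
        }
        return target + ":" + Integer.parseInt(parts[2]);
    }

    /** Resolves the SRV records for dnsQuery using the resolver configured on the host. */
    static List<String> lookup(String dnsQuery) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns:");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(dnsQuery, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            List<String> members = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    members.add(parseSrv(values.next().toString()));
                }
            }
            return members;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Pass the same value that is configured as dns_query.
        for (String member : lookup(args[0])) {
            System.out.println(member);
        }
    }
}
```

If the output lists addresses of pods that are not (or no longer) accepting connections on the JGroups port, discovery requests sent to those addresses would be candidates for slow or hanging connects.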
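Now, in production deployments, we see a huge number of threads created by DNS_PING. We also see that one thread blocks the others and hangs in the `PlainSocketImpl.socketConnect` method. We have set the JGroups sock_conn_timeout to 300 ms, so this wait should not really happen.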
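For reference, this is how a connect timeout behaves at the plain `java.net.Socket` level: `connect(endpoint, timeout)` gives up after the given number of milliseconds, while a timeout of 0 means waiting indefinitely. The sketch is a minimal illustration with hypothetical names (`ConnectTimeoutSketch`, `canConnect`). Whether the JGroups version in use actually passes sock_conn_timeout down to this call is not shown in the question and would need to be verified.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ConnectTimeoutSketch {

    /** Tries to open a TCP connection and gives up after timeoutMs (0 would wait indefinitely). */
    static boolean canConnect(InetSocketAddress target, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException (timeout hit) and ConnectException (refused).
            return false;
        }
    }

    /** Self-check against a throwaway listener bound to the loopback interface. */
    static boolean connectsToLocalListener() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            return canConnect(target, 300);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("local listener reachable: " + connectsToLocalListener());
    }
}
```

Running `canConnect` from inside a pod against one of the addresses returned by the SRV query, with the same 300 ms value, would show whether the timeout is honoured at the socket level in that environment.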
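Eventually, Wildfly cannot start any more threads (no more OS-level threads can be created). We are still not sure what exactly causes the problem, but we assume that the file descriptor limit may have been reached. At that point we have around 4000 threads, of which roughly 75% are related to DNS_PING.
The hanging thread looks like this: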
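A breakdown like the 75% figure above can be obtained by grouping live threads by name. The following is a minimal sketch using the standard `ThreadMXBean`. It only sees threads of the JVM it runs in, so it would have to be invoked from code deployed in the server. The grouping rule (cut at the first comma, strip a trailing numeric counter) is an assumption based on the thread names shown in this question.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

final class ThreadNameCounter {

    /** "Timer temp thread-20460,ee,node-0" becomes "Timer temp thread". */
    static String groupKey(String threadName) {
        int comma = threadName.indexOf(',');
        String base = comma >= 0 ? threadName.substring(0, comma) : threadName;
        return base.replaceAll("-\\d+$", "");
    }

    /** Counts the live threads of this JVM per name group. */
    static Map<String, Integer> countByGroup() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            counts.merge(groupKey(info.getThreadName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByGroup().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Sampling these counts periodically would show whether the "Timer temp thread" group grows steadily or only in bursts, for example when pods are restarted.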
{
"thread-id" => 109424945L,
"thread-name" => "Timer temp thread-20460,ee,avaloq-wb-sync-manager-0",
"thread-state" => "RUNNABLE",
"blocked-time" => -1L,
"blocked-count" => 1L,
"waited-time" => -1L,
"waited-count" => 1L,
"lock-info" => undefined,
"lock-name" => undefined,
"lock-owner-id" => -1L,
"lock-owner-name" => undefined,
"stack-trace" => [
{
"file-name" => "PlainSocketImpl.java",
"line-number" => -2,
"class-name" => "java.net.PlainSocketImpl",
"method-name" => "socketConnect",
"native-method" => true
},
{
"file-name" => "AbstractPlainSocketImpl.java",
"line-number" => 412,
"class-name" => "java.net.AbstractPlainSocketImpl",
"method-name" => "doConnect",
"native-method" => false
},
{
"file-name" => "AbstractPlainSocketImpl.java",
"line-number" => 255,
"class-name" => "java.net.AbstractPlainSocketImpl",
"method-name" => "connectToAddress",
"native-method" => false
},
{
"file-name" => "AbstractPlainSocketImpl.java",
"line-number" => 237,
"class-name" => "java.net.AbstractPlainSocketImpl",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "SocksSocketImpl.java",
"line-number" => 392,
"class-name" => "java.net.SocksSocketImpl",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "Socket.java",
"line-number" => 609,
"class-name" => "java.net.Socket",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "Util.java",
"line-number" => 461,
"class-name" => "org.jgroups.util.Util",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "TcpConnection.java",
"line-number" => 96,
"class-name" => "org.jgroups.blocks.cs.TcpConnection",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "TcpConnection.java",
"line-number" => 88,
"class-name" => "org.jgroups.blocks.cs.TcpConnection",
"method-name" => "connect",
"native-method" => false
},
{
"file-name" => "BaseServer.java",
"line-number" => 295,
"class-name" => "org.jgroups.blocks.cs.BaseServer",
"method-name" => "getConnection",
"native-method" => false
},
{
"file-name" => "BaseServer.java",
"line-number" => 208,
"class-name" => "org.jgroups.blocks.cs.BaseServer",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TCP.java",
"line-number" => 91,
"class-name" => "org.jgroups.protocols.TCP",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "BasicTCP.java",
"line-number" => 146,
"class-name" => "org.jgroups.protocols.BasicTCP",
"method-name" => "sendUnicast",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1638,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "sendToSingleMember",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1632,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "doSend",
"native-method" => false
},
{
"file-name" => "NoBundler.java",
"line-number" => 38,
"class-name" => "org.jgroups.protocols.NoBundler",
"method-name" => "sendSingleMessage",
"native-method" => false
},
{
"file-name" => "NoBundler.java",
"line-number" => 30,
"class-name" => "org.jgroups.protocols.NoBundler",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1620,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1353,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "_send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1262,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "DNS_PING.java",
"line-number" => 189,
"class-name" => "org.jgroups.protocols.dns.DNS_PING",
"method-name" => "sendDiscoveryRequest",
"native-method" => false
},
{
"file-name" => "DNS_PING.java",
"line-number" => 182,
"class-name" => "org.jgroups.protocols.dns.DNS_PING",
"method-name" => "findMembers",
"native-method" => false
},
{
"file-name" => "Discovery.java",
"line-number" => 217,
"class-name" => "org.jgroups.protocols.Discovery",
"method-name" => "invokeFindMembers",
"native-method" => false
},
{
"file-name" => "Discovery.java",
"line-number" => 228,
"class-name" => "org.jgroups.protocols.Discovery",
"method-name" => "lambda$findMembers$0",
"native-method" => false
},
{
"file-name" => undefined,
"line-number" => -1,
"class-name" => "org.jgroups.protocols.Discovery$$Lambda$968/0x0000000840b0bc40",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "TimeScheduler3.java",
"line-number" => 324,
"class-name" => "org.jgroups.util.TimeScheduler3$Task",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "ContextReferenceExecutor.java",
"line-number" => 49,
"class-name" => "org.jboss.as.clustering.context.ContextReferenceExecutor",
"method-name" => "execute",
"native-method" => false
},
{
"file-name" => "ContextualExecutor.java",
"line-number" => 70,
"class-name" => "org.jboss.as.clustering.context.ContextualExecutor$1",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "Thread.java",
"line-number" => 829,
"class-name" => "java.lang.Thread",
"method-name" => "run",
"native-method" => false
}
],
"suspended" => false,
"in-native" => false,
"locked-monitors" => [{
"class-name" => "java.net.SocksSocketImpl",
"identity-hash-code" => 139076230,
"locked-stack-depth" => 1,
"locked-stack-frame" => {
"file-name" => "AbstractPlainSocketImpl.java",
"line-number" => 412,
"class-name" => "java.net.AbstractPlainSocketImpl",
"method-name" => "doConnect",
"native-method" => false
}
}],
"locked-synchronizers" => [{
"class-name" => "java.util.concurrent.locks.ReentrantLock$FairSync",
"identity-hash-code" => 740591308
}]
},
And a typical waiting thread:
{
"thread-id" => 109424946L,
"thread-name" => "Timer temp thread-20461,ee,avaloq-wb-sync-manager-0",
"thread-state" => "WAITING",
"blocked-time" => -1L,
"blocked-count" => 1L,
"waited-time" => -1L,
"waited-count" => 1L,
"lock-info" => {
"class-name" => "java.util.concurrent.locks.ReentrantLock$FairSync",
"identity-hash-code" => 740591308
},
"lock-name" => "java.util.concurrent.locks.ReentrantLock$FairSync@2c2486cc",
"lock-owner-id" => 109424945L,
"lock-owner-name" => "Timer temp thread-20460,ee,avaloq-wb-sync-manager-0",
"stack-trace" => [
{
"file-name" => "Unsafe.java",
"line-number" => -2,
"class-name" => "jdk.internal.misc.Unsafe",
"method-name" => "park",
"native-method" => true
},
{
"file-name" => "LockSupport.java",
"line-number" => 194,
"class-name" => "java.util.concurrent.locks.LockSupport",
"method-name" => "park",
"native-method" => false
},
{
"file-name" => "AbstractQueuedSynchronizer.java",
"line-number" => 885,
"class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
"method-name" => "parkAndCheckInterrupt",
"native-method" => false
},
{
"file-name" => "AbstractQueuedSynchronizer.java",
"line-number" => 943,
"class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
"method-name" => "doAcquireInterruptibly",
"native-method" => false
},
{
"file-name" => "AbstractQueuedSynchronizer.java",
"line-number" => 1263,
"class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
"method-name" => "acquireInterruptibly",
"native-method" => false
},
{
"file-name" => "ReentrantLock.java",
"line-number" => 317,
"class-name" => "java.util.concurrent.locks.ReentrantLock",
"method-name" => "lockInterruptibly",
"native-method" => false
},
{
"file-name" => "BaseServer.java",
"line-number" => 277,
"class-name" => "org.jgroups.blocks.cs.BaseServer",
"method-name" => "getConnection",
"native-method" => false
},
{
"file-name" => "BaseServer.java",
"line-number" => 208,
"class-name" => "org.jgroups.blocks.cs.BaseServer",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TCP.java",
"line-number" => 91,
"class-name" => "org.jgroups.protocols.TCP",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "BasicTCP.java",
"line-number" => 146,
"class-name" => "org.jgroups.protocols.BasicTCP",
"method-name" => "sendUnicast",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1638,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "sendToSingleMember",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1632,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "doSend",
"native-method" => false
},
{
"file-name" => "NoBundler.java",
"line-number" => 38,
"class-name" => "org.jgroups.protocols.NoBundler",
"method-name" => "sendSingleMessage",
"native-method" => false
},
{
"file-name" => "NoBundler.java",
"line-number" => 30,
"class-name" => "org.jgroups.protocols.NoBundler",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1620,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1353,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "_send",
"native-method" => false
},
{
"file-name" => "TP.java",
"line-number" => 1262,
"class-name" => "org.jgroups.protocols.TP",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "DNS_PING.java",
"line-number" => 189,
"class-name" => "org.jgroups.protocols.dns.DNS_PING",
"method-name" => "sendDiscoveryRequest",
"native-method" => false
},
{
"file-name" => "DNS_PING.java",
"line-number" => 182,
"class-name" => "org.jgroups.protocols.dns.DNS_PING",
"method-name" => "findMembers",
"native-method" => false
},
{
"file-name" => "Discovery.java",
"line-number" => 217,
"class-name" => "org.jgroups.protocols.Discovery",
"method-name" => "invokeFindMembers",
"native-method" => false
},
{
"file-name" => "Discovery.java",
"line-number" => 228,
"class-name" => "org.jgroups.protocols.Discovery",
"method-name" => "lambda$findMembers$0",
"native-method" => false
},
{
"file-name" => undefined,
"line-number" => -1,
"class-name" => "org.jgroups.protocols.Discovery$$Lambda$968/0x0000000840b0bc40",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "TimeScheduler3.java",
"line-number" => 324,
"class-name" => "org.jgroups.util.TimeScheduler3$Task",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "ContextReferenceExecutor.java",
"line-number" => 49,
"class-name" => "org.jboss.as.clustering.context.ContextReferenceExecutor",
"method-name" => "execute",
"native-method" => false
},
{
"file-name" => "ContextualExecutor.java",
"line-number" => 70,
"class-name" => "org.jboss.as.clustering.context.ContextualExecutor$1",
"method-name" => "run",
"native-method" => false
},
{
"file-name" => "Thread.java",
"line-number" => 829,
"class-name" => "java.lang.Thread",
"method-name" => "run",
"native-method" => false
}
],
"suspended" => false,
"in-native" => false,
"locked-monitors" => [],
"locked-synchronizers" => []
},
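The two dumps show the same lock (`ReentrantLock$FairSync`, identity hash code 740591308): the first thread owns it while sitting in the native connect, and the second is parked in `lockInterruptibly` waiting for it. The sketch below reproduces only that pattern in isolation: one thread holds a fair lock while blocked, and every later thread queues behind it. It is not JGroups code, and the latch that stands in for the hanging connect is an assumption made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockConvoySketch {

    /** Returns how many threads were parked behind a single blocked lock holder. */
    static int queuedBehindHolder(int waiters) {
        ReentrantLock lock = new ReentrantLock(true); // fair, like FairSync in the dump
        CountDownLatch holderHasLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                holderHasLock.countDown();
                release.await(); // stands in for a connect() that does not return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "holder");
        holder.start();

        try {
            holderHasLock.await();
            Thread[] threads = new Thread[waiters];
            for (int i = 0; i < waiters; i++) {
                threads[i] = new Thread(() -> {
                    try {
                        lock.lockInterruptibly(); // parks here, as in the WAITING dump
                        lock.unlock();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "waiter-" + i);
                threads[i].start();
            }
            // Wait (at most 5 s) until all waiters are queued on the lock.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (lock.getQueueLength() < waiters && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            int queued = lock.getQueueLength();
            release.countDown(); // let the holder finish so everything drains
            holder.join();
            for (Thread t : threads) {
                t.join();
            }
            return queued;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("threads queued behind holder: " + queuedBehindHolder(5));
    }
}
```

If each discovery run is handed a fresh thread while the lock holder never returns, the number of parked threads would grow without bound in the same way, which would be consistent with the thread counts reported above.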
Has anyone experienced a similar problem?
A: No answers yet