您的位置:首页 > 编程语言 > Java开发

记录一次排查极光推送SDK死锁问题

2017-11-27 11:52 816 查看
线上服务稳定了好几个月,这两天抽风,总是有mysql数据行被锁,update会报错(Lock wait timeout exceeded; try restarting transaction)

导致系统无法使用,害得我得一直在后台给用户解锁(通过mysql的kill命令)/(ㄒoㄒ)/~~

极光sdk版本(接入sdk时从官方文档复制的,一直没更新,可谁没事会去更新这个呢):

<!-- JPush -->
<dependency>
<groupId>cn.jpush.api</groupId>
<artifactId>jpush-client</artifactId>
<version>3.2.15</version>
</dependency>
<dependency>
<groupId>cn.jpush.api</groupId>
<artifactId>jiguang-common</artifactId>
<version>1.0.1</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.6.Final</version>
</dependency>


通过mysql后台查询,会发现有几个事务一直不提交,一直锁住了数据。

由于之前一直后台功能一直正常,并没有往代码方面想,找了一天数据库原因未果(尼玛的)!

经过一天观察,发现只有遇到极光推送的地方才会加锁,而且是不定时出现,故重点观察极光推送代码(为log4j添加了打印线程id的配置)!

第一步:通过观察log,发现极光推送之后该线程在log中再也不出现了(只打印了关键代码3中LastHttpContent判定逻辑中的语句,死锁的原因)

通过jps和jstack定位到了极光sdk的死锁问题(真坑):



第二步:分析原因

关键代码1:cn.jiguang.common.connection.NettyHttpClient.sendHttpRequest:158

private ResponseWrapper sendHttpRequest(HttpMethod method, String url, String body) throws Exception {
CountDownLatch latch = new CountDownLatch(1);// 注意此处实例化了一个CountDownLatch 并传递给了NettyClientInitializer
NettyClientInitializer initializer = new NettyClientInitializer(_sslCtx, null, latch);
b.handler(initializer);
ResponseWrapper wrapper = new ResponseWrapper();
URI uri = new URI(url);
String scheme = uri.getScheme() == null ? "http" : uri.getScheme();
String host = uri.getHost() == null ? "127.0.0.1" : uri.getHost();
int port = uri.getPort();
if (port == -1) {
if ("http".equalsIgnoreCase(scheme)) {
port = 80;
} else if ("https".equalsIgnoreCase(scheme)) {
port = 443;
}
}

try {
ChannelFuture connect = b.connect(host, port);
_channel = connect.sync().channel();
FullHttpRequest request;
if (null != body) {
ByteBuf byteBuf = Unpooled.copiedBuffer(body.getBytes(CharsetUtil.UTF_8));
request = new DefaultFullHttpRequest(HttpVersion.HTTP_1_1, method, uri.getRawPath(), byteBuf);
request.headers().set(HttpHeaderNames.CONTENT_LENGTH, (long) byteBuf.readableBytes());
} else {
request = new DefaultFullHttpRequest(HTTP_1_1, method, uri.getRawPath());
}
request.headers().set(HttpHeaderNames.HOST, uri.getHost());
request.headers().set(HttpHeaderNames.AUTHORIZATION, _authCode);
request.headers().set("Content-Type","application/json;charset=utf-8");
connect.awaitUninterruptibly();
LOG.info("Sending request. " + request);
LOG.info("Send body: " + body);
_channel.writeAndFlush(request);
latch.await();// 等待继续执行,这个地方使用CountDownLatch的作用就是等待netty异步网络调用返回结果后继续执行,所以此处等待,如果netty执行完后不countDown线程将永远等待在这个地方
wrapper = initializer.getResponse();
} catch (InterruptedException e) {
e.printStackTrace();
}
return wrapper;
}
关键代码2:cn.jiguang.common.connection.NettyClientInitializer.initChannel:25

public NettyClientInitializer(SslContext sslContext, NettyHttpClient.BaseCallback callback, CountDownLatch latch) {
this._sslCtx = sslContext;
this._callback = callback;
this._latch = latch;
}

@Override
protected void initChannel(SocketChannel socketChannel) throws Exception {
this._handler = new HttpResponseHandler(_callback, _latch);// 此处将上一段代码实例化的CountDownLatch继续传递给了cn.jiguang.common.connection.HttpResponseHandler
socketChannel.pipeline().addLast(_sslCtx.newHandler(socketChannel.alloc()), new HttpClientCodec(), _handler);
}
关键代码3:cn.jiguang.common.connection.HttpResponseHandler

public class HttpResponseHandler extends SimpleChannelInboundHandler<HttpObject> {

private static final Logger LOG = LoggerFactory.getLogger(HttpResponseHandler.class);
private int status;
private NettyHttpClient.BaseCallback _callback;
private CountDownLatch _latch;
private ResponseWrapper _wrapper = new ResponseWrapper();

public HttpResponseHandler(NettyHttpClient.BaseCallback callback, CountDownLatch latch) {
this._callback = callback;
this._latch = latch;
}

@Override
protected void channelRead0(ChannelHandlerContext ctx, HttpObject msg) throws Exception {
if (msg instanceof HttpResponse) {
HttpResponse response = (HttpResponse) msg;
status = response.status().code();
}
if (msg instanceof HttpContent) {
HttpContent content = (HttpContent) msg;
LOG.info(content.content().toString());
if (content instanceof LastHttpContent) {// 通过观察日志,发现此处仅执行内容LastHttpContent的代码,并未返回HttpContent,推测是极光后台更新了,对之前的sdk兼容有问题
LOG.info("closing connection");
ctx.close();
} else {
String responseContent = content.content().toString(CharsetUtil.UTF_8);
_wrapper.responseCode = status;
_wrapper.responseContent = responseContent;
LOG.info("Got Response code: " + status + " content: " + responseContent);
System.err.println("Got Response code: " + status + " content: " + responseContent);
System.err.flush();
if (null != _callback) {
_callback.onSucceed(_wrapper);
}
if (null != _latch) {
_latch.countDown();
}
}
}
}

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
LOG.error("error:", cause);
try {
ctx.close();
}catch (Exception ex) {
LOG.error("close error:", ex);
}
}

public void resetLatch(CountDownLatch latch) {
this._latch = latch;
}

public ResponseWrapper getResponse() {
return _wrapper;
}
}
此段代码在异常时(exceptionCaught)也未进行countDown调用,也会死锁!!!着实让人堪忧啊!

最后通过升级了极光的sdk解决了死锁问题!!!

总结:CountLatchDown一定要注意countDown,否则就会死锁,挂住线程不提交事务,造成数据库行锁。

并且查询数据库事务表看到的执行sql是null,并没有正在执行的sql,应该就可以判定是事务未提交,分析为什么未提交就好了!
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  Java 极光推送