您的位置:首页 > 其它

记一次线上请求缓慢问题处理

2017-09-06 11:36 204 查看
9月的某天晚上,接到通知,线上环境请求非常慢,页面打开要1分钟,遂按照系统参数,数据库,应用,web服务器状态逐步排查的步骤,最终定位到问题并得以解决,此处做个检查流程梳理,供大家参考。

问题描述:正式环境请求响应异常慢,页面打开需要1分钟左右时间等。

第一步 系统参数检查

1 df -h 检查硬盘空间



重点看 Use这一栏,可以看到空间足够,硬盘空间ok

2 free 检查内存使用情况



对比total和used两栏,或者直接看free这一栏,默认是单位是KB

free [-b -k -m] [-o] [-s delay] [-t] [-V]

-b -k -m:分别以字节(KB、MB)为单位显示内存使用情况

可以发现内存也是ok

3 top命令检查线程的cpu负载情况



重点看CPU这一栏,我这个图是后来补的图,现场的图忘记保存了,当时发现有个java线程的CPU的使用率超过了100%,此处可发现有问题

4 通过jstack 命令查看线程内存信息

jstack pid

root@172_16_1_214 webapps]# jstack 9495
9495: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

root@172_16_1_214 logs]# tail -n 800 catalina.out
at java.lang.Thread.run(Thread.java:745)

"updatePromotionStatusTask-8-exe0" daemon prio=10 tid=0x00000000017cb000 nid=0x257b runnable [0x00007f9475ca8000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:108)
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)
at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
at redis.clients.jedis.Protocol.process(Protocol.java:151)
at redis.clients.jedis.Protocol.read(Protocol.java:215)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)
at redis.clients.jedis.Jedis.hexists(Jedis.java:779)
at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255)
at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250)
at cn.htd.common.util.DictionaryUtils.getValueByCode(DictionaryUtils.java:207)
at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:143)
at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:32)
at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$FastClassBySpringCGLIB$$b13b01ff.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:52)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at cn.htd.common.interceptor.MethodAccessLogInterceptor.invoke(MethodAccessLogInterceptor.java:48)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655)
at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$EnhancerBySpringCGLIB$$688d5abb.execute(<generated>)
at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:239)
at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-25" daemon prio=10 tid=0x00007f9408032000 nid=0x257a waiting on condition [0x00007f9475daa000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-24" daemon prio=10 tid=0x00007f9408030000 nid=0x2577 waiting on condition [0x00007f9475eab000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

"cleanPromotionBargainScheduleTask-3-exe4" daemon prio=10 tid=0x00007f948002c800 nid=0x2576 in Object.wait() [0x00007f9475fac000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at com.taobao.pamirs.schedule.taskmanager.LockObject.waitCurrentThread(LockObject.java:14)
- locked <0x00000000c60fabf0> (a java.lang.Object)
at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:296)
at java.lang.Thread.run(Thread.java:745)
JNI global references: 214

Heap
PSYoungGen      total 278528K, used 253517K [0x00000000ec000000, 0x0000000100000000, 0x0000000100000000)
eden space 253952K, 99% used [0x00000000ec000000,0x00000000fb7953a0,0x00000000fb800000)
from space 24576K, 0% used [0x00000000fb800000,0x00000000fb800000,0x00000000fd000000)
to   space 38400K, 0% used [0x00000000fda80000,0x00000000fda80000,0x0000000100000000)
ParOldGen       total 655360K, used 655313K [0x00000000c4000000, 0x00000000ec000000, 0x00000000ec000000)
object space 655360K, 99% used [0x00000000c4000000,0x00000000ebff47c8,0x00000000ec000000)
PSPermGen       total 51200K, used 50805K [0x00000000bee00000, 0x00000000c2000000, 0x00000000c4000000)
object space 51200K, 99% used [0x00000000bee00000,0x00000000c1f9d6e0,0x00000000c2000000)


看最后几行,可以看到PSYoungGen ,ParOldGen,PSPermGen都已达到99%的使用率,由此可推断应用中存在线程阻塞,那么阻塞在什么地方呢,仔细查看上面的日志信息,可以看到

t redis.clients.jedis.Jedis.hexists(Jedis.java:779)

at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255)

at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250)

这两行代码可以看到是redis获取数据时出了问题,那么是不是redis有问题呢,后证实确实是redis出了问题。

我们首选会推测redis服务是不是挂了,赶紧通过本地连线上redis,发现可以连通,也可以正常查看key信息。

第二步 检查redis服务

1 先看redis服务器负载,发现CPU居高不下

2 连上redis服务,通过keys * 查看存储情况,发现多出来7w+的促销券的key,且还在增加

由此,推断出此时由于redis在执行大量的插入操作,而redis是单线程的,我们出异常的应用刚好有个频繁的定时任务要读redis,reids服务器正忙,定时任务执行缓慢,不断堆积,从而挤爆了内存空间,导致应用响应缓慢。

解决方案:

1 临时方案:kill促销发券进程,重启出问题的应用

2 迁移促销发券到单独的redis
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  web服务器 异常