Troubleshooting Network Packet Loss
2015-01-18 20:19
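During recent testing we found that our database middleware program was dropping network packets. The test tool was mysqlslap.
Once concurrency reached a certain level, there was some probability that mysqlslap would hang indefinitely and never return.
The test command was: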
[root@db_slave1 cwinfocenter]# mysqlslap \
    --concurrency=300,300,300,400,500 --number-of-queries=6000 \
    --iterations=1 \
    --create-schema=chinaweather_infocenter -h172.16.80.71 -P3307 \
    -uroot -p111111 --query=test4.sql
Benchmark
    Average number of seconds to run all queries: 2.613 seconds
    Minimum number of seconds to run all queries: 2.613 seconds
    Maximum number of seconds to run all queries: 2.613 seconds
    Number of clients running queries: 300
    Average number of queries per client: 20
Benchmark
    Average number of seconds to run all queries: 2.677 seconds
    Minimum number of seconds to run all queries: 2.677 seconds
    Maximum number of seconds to run all queries: 2.677 seconds
    Number of clients running queries: 300
    Average number of queries per client: 20
Benchmark
    Average number of seconds to run all queries: 2.689 seconds
    Minimum number of seconds to run all queries: 2.689 seconds
    Maximum number of seconds to run all queries: 2.689 seconds
    Number of clients running queries: 300
    Average number of queries per client: 20
Benchmark
    Average number of seconds to run all queries: 2.906 seconds
    Minimum number of seconds to run all queries: 2.906 seconds
    Maximum number of seconds to run all queries: 2.906 seconds
    Number of clients running queries: 400
    Average number of queries per client: 15
At a concurrency of 500, mysqlslap never returned.
[root@db_slave1 cwinfocenter]# ps -eLf | grep mysqlslap >/tmp/ps-slap
About 93 threads had not returned. Tracing one of the stuck threads with pstack:
[root@db_slave1 cwinfocenter]# pstack 23085
Thread 1 (process 23085):
#0 0x0000003259e0e54d in read () from /lib64/libpthread.so.0
#1 0x000000000042a002 in vio_read_buff ()
#2 0x000000000041a659 in my_real_read(st_net*, unsigned long*) ()
#3 0x000000000041aa34 in my_net_read ()
#4 0x000000000041498a in cli_safe_read ()
#5 0x0000000000416938 in mysql_real_connect ()
#6 0x0000000000408a0d in slap_connect ()
#7 0x000000000040c5b6 in run_task ()
#8 0x0000003259e07851 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003259ae767d in clone () from /lib64/libc.so.6
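The stack shows mysqlslap blocked inside mysql_real_connect, waiting in read() for a reply from the server, which means packets were being lost during connection setup.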
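The symptom can be reproduced in miniature. The following is a minimal sketch, not the middleware's or mysqlslap's code: a local listener that never calls accept() and never sends a greeting, and a client that applies a read timeout instead of blocking forever. The names `read_greeting` and `timeout_s` are hypothetical and chosen only for this illustration.

```python
import socket


def read_greeting(host, port, timeout_s=1.0):
    """Connect and wait for the server's first bytes.

    Returns the bytes received, or None if nothing arrives within
    timeout_s seconds (the situation in which mysqlslap sat in read()).
    """
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None


if __name__ == "__main__":
    # The kernel completes the TCP handshake for this listener, but the
    # program never calls accept(), so no greeting is ever sent.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    print(read_greeting("127.0.0.1", port, timeout_s=0.5))
    listener.close()
```

A client written this way reports the stalled connection after the timeout rather than leaving a thread stuck, which makes the failure easier to count and log.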
We changed the operating system settings on the middleware host, raising the file descriptor limit and the backlog:
ulimit -n 10240
echo 20480 > /proc/sys/net/ipv4/tcp_max_syn_backlog
Retesting showed that the problem was still there.
A Google search turned up one more parameter that needs to be adjusted:
echo 20480 > /proc/sys/net/core/somaxconn
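To confirm that the new values actually took effect, the current limits can be read back. The sketch below is only an illustration under assumptions: the helper name `read_kernel_int` is hypothetical, it reads the two /proc files used in the commands above plus the per-process file descriptor limit, and it returns None when a file is missing (for example on a non-Linux host).

```python
def read_kernel_int(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    import resource  # Unix-only module

    # Soft and hard limits on open file descriptors (what `ulimit -n` reports).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open files (soft/hard):", soft, hard)

    for path in ("/proc/sys/net/ipv4/tcp_max_syn_backlog",
                 "/proc/sys/net/core/somaxconn"):
        print(path, "=", read_kernel_int(path))
```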
The detailed reason (quoted from the web; the English text is from the Linux listen(2) man page):
The behavior of the backlog argument on TCP sockets changed
with Linux 2.2. Now it specifies the queue length for completely
established sockets waiting to be accepted, instead of the number
of incomplete connection requests.
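Note the sentence above: backlog now refers to sockets that have completed the connection but have not yet been handled by accept, not to sockets still in the SYN state. I usually set it to around 64. So the parameter to watch is probably /proc/sys/net/core/somaxconn rather than tcp_max_syn_backlog; the latter should be useful for some firewalls (half-open SYN attacks).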
The maximum length of the queue for incomplete sockets can be
set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies
are enabled there is no logical maximum length and this setting is
ignored. See tcp(7) for more information.
If the backlog argument is greater than the value in
/proc/sys/net/core/somaxconn, then it is silently truncated to that
value; the default value in this file is 128. In kernels before
2.4.25, this limit was a hard coded value, SOMAXCONN, with
the value 128.
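The truncation rule in the last quoted paragraph amounts to taking a minimum. As a rough illustration (the function name and the requested value of 500 are made up for this example, not taken from the middleware), a server asking for a backlog of 500 on a host with the default somaxconn of 128 ends up with an effective backlog of 128:

```python
def effective_backlog(requested, somaxconn=128):
    """Backlog that listen() actually uses: silently capped at somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    # Hypothetical request of 500 with the default limit of 128.
    print(effective_backlog(500))                   # prints 128
    # Same request after raising somaxconn to 20480, as in this post.
    print(effective_backlog(500, somaxconn=20480))  # prints 500
```

This is why raising tcp_max_syn_backlog alone made no difference here: the cap on fully established connections waiting for accept was still 128.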
After changing somaxconn, the test no longer showed any packet loss.
When reposting, please credit the blog of 高孝鑫.