
MySQL: the puzzling "MySQL server has gone away" error and how to fix it

2009-04-25 00:31
When running SHOW STATUS in MySQL, people usually look at cache efficiency, thread counts and the like, and tend to overlook two values:

Variable_name      Value
Aborted_clients    3792
Aborted_connects   376
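These usually amount to only 0.0x% of all queries, so they get little attention. In a traditional web application a failed query also has little impact on the user, who can simply refresh the page. In our recent infrastructure rework, however, many applications now run as services, where there is no user to ask for a refresh, so these failures can degrade service quality.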
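To put the two counters in proportion, here is a minimal sketch (not from the original post) that expresses them as a percentage of all statements. It assumes the status rows have already been fetched into an array keyed by variable name, for example from SHOW GLOBAL STATUS. The helper name abortedPercentages and the Questions total of 10,000,000 are invented for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Express Aborted_clients and Aborted_connects as a percentage of the
 * Questions counter. $status is assumed to map variable names to values,
 * as returned by SHOW GLOBAL STATUS (hypothetical helper, for illustration).
 *
 * @param array<string, int|string> $status
 * @return array<string, float>
 */
function abortedPercentages(array $status): array
{
    $questions = (int) ($status['Questions'] ?? 0);
    $result = [];
    foreach (['Aborted_clients', 'Aborted_connects'] as $name) {
        $count = (int) ($status[$name] ?? 0);
        // Avoid division by zero on a freshly started server.
        $result[$name] = $questions > 0 ? round($count / $questions * 100, 4) : 0.0;
    }
    return $result;
}

// The two counters come from the listing above; the Questions total is invented.
$percentages = abortedPercentages([
    'Questions'        => 10000000,
    'Aborted_clients'  => 3792,
    'Aborted_connects' => 376,
]);
```

Tracking these percentages over time, rather than the raw counters, makes it easier to see whether a change on the server or in the scripts actually reduced the aborts.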

Tracing the script logs showed that the main error message was "MySQL server has gone away". The official explanation is:

The most common reason for the MySQL server has gone away error is that the server timed out and closed the connection.

Some other common reasons for the MySQL server has gone away error are:

You (or the db administrator) has killed the running thread with a KILL statement or a mysqladmin kill command.

You tried to run a query after closing the connection to the server. This indicates a logic error in the application that should be corrected.

A client application running on a different host does not have the necessary privileges to connect to the MySQL server from that host.

You got a timeout from the TCP/IP connection on the client side. This may happen if you have been using the commands: mysql_options(..., MYSQL_OPT_READ_TIMEOUT,...) or mysql_options(..., MYSQL_OPT_WRITE_TIMEOUT,...). In this case increasing the timeout may help solve the problem.

You have encountered a timeout on the server side and the automatic reconnection in the client is disabled (the reconnect flag in the MYSQL structure is equal to 0).

You are using a Windows client and the server had dropped the connection (probably because wait_timeout expired) before the command was issued.

The problem on Windows is that in some cases MySQL doesn't get an error from the OS when writing to the TCP/IP connection to the server, but instead gets the error when trying to read the answer from the connection.

In this case, even if the reconnect flag in the MYSQL structure is equal to 1, MySQL does not automatically reconnect and re-issue the query as it doesn't know if the server did get the original query or not.

The solution to this is to either do a mysql_ping on the connection if there has been a long time since the last query (this is what MyODBC does) or set wait_timeout on the mysqld server so high that it in practice never times out.

You can also get these errors if you send a query to the server that is incorrect or too large. If mysqld receives a packet that is too large or out of order, it assumes that something has gone wrong with the client and closes the connection. If you need big queries (for example, if you are working with big BLOB columns), you can increase the query limit by setting the server's max_allowed_packet variable, which has a default value of 1MB. You may also need to increase the maximum packet size on the client end. More information on setting the packet size is given in Section A.1.2.9, “Packet too large”.

An INSERT or REPLACE statement that inserts a great many rows can also cause these sorts of errors. Either one of these statements sends a single request to the server irrespective of the number of rows to be inserted; thus, you can often avoid the error by reducing the number of rows sent per INSERT or REPLACE.

You also get a lost connection if you are sending a packet 16MB or larger if your client is older than 4.0.8 and your server is 4.0.8 and above, or the other way around.

It is also possible to see this error if hostname lookups fail (for example, if the DNS server on which your server or network relies goes down). This is because MySQL is dependent on the host system for name resolution, but has no way of knowing whether it is working; from MySQL's point of view the problem is indistinguishable from any other network timeout.

You may also see the MySQL server has gone away error if MySQL is started with the --skip-networking option.

Another networking issue that can cause this error occurs if the MySQL port (default 3306) is blocked by your firewall, thus preventing any connections at all to the MySQL server.

You can also encounter this error with applications that fork child processes, all of which try to use the same connection to the MySQL server. This can be avoided by using a separate connection for each child process.

You have encountered a bug where the server died while executing the query.
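
One item in the excerpt, sending fewer rows per INSERT, lends itself to a short sketch. This is an illustration rather than code from the post: the function name buildChunkedInserts, the table name events and the chunk size are assumptions, and values are bound through ? placeholders so each statement can be handed to a prepared-statement API.

```php
<?php
declare(strict_types=1);

/**
 * Split $rows into multi-row INSERT statements of at most $chunkSize rows.
 * Returns a list of [sql, params] pairs using ? placeholders.
 * Table and column names are trusted here (hypothetical helper);
 * validate them if they can come from user input.
 *
 * @param list<string> $columns
 * @param list<list<mixed>> $rows
 * @return list<array{0: string, 1: list<mixed>}>
 */
function buildChunkedInserts(string $table, array $columns, array $rows, int $chunkSize): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    $columnList = '`' . implode('`, `', $columns) . '`';
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    $statements = [];
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $sql = "INSERT INTO `{$table}` ({$columnList}) VALUES "
            . implode(', ', array_fill(0, count($chunk), $rowPlaceholder));
        // Flatten the chunk's rows into one parameter list, in row order.
        $params = array_merge(...array_map('array_values', $chunk));
        $statements[] = [$sql, $params];
    }
    return $statements;
}

// Three rows with a chunk size of 2 produce two statements.
$rows = [[1, 'a'], [2, 'b'], [3, 'c']];
$statements = buildChunkedInserts('events', ['id', 'name'], $rows, 2);
```

A suitable chunk size depends on row width and on the server's max_allowed_packet, so it is best treated as a tunable value rather than a constant.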

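Based on this, there are three possible causes:

1. The MySQL server and client versions do not match.

2. The MySQL server is misconfigured or insufficiently tuned.

3. The application scripts need improvement.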

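Swapping in several different server and client versions only reduced the number of errors; it did not eliminate them. That rules out cause 1.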

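A thorough tuning of the server did not give satisfactory results either. For the timeout setting I tried many values, from the rule-of-thumb 10 seconds up to PHP's default of 60 seconds. MySQL's official default (8 hours) was clearly out of the question. That rules out cause 2 as well. (I will write up more tuning experience in a later post.)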

For cause 3, an analysis of the application code showed that it was full of code like the following (written with the raw API for clarity):

$conn = mysql_connect( ... ... );

... ... ... ...

if (!$conn) { // reconnect
    $conn = mysql_connect( ... ... );
}

mysql_query($sql, $conn);

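The idea behind this code matches the approach suggested by the MySQL documentation [If you have a script, you just have to issue the query again for the client to do an automatic reconnection.]. In practice, however, if(!$conn) turned out to be unreliable: $conn only records whether mysql_connect() succeeded, and it stays a valid resource after the server has closed the connection. The program can therefore pass the if(!$conn) check and still get the error above.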

I rewrote the code as:

if (!$conn) {
    // connect ...
}
elseif (!mysql_ping($conn)) {
    // reconnect ...
}

mysql_query($sql, $conn);

In practice, this largely eliminated the MySQL server has gone away errors.
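
The mysql_* functions used above were removed in PHP 7.0, so here is a minimal sketch of the same check-before-query idea that does not depend on any particular MySQL extension. The class name CheckedConnection and both callables are assumptions: the caller supplies one function that opens a connection and one that tests an existing handle (for example by running SELECT 1). A fake handle stands in for a real server so the behaviour can be followed without a database.

```php
<?php
declare(strict_types=1);

/**
 * Keeps one connection handle and checks it before every use.
 * $connect opens a new connection; $isAlive tests an existing one.
 * Both are supplied by the caller (hypothetical class, for illustration).
 */
final class CheckedConnection
{
    /** @var mixed Current connection handle, or null before first use. */
    private $handle = null;
    /** @var callable */
    private $connect;
    /** @var callable */
    private $isAlive;

    public function __construct(callable $connect, callable $isAlive)
    {
        $this->connect = $connect;
        $this->isAlive = $isAlive;
    }

    /** Return a handle that passed the liveness check, reconnecting if needed. */
    public function get()
    {
        if ($this->handle === null) {
            // First use: connect.
            $this->handle = ($this->connect)();
        } elseif (!($this->isAlive)($this->handle)) {
            // The handle exists but the server has dropped it: reconnect.
            $this->handle = ($this->connect)();
        }
        return $this->handle;
    }
}

// Demonstration with a fake handle instead of a real server.
$opened = 0;
$serverUp = true; // set to false below to simulate a dropped connection
$pool = new CheckedConnection(
    function () use (&$opened, &$serverUp) {
        $opened++;
        $serverUp = true;
        return "conn#{$opened}";
    },
    function ($handle) use (&$serverUp) {
        return $serverUp;
    }
);

$first = $pool->get();  // no handle yet, so this connects
$serverUp = false;      // the server closes the idle connection
$second = $pool->get(); // $first is still truthy, but the check fails: reconnect
```

The check narrows the window in which a dead connection can be used but does not close it: the server can still drop the connection between the check and the query, which fits the observation that the errors were largely, not entirely, eliminated.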

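BTW, a side question about reconnecting.

In the old environment (PHP 4.x + client 3.x + MySQL 4.x), this reconnect code:

$conn = mysql_connect(...);

worked fine.

In the new environment (PHP 5.x + client 4.x + MySQL 4.x), however, the $conn returned by $conn = mysql_connect(...) is sometimes unusable. It has to be written as:

mysql_close($conn);

$conn = mysql_connect(...);

for the returned $conn to work properly. The cause is unknown. I have not looked into it in depth and have not seen it discussed anywhere; perhaps there is something in the official MySQL bug reports.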
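
A possible factor, offered as an assumption and not verified against this setup: when mysql_connect() is called again with the same arguments it returns the already opened link instead of creating a new one, unless its new_link parameter is true, so a stale link may simply be handed back. The sketch below encodes the close-then-reconnect order and retries once when a query fails with client error 2006 (MySQL server has gone away) or 2013 (Lost connection to MySQL server during query). The function name queryWithReconnect and all three callables are hypothetical, and it assumes failures surface as a RuntimeException whose code is the client error number, as mysqli_sql_exception does when mysqli exception reporting is enabled.

```php
<?php
declare(strict_types=1);

// Client error numbers: 2006 = "MySQL server has gone away",
// 2013 = "Lost connection to MySQL server during query".
const GONE_AWAY_CODES = [2006, 2013];

/**
 * Run $query($handle); if it fails with a gone-away error, close the old
 * handle, open a new one, and retry once. All callables are supplied by the
 * caller (hypothetical helper). $handle is passed by reference so the caller
 * keeps the fresh connection.
 */
function queryWithReconnect(&$handle, callable $connect, callable $close, callable $query)
{
    if ($handle === null) {
        $handle = $connect();
    }
    try {
        return $query($handle);
    } catch (RuntimeException $e) {
        if (!in_array($e->getCode(), GONE_AWAY_CODES, true)) {
            throw $e; // unrelated failure: do not mask it
        }
        $close($handle);        // release the stale handle first
        $handle = $connect();   // then open a fresh connection
        return $query($handle); // a second failure propagates to the caller
    }
}

// Demonstration with fake callables instead of a real server.
$log = [];
$handle = 'stale';
$result = queryWithReconnect(
    $handle,
    function () use (&$log) { $log[] = 'connect'; return 'fresh'; },
    function ($h) use (&$log) { $log[] = "close {$h}"; },
    function ($h) use (&$log) {
        $log[] = "query on {$h}";
        if ($h === 'stale') {
            throw new RuntimeException('MySQL server has gone away', 2006);
        }
        return 'ok';
    }
);
```

As the quoted documentation points out, the client cannot know whether the server received the original query, so an automatic retry like this is only safe for statements that can be repeated without harm.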

~~ Heh heh ~~