A DNS Server Troubleshooting Record: The RNDC Failure

2013-11-05 15:34
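Note: this article is about troubleshooting DNS. Almost nothing online (including the Red Hat Knowledgebase) describes the error discussed here directly or offers the best and quickest fix, so I wrote this up after nearly an hour of troubleshooting and reading.

Yesterday I set up a DNS server on Red Hat Enterprise Linux Server in a non-production environment for testing, and soon ran into a strange error. When I ran "service named status", the first line of the output was the following: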
[root@localhost ~]# service named status

rndc: connect failed: 127.0.0.1#953: connection refused

named (pid 6207) is running...

[root@localhost ~]#
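The message "connection refused" usually means that nothing accepted the TCP connection on 127.0.0.1 port 953, so a useful first check is whether any process is listening there at all. The helper below is a minimal sketch and not part of the original troubleshooting: has_listener is a hypothetical name, and it assumes the output of netstat -ltn (or ss -ltn) arrives on standard input.

```shell
# Hypothetical helper: succeed if the listing on stdin shows a TCP listener
# on the given port. Expects `netstat -ltn` or `ss -ltn` style output.
has_listener() {
    awk -v port="$1" '
        /LISTEN/ {
            # A local address ends in ":<port>"; peer columns end in ":*".
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" port "$")) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

It could be used as: netstat -ltn | has_listener 953 && echo "listening" || echo "no listener on 953". Reading from standard input keeps the function independent of which listing tool is installed.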

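As most readers know, rndc is the tool used to control the named process and its configuration: it connects to the DNS server and can, for example, reload the configuration. By default it uses TCP port 953. So what could cause this error? My approach was to start by pinning down the problem, reading the output of each command carefully. For example, I looked at the status of all services in detail: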
[root@localhost gdd]# service --status-all

abrtd (pid 2371) is running...

abrt-dump-oops (pid 2379) is running...

acpid (pid 2111) is running...

atd (pid 5396) is running...

auditd (pid 1833) is running...

automount (pid 2195) is running...

avahi-daemon (pid 2016) is running...

Usage: /etc/init.d/bluetooth {start|stop}

certmonger is stopped

Stopped

cgred is stopped

Frequency scaling enabled using ondemand governor

crond (pid 2423) is running...

cupsd (pid 2086) is running...

dnsmasq is stopped

dovecot is stopped

Usage: /etc/init.d/firstboot {start|stop}

hald (pid 2120) is running...

I don't know of any running hsqldb server.

httpd (pid 6595) is running...

Table: filter

Chain INPUT (policy ACCEPT)

num target prot opt source destination

1 ACCEPT all ::/0 ::/0 state RELATED,ESTABLISHED

2 ACCEPT icmpv6 ::/0 ::/0

3 ACCEPT all ::/0 ::/0

4 ACCEPT tcp ::/0 ::/0 state NEW tcp dpt:22

5 REJECT all ::/0 ::/0 reject-with icmp6-adm-prohibited

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

1 REJECT all ::/0 ::/0 reject-with icmp6-adm-prohibited

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

IPsec stopped

Table: filter

Chain INPUT (policy ACCEPT)

num target prot opt source destination

1 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED

2 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0

3 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

4 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:953

5 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:53

6 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:443

7 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22

8 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

1 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

Table: mangle

Chain PREROUTING (policy ACCEPT)

num target prot opt source destination

Chain INPUT (policy ACCEPT)

num target prot opt source destination

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

Chain POSTROUTING (policy ACCEPT)

num target prot opt source destination

Table: nat

Chain PREROUTING (policy ACCEPT)

num target prot opt source destination

Chain POSTROUTING (policy ACCEPT)

num target prot opt source destination

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

irqbalance (pid 1895) is running...

Kdump is operational

started

qpidd is stopped

matahari-qmf-hostd is stopped

matahari-qmf-networkd is stopped

matahari-qmf-serviced is stopped

matahari-qmf-sysconfigd is stopped

Checking for mcelog

mcelog is stopped

mdmonitor is stopped

messagebus (pid 1993) is running...

mysqld is stopped

rndc: connect failed: 127.0.0.1#953: connection refused

named is stopped

No open transaction

netconsole module not loaded

Configured devices:

lo eth0

Currently active devices:

lo eth0

NetworkManager (pid 2004) is running...

rpc.svcgssd is stopped

rpc.mountd is stopped

nfsd is stopped

rpc.rquotad is stopped

rpc.statd (pid 2037) is running...

nmbd is stopped

ntpd (pid 2243) is running...

oddjobd is stopped

portreserve (pid 1851) is running...

master (pid 2347) is running...

postmaster is stopped

Process accounting is disabled.

qpidd (pid 2390) is running...

quota_nld is stopped

rdisc is stopped

restorecond (pid 10836) is running...

rhnsd (pid 2445) is running...

rhsmcertd (pid 2457 2456) is running...

rngd is stopped

rpcbind (pid 1909) is running...

rpc.gssd is stopped

rpc.idmapd (pid 2076) is running...

rpc.svcgssd is stopped

rsyslogd (pid 1858) is running...

sandbox is stopped

saslauthd is stopped

sfcb is not running, but pid file exists

smartd is stopped

smbd is stopped

snmpd is stopped

snmptrapd is stopped

spamd is stopped

spice-vdagentd is stopped

openssh-daemon (pid 2233) is running...

sssd is stopped

CIM server (2470) is runningtomcat6 is stopped [ OK ]

vsftpd is stopped

wdaemon is stopped

Webmin (pid 2498) is running

wpa_supplicant (pid 2020) is running...

ypbind is stopped

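Clearly, these two lines in the output above

rndc: connect failed: 127.0.0.1#953: connection refused

named is stopped

report the error. Next I ran named in the foreground with the -g option, which sends all of its log messages to the terminal. The output was as follows: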
[root@localhost ~]# named -g

28-Mar-2012 13:27:58.722 starting BIND 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -g

28-Mar-2012 13:27:58.722 built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static' '--disable-openssl-version-check' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS= -DDIG_SIGCHASE'

28-Mar-2012 13:27:58.722 adjusted limit on open files from 1024 to 1048576

28-Mar-2012 13:27:58.722 found 2 CPUs, using 2 worker threads

28-Mar-2012 13:27:58.723 using up to 4096 sockets

28-Mar-2012 13:27:58.734 loading configuration from '/etc/named.conf'

28-Mar-2012 13:27:58.735 reading built-in trusted keys from file '/etc/named.iscdlv.key'

28-Mar-2012 13:27:58.736 using default UDP/IPv4 port range: [1024, 65535]

28-Mar-2012 13:27:58.737 using default UDP/IPv6 port range: [1024, 65535]

28-Mar-2012 13:27:58.740 listening on IPv4 interface lo, 127.0.0.1#53

28-Mar-2012 13:27:58.744 binding TCP socket: address in use

28-Mar-2012 13:27:58.744 listening on IPv6 interface lo, ::1#53

28-Mar-2012 13:27:58.745 binding TCP socket: address in use

28-Mar-2012 13:27:58.747 could not open file '/var/run/named/named.pid': Permission denied

28-Mar-2012 13:27:58.747 generating session key for dynamic DNS

28-Mar-2012 13:27:58.747 could not open file '/var/run/named/session.key': Permission denied

28-Mar-2012 13:27:58.747 could not create /var/run/named/session.key

28-Mar-2012 13:27:58.747 failed to generate session key for dynamic DNS: permission denied

28-Mar-2012 13:27:58.753 using built-in trusted-keys for view _default

28-Mar-2012 13:27:58.754 set up managed keys zone for view _default, file 'dynamic/managed-keys.bind'

28-Mar-2012 13:27:58.754 automatic empty zone: 127.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 254.169.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 2.0.192.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 100.51.198.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 113.0.203.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 255.255.255.255.IN-ADDR.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: D.F.IP6.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 8.E.F.IP6.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: 9.E.F.IP6.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: A.E.F.IP6.ARPA

28-Mar-2012 13:27:58.754 automatic empty zone: B.E.F.IP6.ARPA

28-Mar-2012 13:27:58.755 automatic empty zone: 8.B.D.0.1.0.0.2.IP6.ARPA

28-Mar-2012 13:27:58.759 none:0: open: /etc/rndc.key: file not found

28-Mar-2012 13:27:58.760 couldn't add command channel 127.0.0.1#953: file not found

28-Mar-2012 13:27:58.760 none:0: open: /etc/rndc.key: file not found

28-Mar-2012 13:27:58.760 couldn't add command channel ::1#953: file not found

28-Mar-2012 13:27:58.760 ignoring config file logging statement due to -g option

28-Mar-2012 13:27:58.761 zone 0.in-addr.arpa/IN: loaded serial 0

28-Mar-2012 13:27:58.762 zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0

28-Mar-2012 13:27:58.764 zone 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN: loaded serial 0

28-Mar-2012 13:27:58.765 zone localhost.localdomain/IN: loaded serial 0

28-Mar-2012 13:27:58.766 zone localhost/IN: loaded serial 0

28-Mar-2012 13:27:58.766 managed-keys-zone ./IN: loading from master file dynamic/managed-keys.bind failed: permission denied

28-Mar-2012 13:27:58.766 dynamic/managed-keys.bind.jnl: open: permission denied

28-Mar-2012 13:27:58.766 managed-keys-zone ./IN: journal rollforward failed: unexpected error

28-Mar-2012 13:27:58.767 running
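The startup log above is long, and most of it is routine. A filter along the following lines pulls out the messages that report a failure. This is only a sketch: named_problems is a hypothetical name, and the pattern list is an assumption based on the wording seen in this particular log.

```shell
# Hypothetical filter: keep only the startup messages that report a failure.
# The pattern list is an assumption based on the messages seen in this log.
named_problems() {
    grep -E -i 'denied|not found|address in use|failed|could not|couldn.t'
}
```

It could be used as: named -g 2>&1 | named_problems (named stays in the foreground until interrupted with Ctrl+C). On the log above, the surviving lines include the two "/etc/rndc.key: file not found" messages and the "couldn't add command channel 127.0.0.1#953" message, which relate directly to the port that rndc failed to reach.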

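Clearly, judging from the messages above (the two "/etc/rndc.key: file not found" lines and the "dynamic/managed-keys.bind.jnl: open: permission denied" line), the cause is most likely either a permission problem or a configuration file error, so I checked each in turn. Permissions turned out not to be the problem. I went through all the DNS-related files and directories and list them below, partly as a reference for readers who hit similar errors, because moving or creating files and directories as root in a terminal can easily lead to permission problems.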
[root@localhost ~]# ls /var/named/ -al

total 40

drwxr-x---. 6 root named 4096 Mar 28 13:05 .

drwxr-xr-x. 28 root root 4096 Mar 28 13:44 ..

drwxr-x---. 6 root named 4096 Mar 28 13:05 chroot

drwxrwx---. 2 named named 4096 Mar 28 13:23 data

drwxrwx---. 2 named named 4096 Mar 28 15:24 dynamic

-rw-r-----. 1 root named 1892 Feb 18 2008 named.ca

-rw-r-----. 1 root named 152 Dec 15 2009 named.empty

-rw-r-----. 1 root named 152 Jun 21 2007 named.localhost

-rw-r-----. 1 root named 168 Dec 15 2009 named.loopback

drwxrwx---. 2 named named 4096 Dec 20 23:53 slaves

[root@localhost ~]# ls /var/named/chroot/ -al

total 24

drwxr-x---. 6 root named 4096 Mar 28 13:05 .

drwxr-x---. 6 root named 4096 Mar 28 13:05 ..

drwxr-x---. 2 root named 4096 Mar 28 13:05 dev

drwxr-x---. 4 root named 4096 Mar 28 14:32 etc

drwxr-xr-x. 3 root root 4096 Mar 28 13:05 usr

drwxr-x---. 6 root named 4096 Mar 28 13:05 var

[root@localhost ~]# ls /var/named/chroot/etc/ -al

total 40

drwxr-x---. 4 root named 4096 Mar 28 14:32 .

drwxr-x---. 6 root named 4096 Mar 28 13:05 ..

-rw-r--r--. 1 root root 405 Oct 19 22:00 localtime

drwxr-x---. 2 root named 4096 Dec 20 23:53 named

-rw-r-----. 1 root named 1259 Mar 28 14:31 named.conf

-rw-r--r--. 1 root named 2544 Dec 20 23:53 named.iscdlv.key

-rw-r-----. 1 root named 931 Jun 21 2007 named.rfc1912.zones

-rw-r--r--. 1 root named 487 Dec 20 23:53 named.root.key

drwxr-xr-x. 3 root root 4096 Mar 28 13:05 pki

-rw-------. 1 root root 479 Mar 27 23:46 rndc.conf

[root@localhost ~]# ls /var/named/chroot/var -al

total 24

drwxr-x---. 6 root named 4096 Mar 28 13:05 .

drwxr-x---. 6 root named 4096 Mar 28 13:05 ..

drwxrwx---. 2 named named 4096 Dec 20 23:53 log

drwxr-x---. 6 root named 4096 Mar 28 13:05 named

drwxr-x---. 3 root named 4096 Mar 28 13:05 run

drwxrwx---. 2 named named 4096 Dec 20 23:53 tmp

[root@localhost ~]# ls /etc/named* -al

-rw-r-----. 1 root named 1259 Mar 28 14:31 /etc/named.conf

-rw-r-----. 1 root root 930 Mar 28 13:41 /etc/named.conf.backup

-rw-r--r--. 1 root named 2544 Dec 20 23:53 /etc/named.iscdlv.key

-rw-r-----. 1 root named 931 Jun 21 2007 /etc/named.rfc1912.zones

-rw-r--r--. 1 root named 487 Dec 20 23:53 /etc/named.root.key

/etc/named:

total 16

drwxr-x---. 2 root named 4096 Dec 20 23:53 .

drwxr-xr-x. 131 root root 12288 Mar 28 14:32 ..

[root@localhost ~]# ls /etc/rndc.* -al

-rw-------. 1 root root 479 Mar 27 23:46 /etc/rndc.conf

-rw-------. 1 root root 479 Mar 28 13:42 /etc/rndc.conf.backup

-rw-------. 1 root root 479 Mar 27 23:10 /etc/rndc.conf.original

-rw-------. 1 root root 479 Mar 27 23:46 /etc/rndc.conf.original_1_error_secret

-rw-------. 1 root root 510 Mar 27 23:43 /etc/rndc.key.removed_no_need

-rw-------. 1 root root 511 Mar 27 23:50 /etc/rndc.key.removed_no_need_1

[root@localhost ~]#

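Comparing against my earlier backups showed that the permissions were fine. PS: if you do run into a permission problem, commands like the following can be used to fix it.
su -

chown -R root:named /directory/directory/file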
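Before changing ownership, it can help to see which entries are not in the named group at all. The following sketch reads ls -l output on standard input and prints the names whose group differs from the expected one. wrong_group is a hypothetical name, and the column positions assume the ls -al format shown in the listings above.

```shell
# Hypothetical helper: read `ls -l` output on stdin and print the names of
# entries whose group is not the expected one (default: named).
# Assumes names without spaces; the "total" line is skipped by the NF test.
wrong_group() {
    awk -v want="${1:-named}" 'NF >= 9 && $4 != want { print $NF }'
}
```

For the /var/named/chroot/etc/ listing above, ls -al /var/named/chroot/etc/ | wrong_group would print localtime, pki and rndc.conf. Being listed is not an error in itself; it only marks the entries worth comparing against a known-good system or backup.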

Since permissions were not the problem, could the iptables rules be wrong? I checked the iptables configuration:
[root@localhost ~]# service iptables status

Table: nat

Chain PREROUTING (policy ACCEPT)

num target prot opt source destination

Chain POSTROUTING (policy ACCEPT)

num target prot opt source destination

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

Table: mangle

Chain PREROUTING (policy ACCEPT)

num target prot opt source destination

Chain INPUT (policy ACCEPT)

num target prot opt source destination

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

Chain POSTROUTING (policy ACCEPT)

num target prot opt source destination

Table: filter

Chain INPUT (policy ACCEPT)

num target prot opt source destination

1 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED

2 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0

3 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

4 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:953

5 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:53

6 ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:443

7 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22

8 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

1 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

[root@localhost ~]#
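To see which rules mention the rndc port without reading the whole listing, a small filter can help. This sketch assumes the output format of "service iptables status" shown above; rules_for_port is a hypothetical name.

```shell
# Hypothetical filter: read `service iptables status` output on stdin and
# print every rule that names the given destination port, prefixed with
# the table and chain it belongs to.
rules_for_port() {
    awk -v port="$1" '
        /^Table:/ { table = $2 }
        /^Chain/  { chain = $2 }
        $0 ~ ("dpt:" port "( |$)") { print table "/" chain ": " $0 }
    '
}
```

For the listing above, service iptables status | rules_for_port 953 reports only rule 4 of the filter INPUT chain, which accepts connections from 10.0.0.0/8. Rule 3 accepts everything as printed, but this listing does not show interfaces; iptables -L INPUT -n -v adds the in and out interface columns and shows whether that rule is limited to the loopback interface.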

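Clearly, the iptables configuration is not the problem. Besides, if an iptables rule were blocking access, the error message would not look like the one above. In the end I suspected a problem in the /etc/named.conf configuration file, so I checked it: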
[root@localhost ~]# named-checkconf /etc/named.conf

[root@localhost ~]# named-checkconf -t /var/named/chroot/

[root@localhost ~]#
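When named-checkconf prints nothing, the syntax is valid; that does not tell whether the file defines a control channel for rndc. The sketch below looks for an uncommented controls statement. has_controls is a hypothetical name, and the check is a plain text match rather than a real named.conf parser, so it would miss a statement whose opening brace sits on the next line.

```shell
# Hypothetical helper: succeed if the file contains an uncommented
# "controls" statement. A plain text match, not a real named.conf parser.
has_controls() {
    grep -Eq '^[[:space:]]*controls[[:space:]]*\{' "$1"
}
```

It could be used as: has_controls /etc/named.conf || echo "no controls statement". When no controls statement is present, named falls back to a default channel keyed by /etc/rndc.key, which is consistent with the "file not found" lines in the named -g log earlier.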

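This shows that the syntax is fine. So I began to wonder whether /etc/named.conf or /etc/rndc.conf contained a configuration error. A freshly installed and configured DNS server should not have a problem with its keys, though, so I checked /etc/named.conf first and indeed found nothing wrong. Then I checked /etc/rndc.conf and finally found the cause: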
[root@localhost ~]# cat /etc/rndc.conf

# Start of rndc.conf

key "rndc-key" {

algorithm hmac-md5;

secret "cK1Bt77B8kL9uLpxy4GDTg==";

};

options {

default-key "rndc-key";

default-server 127.0.0.1;

default-port 953;

};

# End of rndc.conf

# Use with the following in named.conf, adjusting the allow list as needed:

# key "rndc-key" {

# algorithm hmac-md5;

# secret "cK1Bt77B8kL9uLpxy4GDTg==";

# };

#

# controls {

# inet 127.0.0.1 port 953

# allow { 127.0.0.1; } keys { "rndc-key"; };

# };

# End of named.conf
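The commented block at the end of rndc.conf is meant to be copied into named.conf with the comment markers removed. Assuming the file still carries the two marker comments shown above ("# Use with the following in named.conf" and "# End of named.conf"), the fragment can be extracted mechanically. extract_named_fragment is a hypothetical name for this sketch.

```shell
# Hypothetical helper: print the named.conf fragment embedded as comments
# in an rndc.conf file, with the leading "# " stripped from each line.
extract_named_fragment() {
    awk '
        /^# End of named.conf/ { inside = 0 }
        inside { sub(/^# ?/, ""); print }
        /^# Use with the following in named.conf/ { inside = 1 }
    ' "$1"
}
```

Running extract_named_fragment /etc/rndc.conf prints the key and controls statements ready for review. Printing to the terminal rather than appending straight to /etc/named.conf is deliberate, so the result can be inspected first.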

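Clearly, the comments at the end spell it out: to use rndc with this file, the matching key and controls statements must be configured in /etc/named.conf. So I changed /etc/named.conf from the first listing below to the second. First listing: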
[root@localhost ~]# cat /etc/named.conf

//

// named.conf

//

// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS

// server as a caching only nameserver (as a localhost DNS resolver only).

//

// See /usr/share/doc/bind*/sample/ for example named configuration files.

//

options {

listen-on port 53 { 127.0.0.1; };

listen-on-v6 port 53 { ::1; };

directory "/var/named";

dump-file "/var/named/data/cache_dump.db";

statistics-file "/var/named/data/named_stats.txt";

memstatistics-file "/var/named/data/named_mem_stats.txt";

allow-query { localhost; };

recursion yes;

dnssec-enable yes;

dnssec-validation yes;

dnssec-lookaside auto;

/* Path to ISC DLV key */

bindkeys-file "/etc/named.iscdlv.key";

};

logging {

channel default_debug {

file "data/named.run";

severity dynamic;

};

};

zone "." IN {

type hint;

file "named.ca";

};

include "/etc/named.rfc1912.zones";

Second listing:
[root@localhost ~]# cat /etc/named.conf

//

// named.conf

//

// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS

// server as a caching only nameserver (as a localhost DNS resolver only).

//

// See /usr/share/doc/bind*/sample/ for example named configuration files.

//

options {

listen-on port 53 { 127.0.0.1; };

listen-on-v6 port 53 { ::1; };

directory "/var/named";

dump-file "/var/named/data/cache_dump.db";

statistics-file "/var/named/data/named_stats.txt";

memstatistics-file "/var/named/data/named_mem_stats.txt";

allow-query { localhost; };

recursion yes;

dnssec-enable yes;

dnssec-validation yes;

dnssec-lookaside auto;

/* Path to ISC DLV key */

bindkeys-file "/etc/named.iscdlv.key";

};

logging {

channel default_debug {

file "data/named.run";

severity dynamic;

};

};

zone "." IN {

type hint;

file "named.ca";

};

include "/etc/named.rfc1912.zones";

# Add line to enable named working with "/etc/rndc.conf"

# Use with the following in named.conf, adjusting the allow list as needed:

key "rndc-key" {

algorithm hmac-md5;

secret "cK1Bt77B8kL9uLpxy4GDTg==";

};

controls {

inet 127.0.0.1 port 953

allow { 127.0.0.1; } keys { "rndc-key"; };

};

# End of named.conf

[root@localhost ~]#
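rndc authenticates to named with a shared secret, so the secret in /etc/named.conf has to be identical to the one in /etc/rndc.conf. A copy-and-paste slip is easy to make, and the following sketch compares the two. secret_of and secrets_match are hypothetical names; the sketch only looks at the first uncommented secret line in each file and does not compare key names or algorithms.

```shell
# Hypothetical helper: print the value of the first uncommented
# secret "..." line in a BIND-style configuration file.
secret_of() {
    sed -n 's/^[[:space:]]*secret[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed if both files carry the same non-empty secret.
secrets_match() {
    a=$(secret_of "$1")
    b=$(secret_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

It could be run as root, since both files are readable only by privileged users: secrets_match /etc/rndc.conf /etc/named.conf && echo "secrets match". Commented lines such as the sample block at the end of rndc.conf are ignored because they start with "#".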

Finally, restart the named daemon:
su -

service named restart

service named status

If the output looks like the following, everything is working.
[root@localhost ~]# service named status

version: 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2

CPUs found: 2

worker threads: 2

number of zones: 19

debug level: 0

xfers running: 0

xfers deferred: 0

soa queries in progress: 0

query logging is OFF

recursive clients: 0/0/1000

tcp clients: 0/100

server is up and running

named (pid 11918) is running...

[root@localhost ~]#
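For a scripted version of this final check, the status text can be tested for the line that reports a running server. This is a sketch under the assumption that the wording matches the BIND 9.7.3 output shown above; other versions may phrase it differently. server_is_up is a hypothetical name.

```shell
# Hypothetical check: succeed if the status text on stdin contains the
# "server is up and running" line shown in the output above.
server_is_up() {
    grep -q '^server is up and running$'
}
```

It could be used as: rndc status | server_is_up && echo "rndc can reach named". If rndc cannot connect, it prints its error to standard error and the check fails because the expected line never appears.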