
Repairing a Dell server disk stuck in the Foreign state from the command line

2016-11-23 18:27
iDRAC monitoring reported an error. After logging in to the iDRAC, the disk was marked with a red cross and its state was Foreign.


Install MegaCli from the command line:
rpm -ivh MegaCli-8.07.08-1.noarch.rpm
Check where the package was installed:

rpm -ql MegaCli-8.07.08-1.noarch
/opt/MegaRAID/MegaCli/MegaCli
/opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/libstorelibir-2.so.14.07-0
Run the status check:

/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aall |grep 'Firmware state'
Firmware state: Unconfigured(good), Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
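
With many drives, the grep output above is easier to read as a count per state, so that a single non-Online drive stands out. A minimal sketch, assuming the `Firmware state:` line format shown above; `count_firmware_states` is a hypothetical helper name and it reads `-pdlist` output on stdin:

```shell
# Hypothetical helper: tally "Firmware state" values from MegaCli -pdlist output on stdin.
count_firmware_states() {
    # Strip the label, sort so identical states are adjacent, count them,
    # then drop the leading padding that uniq -c adds.
    sed -n 's/^Firmware state: *//p' | sort | uniq -c | sed 's/^ *//'
}
```

It could be used as `/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aall | count_firmware_states`.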
Full details of the affected drive at this point:
############
Enclosure Device ID: 32
Slot Number: 4
###Drive's position: DiskGroup: 0, Span: 2, Arm: 1 ### this line should be present, but it is missing for this drive
Enclosure position: 1
Device Id: 4
WWN: 5000C5005AAD3E28
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 102
Last Predictive Failure Event Seq Number: 41501
PD Type: SAS

Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5005aad3e29
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5A74S
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :45C (113.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : Yes
############
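
The key line in this dump is `Foreign State: Foreign`. The following sketch picks out every drive in that state and prints it as enclosure:slot, the form the `-physdrv[E:S]` options expect. It assumes the field names shown above and that unaffected drives report some other value on that line; `list_foreign_drives` is a hypothetical helper name.

```shell
# Hypothetical helper: print "enclosure:slot" for each drive whose Foreign State is Foreign.
# Reads MegaCli -pdlist output on stdin.
list_foreign_drives() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                      # drop trailing whitespace
        /^Enclosure Device ID:/ { encl = $2 }         # remember the current enclosure
        /^Slot Number:/         { slot = $2 }         # remember the current slot
        /^Foreign State:/       { if ($2 == "Foreign") print encl ":" slot }
    '
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aall | list_foreign_drives`.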

Import the foreign configuration:

/opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Import -aall
Foreign configuration is imported on controller 0.
Exit Code: 0x00
Run the status check again:
/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aall |grep 'Firmware state'
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Check the rebuild progress:
/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv[32:0] -a0
Output:
Rebuild Progress on Device at Enclosure 32, Slot 0 Completed 38% in 54 Minutes.
Exit Code: 0x00
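
The progress line also allows a rough estimate of the time remaining. If the rebuild continued at a constant rate (an assumption; real rebuilds slow down under I/O load), 38% in 54 minutes would leave about 54 × 62 / 38 ≈ 88 minutes. A sketch of that arithmetic, with `rebuild_eta_minutes` as a hypothetical helper name:

```shell
# Hypothetical helper: estimate minutes remaining from a "-pdrbld -showprog" line on stdin.
# Assumes a constant rebuild rate, which is only an approximation.
rebuild_eta_minutes() {
    # Extract "<percent> <elapsed minutes>" from the progress line,
    # then compute elapsed * (100 - percent) / percent, truncated to whole minutes.
    sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
    awk '$1 > 0 { printf "%d\n", $2 * (100 - $1) / $1 }'
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv[32:0] -a0 | rebuild_eta_minutes`.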
Or:
/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -ProgDsply -physdrv[32:0] -a0
Output:
Rebuild progress of physical drives...

Enclosure:Slot Percent Complete Time Elps
032 :00 ####################***40 %*********************** 00:56:40

Press <ESC> key to quit...

Note:
Enclosure Device ID: 32
Slot Number: 0

These two values can be found with /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aall |less

Scan for the number of foreign configurations:
# /opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -scan -a0
Clear the foreign configuration:
# /opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -clear -a0
Scan for foreign configurations again:
# /opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -scan -a0

Reference: http://erikimh.com/raid-rebuilding-foreign-disk-by-hand/

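MegaCli is a tool for managing and maintaining hardware RAID, provided by LSI, whose RAID controllers are widely used. With MegaCli we can inspect everything about the current RAID controller, including the controller model, the RAID level of each array and the state of the disks in it. It can also create arrays directly, add disks online, and so on.

I. Installing MegaCli
The tool can be downloaded directly from LSI's website: http://www.lsi.com/downloads/Public/Nytro/downloads/Nytro%20XD/MegaCli_Linux.zip
The download is a zip archive. Unpack it and install the package:
# unzip MegaCli_Linux.zip
# cd MegaCli_Linux
# ls
megacli_8.07.08-1_all.deb MegaCli-8.07.08-1.noarch.rpm MegaSAS.log
# rpm -ivh MegaCli-8.07.08-1.noarch.rpm
After a successful install, the command's default path is:
# /opt/MegaRAID/MegaCli/MegaCli64

II. Checking disk status
The following shows the RAID controller model, RAID settings, array type and disk information:
# /opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -aALL|less

1. Check the RAID array type and size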



From the output:
(1) the RAID Level field shows that disk group 0 is a RAID 5;
(2) disk group 0 is 1.6 TB in size.

2. Check the RAID cache policy

As the output shows, both the default and the currently active cache policy are WriteBack (the other write policy is WriteThrough).
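
MegaCli reports the cache policy as a single comma-separated line made up of four fields. The sketch below splits such a line into its fields. The sample format `Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU` is an assumption about the output, and `show_cache_policy_fields` is a hypothetical helper name.

```shell
# Hypothetical helper: split "Default/Current Cache Policy" lines on stdin into fields.
# The expected line format is an assumption, not taken from this article's output.
show_cache_policy_fields() {
    awk -F': *' '/^(Default|Current) Cache Policy *:/ {
        n = split($2, f, /, */)          # fields are separated by commas
        printf "%s\n", $1                # which policy line this is
        for (i = 1; i <= n; i++) printf "  field %d: %s\n", i, f[i]
    }'
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -aALL | show_cache_policy_fields`.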
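Policy explanation:
(1) First field: WriteBack, WriteThrough
* WriteBack: on a write, the data is written to the RAID controller's cache and the call returns immediately. The controller flushes the data to disk when system load is low or the cache is full. This setting greatly improves the controller's write performance and, in the vast majority of cases, lowers system I/O load. Data safety depends on the controller's BBU (Battery Backup Unit). This is the policy used in most cases.
* WriteThrough: writes bypass the cache and go directly to disk. Write performance drops noticeably and in most cases system I/O load rises, especially for services with heavy I/O.
(2) Second field: ReadAheadNone, ReadAdaptive, ReadAhead
* ReadAheadNone: read-ahead is off. This is the default.
* ReadAhead: on a read, the sequentially following data is loaded into the cache in advance. This improves sequential reads but can reduce random-read performance.
* ReadAdaptive: adaptive read-ahead. When cache memory and I/O are idle, sequential read-ahead is used. This balances sequential and random read performance at the cost of some processing power.
(3) Third field: Direct, Cached
* Direct: Direct IO mode. Reads are not buffered in cache memory; data is transferred to the cache and to the application at the same time, and if the same block is read again it is served from cache memory. This is the default.
* Cached: Cached IO mode. All reads are buffered in cache memory.
(4) Fourth field: Write Cache OK if Bad BBU, No Write Cache if Bad BBU
* Write Cache OK if Bad BBU: keep using the write cache even when the BBU has a problem (for example a dead battery). This carries some risk of data loss.
* No Write Cache if Bad BBU: do not use the write cache when the BBU has a problem.

Automatic policy switching
Because MegaSAS RAID controllers default to No Write Cache if Bad BBU, the write cache policy may change on its own (from WriteBack to WriteThrough), which lowers write performance. If the switch happens at peak time under heavy I/O load, it can cause unpredictable problems such as the machine hanging. The following can trigger a change of write cache policy:
(1) The RAID controller enters a BBU Learn Cycle.
(2) A battery fault is detected, such as capacity being too low, usually as a result of battery aging. IBM recommends replacing the RAID controller battery once a year.
(3) No battery is installed. Some servers are sold without one, which causes the policy to be set to WriteThrough automatically.

3. Determine whether a disk is failing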



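We usually judge whether a disk should be sent for replacement from the following five values:
1. Media Error Count: the disk has media errors, possibly bad sectors. The larger the value, the greater the risk. Depending on the disk's condition, a value above 100 generally means the disk should be replaced.
2. Other Error Count: the disk has errors of unknown origin. The disk may be loose and need reseating. Depending on the disk's condition, a value above 100 generally means the disk should be replaced.
3. Predictive Failure Count: the number of predictive failure warnings. Any value above 0 generally means the disk should be replaced.
4. Last Predictive Failure Event Seq Number: the sequence number of the most recent warning event. If this value is non-zero, Predictive Failure Count is non-zero as well.
5. Firmware state: the disk's current state. There are generally 9 states:
(1) Unconfigured Good - A drive accessible to the RAID controller but not configured as a part of a virtual drive or as a hot spare.
(2) Online - A drive that can be accessed by the RAID controller and will be part of the virtual drive.
(3) Rebuild - A drive to which data is being written to restore full redundancy for a virtual drive.
(4) Failed - A drive that was originally configured as Online or Hot Spare, but on which the firmware detects an unrecoverable error.
(5) Unconfigured Bad - A drive on which the firmware detects an unrecoverable error; the drive was Unconfigured Good or the drive could not be initialized.
(6) Missing - A drive that was Online, but which has been removed from its location.
(7) Offline - A drive that is part of a virtual drive but which has invalid data as far as the RAID configuration is concerned.
(8) Hot Spare - A drive that is configured as a hot spare.
(9) None - A drive with an unsupported flag set. An Unconfigured Good or Offline drive that has completed the prepare for removal operation.
(10) There is also a special state, Copyback: data is copied from the disk group to a disk outside the group, and once the failed disk has been replaced the data is copied back from that disk. This happens with hot spares: the hot spare only holds the data temporarily. After the failed disk is replaced, the data is copied back from the hot spare, the newly installed disk becomes the regular member, and the hot spare remains a hot spare permanently.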
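
The thresholds above (Media Error Count or Other Error Count above 100, Predictive Failure Count above 0) can be checked across all drives at once. A minimal sketch, assuming the `-pdlist` field names and field order shown earlier in this article; `flag_bad_drives` is a hypothetical helper name, and the thresholds are the rules of thumb from the text rather than vendor limits.

```shell
# Hypothetical helper: flag drives in MegaCli -pdlist output (stdin) that exceed the
# rule-of-thumb thresholds: media/other errors above 100, any predictive failure.
flag_bad_drives() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                     # drop trailing whitespace
        /^Enclosure Device ID:/      { encl = $2 }
        /^Slot Number:/              { slot = $2 }
        /^Media Error Count:/        { media = $2 }
        /^Other Error Count:/        { other = $2 }
        /^Predictive Failure Count:/ {
            # This counter follows the two error counters in each drive record,
            # so all three values for the current drive are known here.
            if (media + 0 > 100 || other + 0 > 100 || $2 + 0 > 0)
                printf "%s:%s media=%s other=%s predictive=%s\n", encl, slot, media, other, $2
        }
    '
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | flag_bad_drives`. Run against the drive dumped earlier (Predictive Failure Count: 102), it would report that drive's enclosure and slot.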

1. Show rebuild progress
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -physdrv[20:2] -aALL
2. List enclosure (E) and slot (S) numbers
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -Ei "(enclosure|slot)"
3. Show the state of all disks
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog
4. Show the state of all virtual disks
/opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aAll -NoLog
How the RAID Level line maps to RAID levels:
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0    RAID 1
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0    RAID 0
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3    RAID 5
RAID Level : Primary-1, Secondary-3, RAID Level Qualifier-0    RAID 10
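
The mapping above can be applied mechanically to MegaCli output. The sketch below covers only the four combinations listed here and reports anything else as unknown rather than guessing; `raid_level_name` is a hypothetical helper name.

```shell
# Hypothetical helper: translate MegaCli "RAID Level" lines on stdin into common names,
# using only the four combinations listed in the text.
raid_level_name() {
    # Reduce each line to "primary secondary qualifier", then look the triple up.
    sed -n 's/^RAID Level *: *Primary-\([0-9]*\), Secondary-\([0-9]*\), RAID Level Qualifier-\([0-9]*\).*/\1 \2 \3/p' |
    while read -r primary secondary qualifier; do
        case "$primary-$secondary-$qualifier" in
            1-0-0) echo "RAID 1" ;;
            0-0-0) echo "RAID 0" ;;
            5-0-3) echo "RAID 5" ;;
            1-3-0) echo "RAID 10" ;;
            *)     echo "unknown ($primary-$secondary-$qualifier)" ;;
        esac
    done
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | raid_level_name`.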
5. Create a RAID array online
/opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0[0:11] WB NORA Direct CachedBadBBU -strpsz64 -a0 -NoLog
/opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r5 [12:2,12:3,12:4,12:5,12:6,12:7] WB Direct -a0
6. Light the locator LED of a given disk
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:2] -a0
7. Clear the Foreign state
/opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Clear -a0
8. List the disks missing from a RAID array
/opt/MegaRAID/MegaCli/MegaCli64 -pdgetmissing -a0
9. Replace the failed member
/opt/MegaRAID/MegaCli/MegaCli64 -pdreplacemissing -physdrv[12:10] -Array5 -row0 -a0
10. Start a rebuild manually
/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -start -physdrv[12:10] -a0
11. Save the controller firmware log
/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -Dsply -a0 > adp2.log
12. Set a hot spare (options in square brackets are optional)
/opt/MegaRAID/MegaCli/MegaCli64 -pdhsp -set [-Dedicated [-Array2]] [-EnclAffinity] [-nonRevertible] -PhysDrv[4:11] -a0
/opt/MegaRAID/MegaCli/MegaCli64 -pdhsp -set [-EnclAffinity] [-nonRevertible] -PhysDrv[32:1] -a0
MegaCli -PDHSP -Set -Dedicated -Array0 -physdrv[E:S] -a0    # add a dedicated hot spare; Array0 is RAID array 0
MegaCli -pdhsp -set -physdrv[E:S] -a0    # add a global hot spare
MegaCli -pdhsp -rmv -physdrv[E:S] -a0    # remove a global or dedicated hot spare
13. Disable automatic rebuild
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Dsbl -a0
14. Set the rebuild rate
/opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp RebuildRate -30 -a0

Appendix: other commands in detail
1. Common commands:
#/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL    # show the RAID level
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL    # show RAID controller information
#/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL    # show disk information
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aAll    # show battery information
#/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -Dsply -aALL    # show the RAID controller log
#/opt/MegaRAID/MegaCli/MegaCli64 -adpCount    # show the number of adapters
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpGetTime -aALL    # show the adapter time
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll    # show all adapter information
#/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aAll    # show all logical disk group information
#/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll    # show all physical disk information
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status'    # show the charger status
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL    # show BBU status
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuCapacityInfo -aALL    # show BBU capacity information
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuDesignInfo -aALL    # show BBU design parameters
#/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -aALL    # show current BBU properties
#/opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -aALL    # show RAID controller model, RAID settings and disk information
#/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -aall|grep -i temp    # show temperature

2. How disk states change from pulling a drive to reinserting one
Device |Normal|Damage|Rebuild|Normal
Virtual Drive |Optimal|Degraded|Degraded|Optimal
Physical Drive |Online|Failed -> Unconfigured|Rebuild|Online

3. View the cache policy
#/opt/MegaCli -LDGetProp -Cache -L0 -a0
or
#/opt/MegaCli -LDGetProp -Cache -L1 -a0
or
#/opt/MegaCli -LDGetProp -Cache -LALL -a0
or
#/opt/MegaCli -LDGetProp -Cache -LALL -aALL
or
#/opt/MegaCli -LDGetProp -DskCache -LALL -aALL

4. Set the cache policy
Cache policy abbreviations:
WT (Write through)
WB (Write back)
NORA (No read ahead)
RA (Read ahead)
ADRA (Adaptive read ahead)
Cached
Direct
Examples:
#/opt/MegaCli -LDSetProp WT|WB|NORA|RA|ADRA -L0 -a0
or
#/opt/MegaCli -LDSetProp -Cached|-Direct -L0 -a0
or
enable / disable disk cache
#/opt/MegaCli -LDSetProp -EnDskCache|-DisDskCache -L0 -a0

/opt/MegaRAID/MegaCli/MegaCli64 -DiscardPreservedCache -Lall -a0 -NoLOG    # discard the preserved cache

5. Create a RAID 5 array from physical disks 2, 3 and 4, with physical disk 5 as its hot spare
#/opt/MegaCli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -Hsp[1:5] -a0
6. Create an array without a hot spare
#/opt/MegaCli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -a0
7. Delete an array
#/opt/MegaCli -CfgLdDel -L1 -a0
8. Add a disk online
#/opt/MegaCli -LDRecon -Start -r5 -Add -PhysDrv[1:4] -L1 -a0
9. After an array is created it goes through an initialization pass that synchronizes its blocks. Check the progress with:
#/opt/MegaCli -LDInit -ShowProg -LALL -aALL
Or show it as a live text-mode progress display:
#/opt/MegaCli -LDInit -ProgDsply -LALL -aALL
10. Check the progress of background initialization
#/opt/MegaCli -LDBI -ShowProg -LALL -aALL
Or show it as a live text-mode progress display:
#/opt/MegaCli -LDBI -ProgDsply -LALL -aALL
11. Make disk 5 a global hot spare
#/opt/MegaCli -PDHSP -Set [-EnclAffinity] [-nonRevertible] -PhysDrv[1:5] -a0
12. Make it a dedicated hot spare for one array
#/opt/MegaCli -PDHSP -Set [-Dedicated [-Array1]] [-EnclAffinity] [-nonRevertible] -PhysDrv[1:5] -a0
13. Remove a global hot spare
#/opt/MegaCli -PDHSP -Rmv -PhysDrv[1:5] -a0
14. Take a physical disk offline (-PDOnline brings it back online)
#/opt/MegaCli -PDOffline -PhysDrv [1:4] -a0

15. Check the rebuild progress of a physical disk
#/opt/MegaCli -PDRbld -ShowProg -PhysDrv [1:5] -a0
Or show it as a live text-mode progress display:
#/opt/MegaCli -PDRbld -ProgDsply -PhysDrv [1:5] -a0