
HP VA7400 Storage Fault Diagnosis: Data Recovery May Be Possible

2011-08-19 10:29 543 views

Environment: VA7400

Two disk enclosures with 14 disks each, 28 disks in total, configured as two RAID groups. Each RAID group is AutoRAID (RAID 0+1).

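One VG cannot be read: whenever certain specific files in an LV of this VG are read, the host hangs and the array keeps scanning its disks. The hardware has already determined that more than one disk has bad sectors. This VG consists of two LUNs, one on each of the array's two RAID groups. We ran dd tests: reading the LUN in one RAID group with dd completes normally, but reading the LUN in the other RAID group with dd hangs the host while the array keeps scanning its disks, the same symptom as reading data from that VG.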

So we can now be sure that one of the array's two RAID groups is completely healthy and the other has a problem, and that one of the VG's two LUNs happens to sit in the faulty RAID group.

In addition, this faulty RAID group once had two disks fail at the same time (as reported by the controller).

The data we need is on that same VG.

Attached are the hardware logs and the LUN distribution information for your reference.

{Written by 覃廷良, chief engineer at 达思. Please credit the source when reposting (http://www.bnuol.com 达思 data recovery technical blog).}

Excerpts from the log follow.

SUB-SYSTEM SETTINGS

RAID Level:___________________________HPAutoRAID

Auto Format Drive:____________________On

Hang Detection:_______________________On

Capacity Depletion Threshold:_________100%

Queue Full Threshold Maximum:_________4096

Enable Optimize Policy:_______________True

Enable Manual Override:_______________False

Manual Override Destination:__________False

Read Cache Disable:___________________False

Rebuild Priority:_____________________Low

Security Enabled:_____________________False

Shutdown Completion:__________________0

Subsystem Type ID:____________________1

Unit Attention:_______________________True

Volume Set Partition (VSpart):________False

Write Cache Enable:___________________True

Write Working Set Interval:___________8640

Enable Prefetch:______________________False

Disable Secondary Path Presentation:__False

Enclosure at M

Enclosure ID__________________________0

Enclosure Status______________________Failed

Enclosure Type________________________HP StorageWorks Virtual Array 7400

Node WWN______________________________50060b000014e7d6

FRU HW COMPONENT IDENTIFICATION ID STATUS

===========================================================================

M Enclosure 00SG223J0074 Failed

M/P1 Power Supply 94020HE00808 Good

M/P2 Power Supply 94020HE00717 Good

M/MP1 MidPlane 000601310041 Good

M/C2 Controller 00PR05B50445 Good

M/C2.H1 Host Port <none> Good

M/C2.J1 BackEnd Port <none> Good

M/C2.B1 Battery 40133:MOLTECHPS:NI2040:2002/7/19 Good

M/C2.PM1 Processor HP:A6189A:HP19 Good

M/C2.M1 DIMM 512 Good

M/C1 Controller Failed

M/D1 Disk 3EK1NM33 Good

M/D2 Disk 3EK0MF81 Good

M/D3 Disk 3EK1NXQ6 Good

M/D4 Disk 3HZ0G1QD Good

M/D5 Disk 3EK1NQEM Good

M/D6 Disk 3EK1NX69 Good

M/D7 Disk 3EK1NMZT Good

M/D8 Disk 3EK10AZS Good

M/D9 Disk 3KP17QL80000 Good

M/D10 Disk 3HZ92CQ9 Good

M/D11 Disk 3EK1KDSJ Good

M/D12 Disk 3HZ0MVX7 Good

M/D13 Disk 3EK24C4H Good

M/D14 Disk 3EK1NHSA Good

Enclosure at JA0

Enclosure ID__________________________0

Enclosure Status______________________Good

Enclosure Type________________________HP StorageWorks Disk System DS2405

Node WWN______________________________50060b0000195066

FRU HW COMPONENT IDENTIFICATION ID STATUS

===========================================================================

JA0 Enclosure SG22200001 Good

JA0/MP1 MidPlane SG22200001 Good

JA0/P1 Power Supply 62020FD01285 Good

JA0/P2 Power Supply 62020FD01267 Good

JA0/C2 LCC R25DK1444151 Good

JA0/C2.H1 Front Port <none> Good

JA0/D1 Disk 3EK1MCCP Good

JA0/D2 Disk 3EK01ZQN Good

JA0/D3 Disk 3EK1NJNS Good

JA0/D4 Disk 3EK1NL2T Good

JA0/D5 Disk 3EK1NFRN Good

JA0/D6 Disk 3EK1N23S Good

JA0/D7 Disk 3EK1NLZL Good

JA0/D8 Disk 3EK1NFJM Good

JA0/D9 Disk 3EK1SBD8 Good

JA0/D10 Disk 3HZY5F6L Good

JA0/D11 Disk 3EK1NVJZ Good

JA0/D12 Disk 3EK1NQ2J Good

JA0/D13 Disk 3EK1NLX5 Good

JA0/D14 Disk 3EK16N2S Good

Disk at JA0/D9:

Status:_______________________________Good

Disk State:___________________________Included

Vendor ID:____________________________HP 73.4G

Product ID:___________________________ST373405FC

Product Revision:_____________________HP09

Data Capacity:________________________66.757 GB (140000000 blocks)

Block Length:_________________________520 bytes

Address:______________________________8

Node WWN:_____________________________20000004cfa1a362

Initialize State:_____________________Ready

Redundancy Group:_____________________1

Volume Set Serial Number:_____________000027C200000003

Serial Number:________________________3EK1SBD8

Firmware Revision:____________________HP09

Recovery Maps are on this disk.

Disk at JA0/D13:

Status:_______________________________Good

Disk State:___________________________Included

Vendor ID:____________________________HP 73.4G

Product ID:___________________________ST373405FC

Product Revision:_____________________HP09

Data Capacity:________________________66.757 GB (140000000 blocks)

Block Length:_________________________520 bytes

Address:______________________________12

Node WWN:_____________________________20000004cf98f82c

Initialize State:_____________________Ready

Redundancy Group:_____________________1

Volume Set Serial Number:_____________000027C200000003

Serial Number:________________________3EK1NLX5

Firmware Revision:____________________HP09

Recovery Maps are on this disk.
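The analysis depends on knowing which disks carry the recovery maps. As a minimal sketch, assuming the log keeps the "Disk at <slot>:" and "Recovery Maps are on this disk." lines in the form shown in the excerpt above, the slots can be pulled out of a saved log like this (the function name is hypothetical and not part of any HP tool):

```python
import re

def disks_with_recovery_maps(log_text):
    """Return the slot names of disks whose log section reports recovery maps."""
    slots = []
    current = None  # slot of the "Disk at ..." section being read
    for raw in log_text.splitlines():
        line = raw.strip()
        match = re.match(r"Disk at (\S+?):?$", line)
        if match:
            current = match.group(1)
        elif line.startswith("Recovery Maps are on this disk") and current:
            slots.append(current)
    return slots
```

Run against the excerpt above, this would report JA0/D9 and JA0/D13, the two disks whose sections end with the recovery-map line.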

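A first look at the log shows that the disks in the HP VA7400 are formatted with 520-byte blocks (Block Length:_________________________520 bytes). To recover the data, the disks must first be imaged and the RAID then reassembled from the images.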
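The capacity figure in the log gives a useful cross-check: 140000000 blocks of 512 bytes come to about 66.757 when divided by 2^30, which matches the reported "66.757 GB", so each 520-byte block appears to carry 512 bytes of host data. The sketch below converts a raw 520-byte-per-block image to 512-byte blocks. It assumes the 8 extra bytes sit at the end of each block, which is a common layout but is not confirmed by the log, and the function names are illustrative only.

```python
RAW_BLOCK = 520   # bytes per block on disk, as reported in the log
DATA_BLOCK = 512  # assumed host-visible payload per block

def strip_block_trailers(raw: bytes) -> bytes:
    """Keep the first 512 bytes of every 520-byte block (assumed layout)."""
    if len(raw) % RAW_BLOCK:
        raise ValueError("image length is not a multiple of 520 bytes")
    return b"".join(
        raw[i:i + DATA_BLOCK] for i in range(0, len(raw), RAW_BLOCK)
    )

def capacity_gib(blocks: int, block_size: int = DATA_BLOCK) -> float:
    """Capacity in units of 2**30 bytes."""
    return blocks * block_size / 2**30
```

For a full disk image the same slicing would be applied chunk by chunk while streaming from one file to another, rather than holding the whole image in memory.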

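The HP VA7400 uses AutoRAID and carves LUNs out of it. LUN space is not allocated linearly and contiguously; instead, a block map records the addresses allocated to each LUN. Even after the RAID is reassembled exactly as it was, the LUN space allocation is still not fully determined. To work it out, the disks holding the metadata have to be examined and analyzed. Usually two disks store the metadata (they are marked "Recovery Maps are on this disk."). Apart from the engineers who designed the HP VA series, nobody can get accurate information about this metadata format without a test environment to study it in.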
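To illustrate why a block map makes linear reassembly insufficient, here is a purely hypothetical sketch of map-based address translation. The extent size, the map layout and every name in it are assumptions made for illustration; the real VA7400 metadata format is not public and is not reproduced here.

```python
from typing import Dict, Tuple

EXTENT_BLOCKS = 512  # hypothetical extent size in blocks, not the real value

# Hypothetical map: LUN extent index -> (redundancy group, first physical block)
BlockMap = Dict[int, Tuple[int, int]]

def translate(block_map: BlockMap, lun_block: int,
              group_blocks: Dict[int, int]) -> Tuple[int, int]:
    """Translate a LUN block number to (redundancy group, physical block)."""
    extent, offset = divmod(lun_block, EXTENT_BLOCKS)
    if extent not in block_map:
        raise KeyError(f"LUN block {lun_block} has no map entry")
    group, start = block_map[extent]
    physical = start + offset
    # A stale or damaged entry can point outside the group it names.
    if group not in group_blocks or not 0 <= physical < group_blocks[group]:
        raise ValueError(f"map entry for extent {extent} is out of range")
    return group, physical
```

In this toy model, neighbouring LUN extents can land in different redundancy groups at unrelated offsets, so the LUN cannot be rebuilt without the map, and an entry that fails the range check is the kind of inconsistency that would send a read to a bogus address.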

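Judging from the fault information, it is quite likely that the metadata disks have developed a fault, leaving the information held by the controller inconsistent with what is on disk. When a LUN is read, the map information is then inaccurate or the address overflows, so a hang or an automatic reboot is inevitable.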

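Now that a likely cause has been identified, the next step is to verify whether these two metadata disks are actually healthy, and to start the data recovery work from there.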
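One way to start that verification is a plain read scan of each metadata disk, or of its image, on a separate machine. The sketch below is only an assumption about how such a check might look: it reads the file or device chunk by chunk and records the ranges that raise an I/O error. It will not catch a read that hangs instead of failing, and the path passed to it is whatever device or image file is being examined.

```python
import os

def find_unreadable_ranges(path, chunk_size=1 << 20):
    """Return (offset, length) for each chunk that fails with an I/O error."""
    bad = []
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, length)
            except OSError:
                # Record the failed chunk and move on to the next one.
                bad.append((offset, length))
                offset += length
                continue
            if not data:
                break  # unexpected end of file or device
            offset += len(data)
    finally:
        os.close(fd)
    return bad
```

An empty result means every chunk was readable. A non-empty list narrows down where bad sectors lie, and those ranges can then be rescanned with a smaller chunk size.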