
How To Validate ASM Diskgroup Consistency/State After ASM Reclamation Utility (ASRU) Execution Abort

2014-11-20 23:33
In this Document

 Goal
 Solution
 Notes
 Community Discussions
 References


APPLIES TO:

Oracle Database - Standard Edition - Version 10.2.0.1 to 12.1.0.1 [Release 10.2 to 12.1]
Oracle Database - Enterprise Edition - Version 10.2.0.1 to 12.1.0.1 [Release 10.2 to 12.1]
Information in this document applies to any platform.


GOAL

This document walks through a detailed example of the tasks and validations required when an ASM Reclamation Utility (ASRU) execution aborts (e.g. the OS shell session/window is disconnected,
the ASRU execution is killed by accident, etc.).


SOLUTION

1) The +DATA diskgroup was created to hold the database files; it shows 14,290 MB free:

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 1 20:15:53 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Automatic Storage Management option

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB

------------ ------------------------------ ----------- ---------- ----------

           2 DATA                           MOUNTED          14350      14290

2) These are its member disks:

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH              TOTAL_MB    FREE_MB

------------------------------ --------------- ---------- ----------

DATA_0004                      /dev/raw/raw6         2870       2858

DATA_0003                      /dev/raw/raw5         2870       2858

DATA_0002                      /dev/raw/raw3         2870       2858

DATA_0001                      /dev/raw/raw2         2870       2858

DATA_0000                      /dev/raw/raw1         2870       2858

3) The original space allocation (in the +DATA diskgroup) before the datafiles were created is as follows:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB

------------ ------------------------------ ----------- ---------- ----------

           2 DATA                           MOUNTED          14350      14290
  

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH              TOTAL_MB    FREE_MB

------------------------------ --------------- ---------- ----------

DATA_0004                      /dev/raw/raw6         2870       2858

DATA_0003                      /dev/raw/raw5         2870       2858

DATA_0002                      /dev/raw/raw3         2870       2858

DATA_0001                      /dev/raw/raw2         2870       2858

DATA_0000                      /dev/raw/raw1         2870       2858

4) Then, the following tablespaces (6GB each) were created in the “+DATA” diskgroup:

SQL> CREATE BIGFILE TABLESPACE "6GBTS_1" DATAFILE '+DATA' SIZE 6G LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

Tablespace created.

SQL> CREATE BIGFILE TABLESPACE "6GBTS_2" DATAFILE '+DATA' SIZE 6G LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

Tablespace created.

5) After the tablespaces were created, the new space allocation (in the +DATA diskgroup) is as follows:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB

------------ ------------------------------ ----------- ---------- ----------

           2 DATA                           MOUNTED          14350        210

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH                   TOTAL_MB    FREE_MB

------------------------------ -------------------- ---------- ----------

DATA_0004                      /dev/raw/raw6              2870         43

DATA_0003                      /dev/raw/raw5              2870         44

DATA_0002                      /dev/raw/raw3              2870         39

DATA_0001                      /dev/raw/raw2              2870         41

DATA_0000                      /dev/raw/raw1              2870         43

6) Then, one of the tablespaces was dropped to release 6GB of space from the “+DATA” diskgroup:

SQL> drop tablespace "6GBTS_1" ;

Tablespace dropped.

7) ASM shows 6GB were released from the +DATA diskgroup:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB

------------ ------------------------------ ----------- ---------- ----------

           2 DATA                           MOUNTED          14350       6356

NAME                           PATH                   TOTAL_MB    FREE_MB

------------------------------ -------------------- ---------- ----------

DATA_0004                      /dev/raw/raw6              2870       1272

DATA_0003                      /dev/raw/raw5              2870       1273

DATA_0002                      /dev/raw/raw3              2870       1269

DATA_0001                      /dev/raw/raw2              2870       1270

DATA_0000                      /dev/raw/raw1              2870       1272

8) Next, ASRU was executed against the “+DATA” diskgroup to reclaim the space at the physical LUN level:

8.1) ASRU was executed as follows (the OS shell session/connection/window was intentionally closed while ASRU was still zero-filling the disks, to simulate an ASRU crash/disconnection/abort):

[ceintcb15]/refresh/asmsupt/home/asru> ASRU DATA

Checking the system ...done

Calculating the sizes of the disks ...done

Writing the data to a file ...done

Resizing the disks...done

Calculating the sizes of the disks ...done

/refresh/asmsupt/app/oracle/product/ASMdbA/perl/bin/perl -I /refresh/asmsupt/app/oracle/product/ASMdbA/perl/lib/5.10.0 /refresh/asmsupt/home/asru/zerofill 1 /dev/raw/raw5 1998 2870 /dev/raw/raw2 1999 2870 /dev/raw/raw1 2000 2870 /dev/raw/raw3 2001 2870 /dev/raw/raw6
2001 2870

872+0 records in

872+0 records out

914358272 bytes (914 MB) copied, 19.3352 seconds, 47.3 MB/s

871+0 records in

871+0 records out

913309696 bytes (913 MB) copied, 18.8338 seconds, 48.5 MB/s

870+0 records in

870+0 records out   <(=========   <<<<<<(Session was closed here)>>>>>>

8.2) The trace file (ASRU.trc) shows that ASRU did not complete the storage reclamation in the background because the session was closed (the ASRU process zero-filled ("dd") only 4 of the 5 disks):

Making diskgroup DATA thin provision friendly

Fri May  2 15:32:59 2014 

ASM_POWER_LIMIT is 1

Fri May  2 15:32:59 2014 

Checking the system ...

No traces from the previous execution found

Fri May  2 15:32:59 2014 

Calculating the sizes of the disks..

Fri May  2 15:32:59 2014 

Executing /* ASRU */SELECT D.NAME,D.TOTAL_MB,D.FREE_MB,G.ALLOCATION_UNIT_SIZE 

            FROM V$ASM_DISK D, 

            V$ASM_DISKGROUP G WHERE 

            D.GROUP_NUMBER = G.GROUP_NUMBER AND G.NAME='DATA'

Calculated sizes : 

DATA_0003 : total:2870 free:1273 used:1597 new:1997 

DATA_0001 : total:2870 free:1272 used:1598 new:1998 

DATA_0000 : total:2870 free:1271 used:1599 new:1999 

DATA_0002 : total:2870 free:1270 used:1600 new:2000 

DATA_0004 : total:2870 free:1270 used:1600 new:2000 

Fri May  2 15:33:00 2014 

Fri May  2 15:33:00 2014 

Writing the data to a file ...

Data to be recorded in the tp file :  DATA_0003 2870 DATA_0001 2870 DATA_0000 2870 DATA_0002 2870 DATA_0004 2870 

Fri May  2 15:33:00 2014 

Resizing the disks...

Fri May  2 15:33:00 2014 

Executing ALTER DISKGROUP DATA  RESIZE DISK DATA_0003 SIZE 1997M DISK DATA_0001 SIZE 1998M DISK DATA_0000 SIZE 1999M DISK DATA_0002 SIZE 2000M DISK DATA_0004 SIZE 2000M REBALANCE WAIT/* ASRU */

Fri May  2 15:33:17 2014 

Calculating the sizes of the disks..

Fri May  2 15:33:17 2014 

Executing /* ASRU */SELECT D.NAME,D.TOTAL_MB,D.FREE_MB,G.ALLOCATION_UNIT_SIZE 

            FROM V$ASM_DISK D, 

            V$ASM_DISKGROUP G WHERE 

            D.GROUP_NUMBER = G.GROUP_NUMBER AND G.NAME='DATA'

Disk sizes after first resize: 

DATA_0003 : 1997

DATA_0001 : 1998

DATA_0000 : 1999

DATA_0002 : 2000

DATA_0004 : 2000

Checking whether the resize is done successfully or not..

Fri May  2 15:33:17 2014 

Fri May  2 15:33:17 2014 

Fri May  2 15:33:17 2014 

Power given to the free function : 1

Retrieving the paths:

DATA_0003 : /dev/raw/raw5

DATA_0001 : /dev/raw/raw2

DATA_0000 : /dev/raw/raw1

DATA_0002 : /dev/raw/raw3

DATA_0004 : /dev/raw/raw6

Executing the zerofill at /refresh/asmsupt/home/asru/zerofill

Completed parsing disk and their ranges which are to be zeroed

Batch number 1 started

Executing /bin/dd if=/dev/zero of=/dev/raw/raw5 seek=1998 bs=1024k count=872

Batch number 1 ended

Batch number 2 started

Executing /bin/dd if=/dev/zero of=/dev/raw/raw2 seek=1999 bs=1024k count=871

Batch number 2 ended

Batch number 3 started

Executing /bin/dd if=/dev/zero of=/dev/raw/raw1 seek=2000 bs=1024k count=870

Batch number 3 ended

Batch number 4 started

Executing /bin/dd if=/dev/zero of=/dev/raw/raw3 seek=2001 bs=1024k count=869
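
The dd commands above follow a simple pattern: each disk is zero-filled from just past its resized size up to its original size, so count = original_mb - seek. A minimal shell sketch, using only the seek values and the 2870 MB original size shown in this trace, that reproduces the arithmetic:

```shell
# Reproduce the zero-fill ranges from the trace above (all sizes in MB).
# seek starts just past the resized disk size; count runs to the original size.
original=2870                        # original size of every disk in this example
for seek in 1998 1999 2000 2001; do  # seek values from the four dd commands
  count=$((original - seek))
  echo "seek=${seek} count=${count}"
done
# Prints counts 872, 871, 870, 869 - matching the dd record counts above.
```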

8.3) Then, “CHECK ALL REPAIR” was executed on the +DATA diskgroup to confirm or rule out inconsistencies (to validate the diskgroup):

SQL> alter diskgroup data CHECK ALL REPAIR;

Diskgroup altered.

8.4) “CHECK ALL REPAIR” did not report any ASM inconsistency:

SQL> alter diskgroup data CHECK ALL REPAIR

NOTE: starting check of diskgroup DATA

Fri May 02 16:00:41 2014

GMON checking disk 0 for group 1 at 9 for pid 22, osid 9360

GMON checking disk 1 for group 1 at 10 for pid 22, osid 9360

GMON checking disk 2 for group 1 at 11 for pid 22, osid 9360

GMON checking disk 3 for group 1 at 12 for pid 22, osid 9360

GMON checking disk 4 for group 1 at 13 for pid 22, osid 9360

SUCCESS: check of diskgroup DATA found no errors

SUCCESS: alter diskgroup data check all repair

8.5) Then, “AMDU” was executed as a second health check:

SQL> alter diskgroup data dismount;

Diskgroup altered.

 SQL> select name, state from v$asm_diskgroup;

NAME                           STATE

------------------------------ -----------

DATA                           DISMOUNTED
 

[ceintcb15]/refresh/asmsupt/home/asru> amdu -diskstring '/dev/raw/raw*' -dump 'DATA'

amdu_2014_05_02_16_34_12/

[ceintcb15]/refresh/asmsupt/home/asru>   

[ceintcb15]/refresh/asmsupt/home/asru> cd amdu_2014_05_02_16_34_12/

[ceintcb15]/refresh/asmsupt/home/asru/amdu_2014_05_02_16_34_12> ls -l

total 64608

-rw-r--r-- 1 asmsupt asmsupt 66068480 May  2 16:34 DATA_0001.img

-rw-r--r-- 1 asmsupt asmsupt     5680 May  2 16:34 DATA.map

-rw-r--r-- 1 asmsupt asmsupt     8868 May  2 16:34 report.txt

[ceintcb15]/refresh/asmsupt/home/asru/amdu_2014_05_02_16_34_12>
         

8.6) AMDU did not report any corrupted block on the +DATA diskgroup:

------------------------- SUMMARY FOR DISKGROUP DATA -------------------------

           Allocated AU's: 7994

                Free AU's: 6356

       AU's read for dump: 71

       Block images saved: 16130

        Map lines written: 71

          Heartbeats seen: 0

  Corrupt metadata blocks: 0   <(====

        Corrupt AT blocks: 0   <(====

******************************* END OF REPORT ********************************
                                                                                  

9) After the +DATA diskgroup was validated and confirmed to be in good shape, ASRU was executed again as follows:

9.1) DATA diskgroup was mounted back:

SQL> alter diskgroup data mount;

Diskgroup altered.
 

9.2) Then ASRU was executed again; the results below clearly show that ASRU started over from scratch and zero-filled (dd) all 5 disks again:

[ceintcb15]/refresh/asmsupt/home/asru> ASRU DATA

Checking the system ...done

Calculating the sizes of the disks ...done

Writing the data to a file ...done

Resizing the disks...done

Calculating the sizes of the disks ...done

/refresh/asmsupt/app/oracle/product/ASMdbA/perl/bin/perl -I /refresh/asmsupt/app/oracle/product/ASMdbA/perl/lib/5.10.0 /refresh/asmsupt/home/asru/zerofill 1 /dev/raw/raw5 1999 2870 /dev/raw/raw2 1999 2870 /dev/raw/raw1 2000 2870 /dev/raw/raw3 2001 2870 /dev/raw/raw6
2000 2870

871+0 records in

871+0 records out

913309696 bytes (913 MB) copied, 24.8047 seconds, 36.8 MB/s

871+0 records in

871+0 records out

913309696 bytes (913 MB) copied, 20.2713 seconds, 45.1 MB/s

870+0 records in

870+0 records out

912261120 bytes (912 MB) copied, 20.4338 seconds, 44.6 MB/s

869+0 records in

869+0 records out

911212544 bytes (911 MB) copied, 18.9407 seconds, 48.1 MB/s

870+0 records in

870+0 records out

912261120 bytes (912 MB) copied, 19.1543 seconds, 47.6 MB/s

Calculating the sizes of the disks ...done

Resizing the disks...done

Calculating the sizes of the disks ...done

Dropping the file ...done

 
 
 


Notes:

1) ASRU execution should normally not be interrupted: during this operation the disks are shrunk and resized, so an interruption could corrupt the ASM physical disks.

2) It is recommended to perform this operation from a VNC session or directly on the console, so that a session disconnection cannot interrupt ASRU.
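
If console or VNC access is not available, a common alternative (an assumption of this note, not part of the original procedure) is to detach the ASRU run from the terminal with nohup, screen, or tmux so that a dropped connection cannot kill it. A minimal sketch, assuming the ASRU script is in the current directory and the diskgroup is named DATA as in the example above:

```shell
# Run ASRU in the background, immune to hangup when the session drops.
# Console output goes to a log file; ASRU also writes its own ASRU.trc trace.
nohup ./ASRU DATA > asru_run.log 2>&1 &
echo "ASRU started as PID $!; monitor asru_run.log and ASRU.trc"
```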

3) If ASRU fails, it must be started over, since ASRU currently has no capability to resume a cancelled/aborted execution. The following needs to be done first:


3.1) Run the following health check on the diskgroup:

SQL> alter diskgroup <diskgroup name> check all repair;
  

3.2) Then review the ASM alert.log, which will report the results of the previous command, and look for any corruption issue.
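
As a sketch of that review (the alert.log path below is the typical 11g ADR location for an ASM instance named +ASM; it is an assumption, so adjust it for your environment):

```shell
# Pull the CHECK ALL REPAIR outcome and any ORA- errors from the ASM alert.log.
ALERT="$ORACLE_BASE/diag/asm/+asm/+ASM/trace/alert_+ASM.log"   # assumed 11g ADR path
grep -E "check of diskgroup|found no errors|ORA-" "$ALERT" | tail -20
```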

3.3) Obtain the AMDU dump from the affected diskgroup as follows (execute it as grid OS user):

$> <ASM Oracle Home>/bin/amdu -diskstring '/dev/oracleasm/disks/*' -dump '<diskgroup name>'
  

Note 1: A new directory (e.g. amdu_2013_07_20_16_03_15/) containing three files (<diskgroup name>_0001.img, <diskgroup name>.map & report.txt) will be created per diskgroup.

Note 2: Please review the report.txt file and look for any corruption issue reported by AMDU.
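
A quick way to perform that review, assuming the AMDU dump directory produced in step 3.3 is under the current directory (the grep targets the "Corrupt" counters shown in the summary earlier in this document; both should be 0):

```shell
# Scan the AMDU report for the corruption counters; a healthy diskgroup shows
# "Corrupt metadata blocks: 0" and "Corrupt AT blocks: 0".
grep -i "corrupt" amdu_*/report.txt
```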

3.4) Also, please review the OS logs from all the nodes and look for any disk I/O issues:
See: How To Gather The OS Logs For Each Specific OS Platform (Doc ID 1349613.1)

3.5) If the previous health checks report no issues, then rerun the ASRU utility as described in the following document:

http://www.oracle.com/us/products/database/oracle-asru-3par.pdf
 