
ZFS Tutorial - enjoy the fancy ZFS file system on Solaris/OpenSolaris


Learning to use ZFS, Sun's new filesystem.

ZFS is an open source
filesystem used in Solaris 10, with growing support from other
operating systems. This series of tutorials shows you how to use ZFS
with simple hands-on examples that require a minimum of resources.

In this tutorial I hope to give you a brief overview of ZFS and show
you how to manage ZFS pools, the foundation of ZFS. In subsequent parts
we will look at ZFS filesystems in more depth.

This tutorial was created on 2007-03-07 and last revised on 2008-08-24.

ZFS Tutorial Series

Overview of ZFS & ZFS Pool Management
ZFS Filesystem Management, Mountpoints and Filesystem Properties
Snapshots, Clones and ZFS Backup (due September 2008)
NFS and ZFS, ZFS with Zones (due October 2008)
More ZFS (due November 2008)

Let your hook be always cast; in the pool where you least expect it, there will be a fish. — Ovid

Getting Started

You need:

An operating system with ZFS support:
Solaris 10 6/06 or later [download]
OpenSolaris [download]
Mac OS X 10.5 Leopard (requires ZFS download)
FreeBSD 7 (untested) [download]
Linux using FUSE (untested) [download]


Root privileges (or a role with the appropriate ZFS rights profile)
Some storage, either:
512 MB of disk space on an existing partition
Four spare disks of the same size



Using Files

To use files on an existing filesystem, create four 128 MB files, eg.:

# mkfile 128m /home/ocean/disk1
# mkfile 128m /home/ocean/disk2
# mkfile 128m /home/ocean/disk3
# mkfile 128m /home/ocean/disk4

# ls -lh /home/ocean
total 1049152
-rw------T   1 root     root        128M Mar  7 19:48 disk1
-rw------T   1 root     root        128M Mar  7 19:48 disk2
-rw------T   1 root     root        128M Mar  7 19:48 disk3
-rw------T   1 root     root        128M Mar  7 19:48 disk4
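
mkfile is available on Solaris and Mac OS X. If your system lacks it (eg. the Linux or FreeBSD options above), dd can create equivalent backing files; a rough sketch, assuming /dev/zero and the same paths:

# dd if=/dev/zero of=/home/ocean/disk1 bs=1024k count=128

Repeat for disk2, disk3 and disk4.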


Using Disks

To use real disks in the tutorial make a note of their names (eg.
c2t1d0 or c1d0 under Solaris). You will be destroying all the partition
information and data on these disks, so be sure they're not needed.
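
If you're not sure of the device names, the Solaris format utility lists the disks it can see; one common idiom (a hedged suggestion, and harmless because no disk is ever selected) is:

# echo | format

The disks appear under AVAILABLE DISK SELECTIONS; note their cXtYdZ names.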

In the examples I will be using files named disk1, disk2, disk3, and
disk4; substitute your disks or files for them as appropriate.

ZFS Overview

The architecture of ZFS has three levels. One or more ZFS filesystems exist in a ZFS pool, which consists of one or more devices*
(usually disks). Filesystems within a pool share its resources and are
not restricted to a fixed size. Devices may be added to a pool while
it's still running, eg. to increase the size of a pool. New filesystems
can be created within a pool without taking other filesystems offline. ZFS
supports filesystem snapshots and cloning of existing filesystems. ZFS
manages all aspects of the storage: volume management software (such as
SVM or Veritas) is not needed.

*Technically a virtual device (vdev), see the zpool(1M) man page for more.

ZFS is managed with just two commands:

zpool - Manages ZFS pools and the devices within them.
zfs - Manages ZFS filesystems.

If you run either command with no options it gives you a handy options summary.

Pools

All ZFS filesystems live in a pool, so the first step is to create a pool. ZFS pools are administered using the zpool command.

Before creating new pools you should check for existing pools to avoid
confusing them with your tutorial pools. You can check what pools exist
with zpool list:

# zpool list
no pools available

NB. OpenSolaris now uses ZFS, so you will likely have an existing ZFS pool called syspool on this OS.

Single Disk Pool

The simplest pool consists of a single device. Pools are created using zpool create. We can create a single disk pool as follows (you must use the absolute path to the disk file):

# zpool create herring /home/ocean/disk1
# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
herring                 123M   51.5K    123M     0%  ONLINE     -


No volume management, configuration, newfs or mounting is required.
You now have a working pool complete with a mounted ZFS filesystem under
/herring (/Volumes/herring on Mac OS X, where you can also see it mounted on
your desktop). We will learn about adjusting mount points in part 2 of the tutorial.

Create a file in the new filesystem:

# mkfile 32m /herring/foo
# ls -lh /herring/foo 
-rw------T   1 root     root         32M Mar  7 19:56 /herring/foo

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
herring                 123M   32.1M   90.9M    26%  ONLINE     -


The new file is using about a quarter of the pool capacity (indicated
by the CAP value). NB. If you run the list command before ZFS has
finished writing to the disk you will see lower USED and CAP values
than shown above; wait a few moments and try again.

Now destroy your pool with zpool destroy:

# zpool destroy herring
# zpool list
no pools available


On Mac OS X you need to force an unmount of the filesystem (using
umount -f /Volumes/herring) before destroying it, as it will be in use
by fseventsd.

You will only receive a warning about destroying your pool if it's in
use. We'll see in a later tutorial how you can recover a pool you've
accidentally destroyed.
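
As a preview of that recovery, zpool import can list and re-import destroyed pools; a hedged sketch (the -d option tells it which directory to search for file-based devices):

# zpool import -d /home/ocean -D
# zpool import -d /home/ocean -D herring

The first command lists destroyed pools whose devices are still intact; the second attempts to bring herring back.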

Mirrored Pool

A pool composed of a single disk doesn't offer any redundancy. One
method of providing redundancy is to use a mirrored pair of disks as a
pool:

# zpool create trout mirror /home/ocean/disk1 /home/ocean/disk2

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
trout                   123M   51.5K    123M     0%  ONLINE     -


To see more detail about the pool use zpool status:

# zpool status trout
  pool: trout
 state: ONLINE
 scrub: none requested
config:
        NAME                          STATE     READ WRITE CKSUM
        trout                         ONLINE       0     0     0
          mirror                      ONLINE       0     0     0
            /home/ocean/disk1         ONLINE       0     0     0
            /home/ocean/disk2         ONLINE       0     0     0

errors: No known data errors


We can see our pool contains one mirror of two disks. Let's create a file and see how USED changes:

# mkfile 32m /trout/foo

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
trout                   123M   32.1M   90.9M    26%  ONLINE     -


As before, about a quarter of the pool has been used, but the data is
now stored redundantly across two disks. Let's test it by overwriting the
first disk's label with random data (if you are using real disks you
could physically disable or remove a disk instead):

# dd if=/dev/random of=/home/ocean/disk1 bs=512 count=1


ZFS automatically checks for errors when it reads/writes files, but we can force a check with the zpool scrub command.

# zpool scrub trout

# zpool status
  pool: trout
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub completed with 0 errors on Wed Mar  7 20:42:07 2007
config:
        NAME                          STATE     READ WRITE CKSUM
        trout                         DEGRADED     0     0     0
          mirror                      DEGRADED     0     0     0
            /home/ocean/disk1         UNAVAIL      0     0     0  corrupted data
            /home/ocean/disk2         ONLINE       0     0     0

errors: No known data errors


The disk we used dd on is showing as UNAVAIL with corrupted data,
but no data errors are reported for the pool as a whole, and we can
still read and write to the pool:

# mkfile 32m /trout/bar
# ls -l /trout/
total 131112
-rw------T   1 root     root     33554432 Mar  7 20:43 bar
-rw------T   1 root     root     33554432 Mar  7 20:35 foo


To maintain redundancy we should replace the broken disk with another. If you are using a physical disk you can use the zpool replace
command (the zpool man page has details). However, in this file-based
example I'll detach the disk file from the mirror, recreate it, and attach it again.
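
For reference, the replacement with zpool replace would look something like the sketch below; disk5 is a hypothetical spare file, not one of the four created earlier:

# mkfile 128m /home/ocean/disk5
# zpool replace trout /home/ocean/disk1 /home/ocean/disk5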

Devices are detached with zpool detach:

# zpool detach trout /home/ocean/disk1

# zpool status trout
  pool: trout
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Mar  7 20:42:07 2007
config:
        NAME                        STATE     READ WRITE CKSUM
        trout                       ONLINE       0     0     0
          /home/ocean/disk2         ONLINE       0     0     0

errors: No known data errors

# rm /home/ocean/disk1
# mkfile 128m /home/ocean/disk1


To attach another device we use zpool attach, specifying an existing device in the mirror to attach it to:

# zpool attach trout /home/ocean/disk2 /home/ocean/disk1


If you're quick enough, after you attach the new disk you will see a resilver (remirroring) in progress with zpool status.

# zpool status trout
  pool: trout
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 69.10% done, 0h0m to go
config:
        NAME                          STATE     READ WRITE CKSUM
        trout                         ONLINE       0     0     0
          mirror                      ONLINE       0     0     0
            /home/ocean/disk2         ONLINE       0     0     0
            /home/ocean/disk1         ONLINE       0     0     0

errors: No known data errors


Once the resilver is complete, the pool is healthy again (you can also use ls to check the files are still there):

# zpool status trout
  pool: trout
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Mar  7 20:58:17 2007
config:
        NAME                          STATE     READ WRITE CKSUM
        trout                         ONLINE       0     0     0
          mirror                      ONLINE       0     0     0
            /home/ocean/disk2         ONLINE       0     0     0
            /home/ocean/disk1         ONLINE       0     0     0

errors: No known data errors


Adding to a Mirrored Pool

You can add disks to a pool without taking it offline. Let's double the size of our trout pool:

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
trout                   123M   64.5M   58.5M    52%  ONLINE     -

# zpool add trout mirror /home/ocean/disk3 /home/ocean/disk4

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
trout                   246M   64.5M    181M    26%  ONLINE     -


This happens almost instantly, and the filesystem within the pool
remains available. Looking at the status now shows the pool consists of
two mirrors:

# zpool status trout
  pool: trout
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Mar  7 20:58:17 2007
config:
        NAME                          STATE     READ WRITE CKSUM
        trout                         ONLINE       0     0     0
          mirror                      ONLINE       0     0     0
            /home/ocean/disk2         ONLINE       0     0     0
            /home/ocean/disk1         ONLINE       0     0     0
          mirror                      ONLINE       0     0     0
            /home/ocean/disk3         ONLINE       0     0     0
            /home/ocean/disk4         ONLINE       0     0     0

errors: No known data errors


We can see where the data is currently written in our pool using zpool iostat -v:

# zpool iostat -v trout
                                 capacity     operations    bandwidth
pool                           used  avail   read  write   read  write
----------------------------  -----  -----  -----  -----  -----  -----
trout                         64.5M   181M      0      0  13.7K    278
  mirror                      64.5M  58.5M      0      0  19.4K    394
    /home/ocean/disk2             -      -      0      0  20.6K  15.4K
    /home/ocean/disk1             -      -      0      0      0  20.4K
  mirror                          0   123M      0      0      0      0
    /home/ocean/disk3             -      -      0      0      0    768
    /home/ocean/disk4             -      -      0      0      0    768
----------------------------  -----  -----  -----  -----  -----  -----


All the data is currently written on the first mirror pair, and none
on the second. This makes sense as the second pair of disks was added
after the data was written. If we write some new data to the pool the
new mirror will be used:

# mkfile 64m /trout/quuxx

# zpool iostat -v trout
                                 capacity     operations    bandwidth
pool                           used  avail   read  write   read  write
----------------------------  -----  -----  -----  -----  -----  -----
trout                          128M   118M      0      0  13.1K  13.6K
  mirror                      95.1M  27.9M      0      0  18.3K  9.29K
    /home/ocean/disk2             -      -      0      0  19.8K  21.2K
    /home/ocean/disk1             -      -      0      0      0  28.2K
  mirror                      33.2M  89.8M      0      0      0  10.4K
    /home/ocean/disk3             -      -      0      0      0  11.1K
    /home/ocean/disk4             -      -      0      0      0  11.1K
----------------------------  -----  -----  -----  -----  -----  -----


Note how a little more of the new data has been written to the new
mirror than to the old: ZFS tries to make the best use of all the resources in
the pool.

ZFS Filesystems

ZFS filesystems within a pool are managed with the zfs command. Before you can manipulate filesystems you need to create a pool (you can learn about ZFS pools in part 1). When you create a pool, a ZFS filesystem is created and mounted for you.

ZFS Filesystem Basics

Create a simple mirrored pool and list filesystem information with zfs list:

# zpool create salmon mirror c3t2d0 c3t3d0

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
salmon                  136G   84.5K    136G     0%  ONLINE     -

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                75.5K   134G  24.5K  /salmon


We can see our filesystem is mounted on /salmon and is 134 GB in size.

We can create an arbitrary number (up to 2^64) of new filesystems within our pool. Let's add
filesystems for three users with zfs create:

# zfs create salmon/kent
# zfs create salmon/dennisr
# zfs create salmon/billj

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                 168K   134G  28.5K  /salmon
salmon/billj          24.5K   134G  24.5K  /salmon/billj
salmon/dennisr        24.5K   134G  24.5K  /salmon/dennisr
salmon/kent           24.5K   134G  24.5K  /salmon/kent


Note how all four filesystems share the same pool space and all
report 134 GB available. We'll see how to set quotas and reserve space
for filesystems later in this tutorial.

We can create arbitrary levels of filesystems, so you could create a whole tree of filesystems inside /salmon/kent.
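
For instance (a hypothetical layout, not used again in this tutorial):

# zfs create salmon/kent/projects
# zfs create salmon/kent/projects/flies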

We can also see our filesystems using df (output trimmed for brevity):

# df -h
Filesystem             size   used  avail capacity  Mounted on
salmon                 134G    28K   134G     1%    /salmon
salmon/kent            134G    24K   134G     1%    /salmon/kent
salmon/dennisr         134G    24K   134G     1%    /salmon/dennisr
salmon/billj           134G    24K   134G     1%    /salmon/billj


You can remove filesystems with zfs destroy. User billj has stopped working on salmon, so let's remove him:

# zfs destroy salmon/billj
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                 138K   134G  28.5K  /salmon
salmon/dennisr        24.5K   134G  24.5K  /salmon/dennisr
salmon/kent           24.5K   134G  24.5K  /salmon/kent


Mount Points

It's useful that ZFS automatically mounts your filesystem under the pool name, but this is often
not what you want. Thankfully it's very easy to change the properties of a ZFS filesystem, even when it's mounted.

You can set the mount point of a ZFS filesystem using zfs set mountpoint.
For example, if we want to move salmon under the /projects directory:

# zfs set mountpoint=/projects/salmon salmon
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                 142K   134G  27.5K  /projects/salmon
salmon/dennisr        24.5K   134G  24.5K  /projects/salmon/dennisr
salmon/kent           24.5K   134G  24.5K  /projects/salmon/kent


On Mac OS X you need to force an unmount of the filesystem (using
umount -f /Volumes/salmon) before changing the mount point, as it will
be in use by fseventsd. To mount it again after setting a new mount point use 'zfs mount salmon'.

Mount points of individual filesystems don't have to sit under the pool's mount point, for example:

# zfs set mountpoint=/fishing salmon/kent
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                 148K   134G  27.5K  /projects/salmon
salmon/dennisr        24.5K   134G  24.5K  /projects/salmon/dennisr
salmon/kent           24.5K   134G  24.5K  /fishing


To mount and unmount ZFS filesystems you use zfs mount and zfs unmount*.
ZFS filesystems are entirely managed by ZFS by default, and don't appear in /etc/vfstab. In a future tutorial we
will look at using 'legacy' mount points to manage filesystems the traditional way.

*Old school Unix users will be pleased to know 'zfs umount' also works.

For example (mount output trimmed for brevity):

# zfs unmount salmon/kent

# mount | grep salmon
/projects/salmon on salmon
/projects/salmon/dennisr on salmon/dennisr

# zfs mount salmon/kent

# mount | grep salmon
/projects/salmon on salmon 
/projects/salmon/dennisr on salmon/dennisr 
/fishing on salmon/kent
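
As a brief preview of those legacy mount points (a hedged sketch; the details belong in a later part), you hand a filesystem back to the traditional tools by setting its mountpoint to legacy:

# zfs set mountpoint=legacy salmon/kent

ZFS then stops mounting it automatically, and you mount it yourself (eg. with mount -F zfs salmon/kent /fishing, or an /etc/vfstab entry) like any other filesystem.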


Managing ZFS Filesystem Properties

Other filesystem properties work in the same way as the mount point (which is itself a property).
To get and set properties we use zfs get and zfs set. To see a list of all
filesystem properties we can use 'zfs get all':

# zfs get all salmon/kent
NAME             PROPERTY       VALUE                      SOURCE
salmon/kent      type           filesystem                 -
salmon/kent      creation       Fri Apr  6 13:14 2007      -
salmon/kent      used           24.5K                      -
salmon/kent      available      134G                       -
salmon/kent      referenced     24.5K                      -
salmon/kent      compressratio  1.00x                      -
salmon/kent      mounted        yes                        -
salmon/kent      quota          none                       default
salmon/kent      reservation    none                       default
salmon/kent      recordsize     128K                       default
salmon/kent      mountpoint     /fishing                   local
salmon/kent      sharenfs       off                        default
salmon/kent      checksum       on                         default
salmon/kent      compression    off                        default
salmon/kent      atime          on                         default
salmon/kent      devices        on                         default
salmon/kent      exec           on                         default
salmon/kent      setuid         on                         default
salmon/kent      readonly       off                        default
salmon/kent      zoned          off                        default
salmon/kent      snapdir        hidden                     default
salmon/kent      aclmode        groupmask                  default
salmon/kent      aclinherit     secure                     default


The first set of properties, with a SOURCE of '-', are read-only and
give information about your filesystem; the rest of the properties can be
set with 'zfs set'. The SOURCE value shows where a property gets its value from;
apart from '-' there are three sources for a property:

default - the default ZFS value for this property
local - the property is set directly on this filesystem
inherited - the property is inherited from a parent filesystem

The mountpoint property is shown with a local
source because we set the mountpoint for this filesystem
above. We'll see an example of an inherited property in the section on
compression (below).

I'm going to look at three properties in this section: quota,
reservation and compression (sharenfs will be covered in a future
tutorial). You can read about the remaining properties in the Sun ZFS Administration Guide.

Quotas & Reservations

All the filesystems in a pool share the same disk space. This
maximises flexibility and lets ZFS make the best use of the resources,
but it does allow one filesystem to use all the space. To manage
space utilisation within a pool, filesystems can have quotas and reservations.
A quota sets a limit on the pool space a filesystem can use; a
reservation reserves part of the pool for the exclusive use of one
filesystem.

To see how this works, let's consider our existing pool:

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                 148K   134G  26.5K  /projects/salmon
salmon/dennisr        24.5K   134G  24.5K  /projects/salmon/dennisr
salmon/kent           24.5K   134G  24.5K  /fishing


For example, let's say we want to set a quota of 10 GB on dennisr
and kent to ensure there's space for other users to be added to salmon
(if you are using disk files or small disks just substitute a suitable
value, eg. quota=10M):

# zfs set quota=10G salmon/dennisr
# zfs set quota=10G salmon/kent

# zfs get quota salmon salmon/kent salmon/dennisr
NAME             PROPERTY       VALUE                      SOURCE
salmon           quota          none                       default
salmon/dennisr   quota          10G                        local
salmon/kent      quota          10G                        local


You can see how we used zfs get to retrieve a particular property
for a set of filesystems. There are some useful options we can use with
get:

-r recursively gets the property for all child filesystems.
-p reports exact values (e.g. 9437184 rather than 9M).
-H omits header fields, making the output easier for scripts to parse.
-o <fields> specifies a list of fields you wish to get (avoids having to use awk or cut).

An example (excluding headers and not showing the source field):

# zfs get -rHp -oname,property,value quota salmon
salmon          quota   0       
salmon/dennisr  quota   10737418240
salmon/kent     quota   10737418240


As an example of reservations let's add a new filesystem and reserve
1 GB of space for it. This ensures that however full the disk gets,
when someone comes to use it there will be space.

# zfs create salmon/jeffb
# zfs set reservation=1G salmon/jeffb

# zfs get -r reservation salmon
NAME             PROPERTY       VALUE                      SOURCE
salmon           reservation    none                       default
salmon/dennisr   reservation    none                       default
salmon/jeffb     reservation    1G                         local
salmon/kent      reservation    none                       default


If we look at our list of filesystems with zfs list we can see the effect of the quotas and reservation:

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
salmon                1.00G   133G  27.5K  /projects/salmon
salmon/dennisr        24.5K  10.0G  24.5K  /projects/salmon/dennisr
salmon/jeffb          24.5K   134G  24.5K  /projects/salmon/jeffb
salmon/kent           24.5K  10.0G  24.5K  /fishing


As expected, the space available to salmon/dennisr and salmon/kent is
now limited to 10 GB, but there appears to be no change to
salmon/jeffb. However, if we look at the used space for salmon as a
whole we can see it has risen to 1 GB. This space isn't actually
used, but because it has been reserved for salmon/jeffb it isn't
available to the rest of the pool. Reservations can therefore lead you to
over-estimate the space used in your pool. The df command always
displays the actual usage, so it can be handy in such situations.
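
For instance, to sanity-check what is really consumed you might compare the two views (a simple sketch; the exact figures will vary):

# zfs list salmon
# df -h /projects/salmon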

Compression

ZFS has built-in support for compression. Not only does this save disk
space, but it can actually improve performance on systems with plenty
of CPU and highly compressible data, as it saves disk I/O. An obvious
candidate for compression is a logs directory.

This section has still to be written and should appear in September 2008.
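
Pending that section, a minimal hedged sketch: enable compression on the top-level filesystem and let the children inherit it (an example of the inherited source discussed earlier):

# zfs set compression=on salmon
# zfs get -r compression salmon

Only data written after the property is set gets compressed, and the compressratio property shows how effective it has been.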

That's it for part 2. In part 3 we will look at some of the most
exciting ZFS features: snapshots and clones, as well as how to back up a
ZFS filesystem. We'll create a new pool for part 3, so feel free to
destroy the salmon pool.