Posts Tagged ‘ZFS’

glabel Labels Disappear after ‘zpool import’ in FreeBSD

Tuesday, July 5th, 2011

Due to some odd behavior on my server’s host bus adapters, my hard drives don’t always spin up in the exact same order after every restart. The device name /dev/da1 might be assigned to a drive in (row 2, column 2) of the server for a few months, but after the next reboot, another drive – say (row 2, column 3) – might wind up with that assignment. It’s all a matter of how fast the drives check in after the power’s been turned on.

This behavior is annoying and becomes problematic when a drive needs to be replaced due to some kind of hardware failure. If I know /dev/da1 has failed, but I have no idea *which* drive is /dev/da1 in the server, fixing the problem could take hours of manually yanking and checking each device in the chassis. Such an approach isn’t ideal since it greatly increases the risk of data loss, and places the integrity of my RAIDZ vdev in jeopardy until the process is completed.

To overcome this, I chose to rely on glabel. Basically, I label each drive with a string that identifies its make, its capacity, and its physical (row, col) location in the server chassis. When a drive fails, I’ll already know its device assignment (from /var/log/messages), and a quick glabel status command will tell me its corresponding label. Once I know the label, it’s immediately obvious which drive in the chassis needs to be replaced.

Here’s an example output of all my current labels:

[mediaserver@mediaserver ~]$ glabel status
                 Name  Status  Components
   label/HIT2TB-r4-c3     N/A  da0
   label/HIT2TB-r4-c2     N/A  da1
   label/HIT2TB-r4-c1     N/A  da2
   label/HIT2TB-r4-c0     N/A  da3
    label/WD1TB-r2-c3     N/A  da4
    label/WD1TB-r2-c2     N/A  da5
    label/WD1TB-r2-c0     N/A  da6
    label/WD1TB-r2-c1     N/A  da7
    label/WD2TB-r3-c3     N/A  da8
    label/WD2TB-r3-c2     N/A  da9
    label/WD2TB-r3-c1     N/A  da10
    label/WD2TB-r3-c0     N/A  da11
   label/WD.5TB-r1-c2     N/A  da12
   label/WD.5TB-r1-c3     N/A  da13
   label/WD.5TB-r1-c1     N/A  da14
   label/WD.5TB-r1-c0     N/A  da15
label/WD2TB-row0-col3     N/A  da16
label/WD2TB-row0-col2     N/A  da17
label/WD2TB-row0-col1     N/A  da18
label/WD2TB-row0-col0     N/A  da19
    label/WD2TB-r5-c1     N/A  da21
    label/WD2TB-r5-c2     N/A  da22
    label/WD2TB-r5-c3     N/A  da23
    label/WD2TB-r5-c0     N/A  da20

As you can see, /dev/da1 in this case refers to a drive labeled HIT2TB-r4-c2. This label tells me the drive’s make (Hitachi), capacity (2TB), and location (row 4, column 2) in the server chassis. If that drive fails a year or two and a couple of reboots from now, it most likely won’t have the same device name. But that won’t matter: even if that drive gets reassigned to /dev/da22 and then fails, I’ll just need to find out which label corresponds to /dev/da22 to get the information I need (namely its size and location) to replace it.
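In practice, that lookup is a one-liner. Something like the following (just an illustration, using the hypothetical da22 example and the label listing above) pulls out the matching entry:

# glabel status | grep da22
    label/WD2TB-r5-c2     N/A  da22

From there, the label string alone tells me exactly which bay to walk over to.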

Getting labels to stay put between reboots is kind of tricky, though. A few other people and I noticed that our labels weren’t sticking around after a power cycle, and most of us documented our frustration and troubleshooting efforts on the FreeBSD forums. I’m happy to report that although it took a few days of effort, I finally found a way to make the labels stick. I’ve documented my method below:

Basically, if this is your problem:
You glabel a bunch of drives, gnop them, add them to a vdev, and then reboot. Your *.nop names are gone (which was to be expected), but your labels are also gone/not showing up once the pool containing that vdev is imported…

Then this is your solution:
After inserting, glabel’ing, and gnop’ing all 4 drives, making a vdev out of them, and then adding that vdev to the zpool, do not reboot.

I’m already assuming you got to this point by physically adding your drives to your server, finding out which device names they were assigned (tail /var/log/messages), and running the following two commands for all 4 drives (which were da8, da9, da10, and da11 for me):

# glabel label WD1TB-r2-c0 /dev/da8
# gnop create -S 4096 /dev/label/WD1TB-r2-c0
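Before touching the pool, it doesn’t hurt to confirm that the label and the 4K .nop provider both exist. glabel and gnop are ordinary GEOM utilities, so their status verbs should list them (this is just a sanity check on my part, not a required step):

# glabel status | grep WD1TB-r2-c0
# gnop status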

And then added all of them to the zpool like so:

# zpool add mediatank raidz /dev/label/WD1TB-r2-c{0..3}.nop
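One note on that command: the {0..3} range expansion is a feature of some shells (bash and zsh do it, for instance), so if your shell doesn’t expand it, just spell out the four .nop devices:

# zpool add mediatank raidz /dev/label/WD1TB-r2-c0.nop /dev/label/WD1TB-r2-c1.nop /dev/label/WD1TB-r2-c2.nop /dev/label/WD1TB-r2-c3.nop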

If you did not get to this point by using the above commands, then I don’t know if this solution is right for you. ZFS’s behavior hasn’t exactly been logical on this issue, so I’m not sure if this will get your labels back if you used some other method.

Now, once you’ve created your new RAIDZ vdev, do not reboot. One by one, we’re going to go through each drive: offline it, destroy the gnop, wipe the ZFS metadata off of it with dd, relabel it, and then replace it with itself in the vdev. It’s a tedious process, but it works.

When you’ve finished with the first drive, come back to this point, and begin anew with the second drive. Same for the third, fourth, etc.

To start, you need to offline the first gnop. In my case, that’s WD1TB-r2-c0.nop

# zpool offline mediatank /dev/label/WD1TB-r2-c0.nop

Then export the pool:

# zpool export mediatank

And destroy the gnop:

# gnop destroy /dev/label/WD1TB-r2-c0.nop

Reimport the pool:

# zpool import mediatank

Check the status of your pool:

[root@mediaserver3 /dev]# zpool status
  pool: mediatank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
 scrub: none requested
config:

	NAME                       STATE     READ WRITE CKSUM
	mediatank                  DEGRADED     0     0     0
	  raidz1                   ONLINE       0     0     0
	    label/WD2TB-row0-col0  ONLINE       0     0     0
	    label/WD2TB-row0-col1  ONLINE       0     0     0
	    label/WD2TB-row0-col2  ONLINE       0     0     0
	    label/WD2TB-row0-col3  ONLINE       0     0     0
	  raidz1                   ONLINE       0     0     0
	    label/WD.5TB-r1-c2     ONLINE       0     0     0
	    label/WD.5TB-r1-c3     ONLINE       0     0     0
	    label/WD.5TB-r1-c0     ONLINE       0     0     0
	    label/WD.5TB-r1-c1     ONLINE       0     0     0
	  raidz1                   DEGRADED     0     0     0
	    da8                    OFFLINE      0     0     0
	    label/WD1TB-r2-c1.nop  ONLINE       0     0     0
	    label/WD1TB-r2-c2.nop  ONLINE       0     0     0
	    label/WD1TB-r2-c3.nop  ONLINE       0     0     0

errors: No known data errors

You should notice that the disk we offlined earlier is now showing up as da_whatever (in my case, that’s da8).

Our next goal is to wipe the vdev information off of the disk, which means that instead of zeroing the entire drive, all we need to do is wipe out the front and the back of it. Exactly how much of the front and back is required I can’t say: I tried zeroing out the first and the last MiB of the disk, which worked, so it’s good enough for me. (ZFS keeps copies of its vdev label at both the beginning and the end of the device, which is presumably why hitting both ends does the trick.)

Zero out the first MiB of the drive:

# dd if=/dev/zero of=/dev/da8 bs=1m count=1

Then, find out how many sectors your drive has:

[root@mediaserver3 /dev/label]# dmesg | grep "da8"
da8 at mps1 bus 0 scbus1 target 3 lun 0
da8:  Fixed Direct Access SCSI-5 device 
da8: 300.000MB/s transfers
da8: Command Queueing enabled
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
GEOM: da8: partition 1 does not start on a track boundary.
GEOM: da8: partition 1 does not end on a track boundary.

Note how this particular drive has 3907029168 sectors that are 512 bytes in size. If I want to zero over the last mebibyte of the drive, that means I’m going to need to seek to sector 3907029168 – 2048 = 3907027120. (The 2048 comes from the fact that 1 mebibyte = 1048576 bytes, and 1048576 bytes / 512 bytes = 2048 sectors.)
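If you’d rather not do that math in your head, expr will happily do the subtraction (and diskinfo /dev/da8 should report the same sector count if you don’t feel like grepping dmesg):

# expr 3907029168 - 2048
3907027120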

Next, zero out the last mebibyte of the drive using the adjusted sector count for the seek argument:

# dd if=/dev/zero of=/dev/da8 seek=3907027120
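As written, that dd will keep writing zeros until it runs out of device, which is harmless but a little untidy. If you’d rather it stop exactly at the last sector, a variant with an explicit block size and count does the same thing: 2048 blocks of 512 bytes is exactly the final mebibyte.

# dd if=/dev/zero of=/dev/da8 bs=512 seek=3907027120 count=2048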

Then, once it’s been zeroed out, relabel the drive:

# glabel label WD1TB-r2-c0 /dev/da8

And replace the newly zeroed and relabeled drive with that drive’s old entry in the zpool:

# zpool replace -f mediatank da8 /dev/label/WD1TB-r2-c0

ZFS will begin resilvering the drive, which you can confirm with a zpool status command:

[root@mediaserver3 /dev]# zpool status
  pool: mediatank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 1.32% done, 0h29m to go
config:

	NAME                       STATE     READ WRITE CKSUM
	mediatank                  DEGRADED     0     0     0
	  raidz1                   ONLINE       0     0     0
	    label/WD2TB-row0-col0  ONLINE       0     0     0
	    label/WD2TB-row0-col1  ONLINE       0     0     0
	    label/WD2TB-row0-col2  ONLINE       0     0     0
	    label/WD2TB-row0-col3  ONLINE       0     0     0
	  raidz1                   ONLINE       0     0     0
	    label/WD.5TB-r1-c2     ONLINE       0     0     0
	    label/WD.5TB-r1-c3     ONLINE       0     0     0
	    label/WD.5TB-r1-c0     ONLINE       0     0     0
	    label/WD.5TB-r1-c1     ONLINE       0     0     0
	  raidz1                   DEGRADED     0     0     0
	    replacing              DEGRADED     0     0     0
	      da8                  OFFLINE      0     0     0
	      label/WD1TB-r2-c0    ONLINE       0     0     0  52K resilvered
	    label/WD1TB-r2-c1.nop  ONLINE       0     0     0
	    label/WD1TB-r2-c2.nop  ONLINE       0     0     0
	    label/WD1TB-r2-c3.nop  ONLINE       0     0     0

errors: No known data errors

When the resilvering has finished (it took about an hour and a half per drive for me), repeat the same procedure for da9, da10, and da11, or whatever your drive names are.
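For reference, here’s what one complete pass looks like for the next drive, condensed into one place. This assumes da9 keeps that device name across the export/import and has the same sector count as da8, and that it gets the WD1TB-r2-c1 label; adjust the device name, label, and seek value to match your own hardware:

# zpool offline mediatank /dev/label/WD1TB-r2-c1.nop
# zpool export mediatank
# gnop destroy /dev/label/WD1TB-r2-c1.nop
# zpool import mediatank
# dd if=/dev/zero of=/dev/da9 bs=1m count=1
# dd if=/dev/zero of=/dev/da9 seek=3907027120
# glabel label WD1TB-r2-c1 /dev/da9
# zpool replace -f mediatank da9 /dev/label/WD1TB-r2-c1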

And then run one last export/import before restarting:

# zpool export mediatank
# zpool import mediatank
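Once the machine is back up, the labels should come back on their own. A quick sanity check (using the pool name from above) is to make sure the vdev members are listed by label rather than by raw device name, and that glabel still knows about every drive:

# zpool status mediatank | grep label
# glabel status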

You’re done.

This works for me on the following system:

[root@mediaserver3 /home/mediaserver]# uname -a
FreeBSD mediaserver3 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu May 12 10:04:15 CDT 2011     root@mediaserver:/usr/obj/usr/src/sys/GENERIC  amd64

Sharing a ZFS Pool over NFS

Friday, June 10th, 2011

Technology is just getting better and better these days.