Archive for March, 2015

Recovering from a kernel panic during zpool -f import

Monday, March 16th, 2015

Several weeks ago, the UPS protecting one of my media servers suffered a catastrophic failure after lightning directly struck the power lines to my apartment. Like any good UPS, the unit sacrificed itself to protect my equipment, but it wasn’t enough: both my power supply and motherboard needed to be replaced. Fortunately, the drives appear to have survived, as they’re still showing up and responding to S.M.A.R.T. queries.

After replacing the power supply and motherboard, I installed the latest version of FreeBSD (currently, 10.0) on the OS hard drive and tried importing the old zpool, which came from a FreeBSD 8.2 system.

With all 24 drives connected to the server, I issued a zpool import command in the hope that the zpool was recoverable:

[root@mediaserver3 /home/mediaserver]# zpool import
   pool: mediatank
     id: 16704184877843764333
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
	the '-f' flag.

	mediatank                  ONLINE
	  raidz1-0                 ONLINE
	    label/WD2TB-row0-col0  ONLINE
	    label/WD2TB-row0-col1  ONLINE
	    label/WD2TB-row0-col2  ONLINE
	    label/WD2TB-row0-col3  ONLINE
	  raidz1-1                 ONLINE
	    label/WD.5TB-r1-c2     ONLINE
	    label/WD.5TB-r1-c3     ONLINE
	    label/WD.5TB-r1-c0     ONLINE
	    label/WD.5TB-r1-c1     ONLINE
	  raidz1-2                 ONLINE
	    label/WD1TB-r2-c0      ONLINE
	    label/WD1TB-r2-c1      ONLINE
	    label/WD1TB-r2-c2      ONLINE
	    label/WD1TB-r2-c3      ONLINE
	  raidz1-3                 ONLINE
	    label/WD2TB-r3-c0_b    ONLINE
	    label/WD2TB-r3-c1      ONLINE
	    label/WD2TB-r3-c2      ONLINE
	    label/WD2TB-r3-c3      ONLINE
	  raidz1-4                 ONLINE
	    label/HIT2TB-r4-c0     ONLINE
	    label/HIT2TB-r4-c1     ONLINE
	    label/HIT2TB-r4-c2     ONLINE
	    label/HIT2TB-r4-c3     ONLINE
	  raidz1-5                 ONLINE
	    label/FLOOD2WDr5c0     ONLINE
	    label/FLOOD2WDr5c1     ONLINE
	    label/FLOOD2HIr5c2     ONLINE
	    label/FLOOD2HIr5c3     ONLINE
[root@mediaserver3 /home/mediaserver]#

Perfect. The zpool wasn’t degraded. Unfortunately, the lightning strike didn’t give me a chance to properly export the zpool before replacing the motherboard and upgrading the OS, so I’d need to forcefully import the zpool on this new system using zpool import -f:

[root@mediaserver3 /home/mediaserver]# zpool import -f mediatank

Several seconds later, my SSH connection was severed and I received the following panic on the local console of the machine:

panic: solaris assert: end <= sm->sm_start + sm->sm_size (0x6004e1891c000 <= 0x5000000000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 120
cpuid = 0
KDB: stack backtrace
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff81b6723f at assfail3+0x2f
#3 0xffffffff81a7bbf9 at space_map_add+0xb9
#4 0xffffffff81a7c429 at space_map_load+0x229
#5 0xffffffff81a66ee0 at metaslab_activate+0x80
#6 0xffffffff81a66209 at metaslab_alloc+0x6e9
#7 0xffffffff81aa0ad2 at zio_dva_allocate+0x136
#8 0xffffffff81a9e6a6 at zio_execute+0x136
#9 0xffffffff808f5b66 at taskqueue_run_locked+0xe6
#10 0xffffffff808f63e8 at taskqueue_thread_loop+0xa8
#11 0xffffffff8088198a at fork_exit+0x9a
#12 0xffffffff80c758ce at fork_trampoline+0xe
Uptime 5m19s
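Converting the two operands of the failed assertion into GiB makes the mismatch easier to see. This is a small sketch using POSIX shell arithmetic; the variable names are illustrative. The space map being loaded only covers offsets up to 320 GiB, yet the segment being added from it claims to end at roughly 1,573,176 GiB (about 1.5 PiB), far past the end of any raidz1 vdev in this pool. One plausible reading, not a confirmed diagnosis, is that the on-disk space map was damaged. The backtrace also runs through metaslab_alloc, so the panic fired while ZFS was looking for free space to write to, which would be consistent with a read-only import getting further.

```shell
# Operands of the failed assertion, copied from the panic message:
#   end <= sm->sm_start + sm->sm_size (0x6004e1891c000 <= 0x5000000000)
seg_end=$((0x6004e1891c000))   # "end" of the segment being added
map_end=$((0x5000000000))      # sm->sm_start + sm->sm_size
gib=$((1024 * 1024 * 1024))

# Integer division, so both figures are rounded down to whole GiB.
echo "space map covers offsets up to: $((map_end / gib)) GiB"
echo "segment claims to end at:       $((seg_end / gib)) GiB"
```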

I then spent a few minutes Googling and reading the documentation for zpool import, and then tried:

[root@mediaserver3 ~]# zpool import -f -N -o readonly=on -o failmode=continue mediatank

This appeared to import the old pool successfully in read-only mode. Because the pool came from the FreeBSD 8.2 system, zpool status also reports it as using a legacy on-disk format:

[root@mediaserver3 ~]# zpool status
  pool: mediatank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
	still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
	pool will no longer be accessible on software that does not support feature
	flags.
  scan: none requested

	NAME                       STATE     READ WRITE CKSUM
	mediatank                  ONLINE       0     0     0
	  raidz1-0                 ONLINE       0     0     0
	    label/WD2TB-row0-col0  ONLINE       0     0     0
	    label/WD2TB-row0-col1  ONLINE       0     0     0
	    label/WD2TB-row0-col2  ONLINE       0     0     0
	    label/WD2TB-row0-col3  ONLINE       0     0     0
	  raidz1-1                 ONLINE       0     0     0
	    label/WD.5TB-r1-c2     ONLINE       0     0     0
	    label/WD.5TB-r1-c3     ONLINE       0     0     0
	    label/WD.5TB-r1-c0     ONLINE       0     0     0
	    label/WD.5TB-r1-c1     ONLINE       0     0     0
	  raidz1-2                 ONLINE       0     0     0
	    label/WD1TB-r2-c0      ONLINE       0     0     0
	    label/WD1TB-r2-c1      ONLINE       0     0     0
	    label/WD1TB-r2-c2      ONLINE       0     0     0
	    label/WD1TB-r2-c3      ONLINE       0     0     0
	  raidz1-3                 ONLINE       0     0     0
	    label/WD2TB-r3-c0_b    ONLINE       0     0     0
	    label/WD2TB-r3-c1      ONLINE       0     0     0
	    label/WD2TB-r3-c2      ONLINE       0     0     0
	    label/WD2TB-r3-c3      ONLINE       0     0     0
	  raidz1-4                 ONLINE       0     0     0
	    label/HIT2TB-r4-c0     ONLINE       0     0     0
	    label/HIT2TB-r4-c1     ONLINE       0     0     0
	    label/HIT2TB-r4-c2     ONLINE       0     0     0
	    label/HIT2TB-r4-c3     ONLINE       0     0     0
	  raidz1-5                 ONLINE       0     0     0
	    label/FLOOD2WDr5c0     ONLINE       0     0     0
	    label/FLOOD2WDr5c1     ONLINE       0     0     0
	    label/FLOOD2HIr5c2     ONLINE       0     0     0
	    label/FLOOD2HIr5c3     ONLINE       0     0     0

errors: No known data errors
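The same recovery import can be wrapped in a small shell function so that each flag carries a comment. The function name recovery_import is hypothetical; the flag descriptions follow the zpool(8) manual page.

```shell
# Hypothetical helper: import a pool that was never exported, read-only,
# without mounting anything.
#   -f                    force the import even though the pool was last
#                         accessed by another system
#   -N                    import without mounting any file systems
#   -o readonly=on        open the pool read-only, so nothing is written to it
#   -o failmode=continue  on device failure, return EIO to new writes but keep
#                         serving reads (the default, wait, blocks all I/O)
recovery_import() {
    zpool import -f -N -o readonly=on -o failmode=continue "$1"
}
```

Calling recovery_import mediatank issues exactly the command shown above. Since the readonly property can only be set at import time, zpool get readonly mediatank is a quick way to confirm it actually took effect.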

I then needed a way to mount my zpool so that I could look inside it and confirm there wasn't any data loss. Unfortunately, the pool uses legacy mountpoints, which neither zpool import nor zfs mount will handle, so I fell back to good ol' mount:

[root@mediaserver3 ~]# mkdir /mediatank3
[root@mediaserver3 ~]# mount -t zfs mediatank /mediatank3

Changing directory into /mediatank3 and listing its contents confirmed that everything was there and none of my data had been lost.
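A minimal sketch of the same decision in shell, assuming the dataset name and target directory are passed in: read the dataset's mountpoint property and go through mount(8) only when it is legacy. The helper name mount_dataset is hypothetical.

```shell
# Hypothetical helper: mount a ZFS dataset whether or not its mountpoint
# property is "legacy".
mount_dataset() {
    dataset="$1"
    dir="$2"
    # -H: no header line; -o value: print only the property's value
    mp=$(zfs get -H -o value mountpoint "$dataset") || return 1
    if [ "$mp" = legacy ]; then
        # zfs mount refuses legacy mountpoints, so use mount(8) on an
        # explicitly chosen directory instead.
        mkdir -p "$dir" && mount -t zfs "$dataset" "$dir"
    else
        # Non-legacy datasets mount at their own mountpoint property;
        # the directory argument is ignored in this case.
        zfs mount "$dataset"
    fi
}
```

Assuming the root dataset's mountpoint is legacy, as described above, mount_dataset mediatank /mediatank3 ends up running the same mount -t zfs command used here.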

Update: After several unsuccessful attempts to clear the read-only flag and get the zpool into a writable state, I finally gave up. I didn't lose any data, but this zpool was never going to be writable. I transferred everything to another array using rsync, destroyed the zpool, and re-created it.
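The copy-off step can be sketched as two rsync passes: one that copies, and a second, checksum-based dry run that only lists files that still differ. The function name migrate and the destination path are hypothetical; the post doesn't say which rsync options were actually used.

```shell
# Hypothetical helper: copy everything off the read-only pool, then make a
# second, checksum-based pass that only reports differences.
migrate() {
    src="${1%/}"   # e.g. /mediatank3, the legacy mount point used above
    dst="${2%/}"   # a directory on the other array (hypothetical)

    # Trailing slashes: copy the *contents* of src into dst.
    # -a: archive mode (recursion, permissions, times, symlinks, ...)
    # -H: also preserve hard links, which -a alone does not
    rsync -aH "$src/" "$dst/" || return 1

    # -c: compare files by checksum instead of size and mtime
    # -n: dry run, change nothing
    # --itemize-changes: print one line per file that still differs
    rsync -aHcn --itemize-changes "$src/" "$dst/"
}
```

Only once the second pass prints nothing does it seem safe to move on to zpool destroy and re-creating the pool. The post doesn't describe the new pool's layout, so no zpool create line is suggested here.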