
I'm giving the bundled OpenZFS on Ubuntu 16.04 Xenial a try.

When creating pools, I always reference drives by their serial numbers in /dev/disk/by-id/ (or /dev/gpt on FreeBSD) for resiliency. Drives aren't always in the same order in /dev when a machine reboots, and if you have other drives in the machine the pool may fail to mount correctly.
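
These by-id names are just udev-managed symlinks that point back at the kernel's sdX nodes, so the mapping is easy to inspect; for example (the serial and WWN below are placeholders):

$ ls -l /dev/disk/by-id/ | grep -v part
lrwxrwxrwx 1 root root 9 Jun  9 06:00 ata-Hitachi_HDS722020ALA330_SERIAL -> ../../sdb
lrwxrwxrwx 1 root root 9 Jun  9 06:00 wwn-0x5000cca200000000 -> ../../sdb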

For example, running zpool status on a 14.04 box I get this:

NAME                                  STATE     READ WRITE CKSUM
tank                                  ONLINE       0     0     0
  raidz1-0                            ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HUA722020ALA330_[..]  ONLINE       0     0     0

But when I create a new pool on 16.04 with this (abbreviated):

zpool create tank raidz \
    /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
    /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
    /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
    /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..]

I get this with zpool status:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    sdf     ONLINE       0     0     0
    sde     ONLINE       0     0     0
    sdd     ONLINE       0     0     0
    sda     ONLINE       0     0     0

It looks like zpool resolved the symlinks to the underlying devices, rather than storing the by-id paths I gave it.

Is there a way to force zpool on 16.04 to respect my drive references when creating a pool? Or alternatively, are my misgivings about what it's doing here misplaced?

Update: Workaround

I found a zfsonlinux thread on GitHub that suggested a workaround: create your zpool with /dev/sdX devices first, then do this:

$ sudo zpool export tank
$ sudo zpool import -d /dev/disk/by-id -aN

I would still prefer to be able to do this with the initial zpool create, though, if possible.
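
On the zfsonlinux packages there is also an /etc/default/zfs read by the boot scripts when scanning for pools; I believe (though I haven't verified this on every release) that pointing its ZPOOL_IMPORT_PATH at by-id makes imports at boot come up with those names:

# /etc/default/zfs (assumption: variable as shipped by zfsonlinux)
ZPOOL_IMPORT_PATH="/dev/disk/by-id"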

Ruben Schade
  • It doesn't matter how you create them. If it reverts to /dev/sd? device names, the `zpool export` and `zpool import -d` will work anyway. BTW, unless you **really** need every byte of space, use two mirrored pairs rather than raidz. raidz's performance is better than raid-5 but still much worse than raid-10 or zfs mirrored pairs. It's also easier to expand a pool made up of mirrored pairs, just add two disks at a time...with raidz, you have to replace each of the drives with larger drives, and only when you've replaced all of them will your pool have more space available. – cas Jun 09 '16 at 06:34
  • I still have some raid-z pools, and regret having made them. When I can afford to buy replacement disks, I'll create new pools with mirrored pairs and use `zfs send` to copy my data to the new pools. Actually, raid-z is OK for my mythtv box where performance isn't critical unless I'm running 6 or 8 transcode jobs at once. Changing to mirrored pairs would be very noticeable on the pool where my `/home` directory lives. – cas Jun 09 '16 at 06:36
  • oh, and add a pair of SSDs....partitioned to give a mirrored pair of smallish (4GB or so is plenty) `log` (i.e. `ZIL` or `ZFS Intent Log`) devices, and two large (remainder of the SSDs?), non-mirrored `cache` devices for `L2ARC`. – cas Jun 09 '16 at 06:41
  • @cas Keep in mind that log and cache devices have completely different usage patterns: the first is hit by a large amount of data and needs high endurance/TBW as well as low latency and power-loss protection capacitors, mirroring is optional for safety. The second one needs high read IOPs and mirroring is only useful for availability and not losing the cache (if you don't use Solaris 11 which has permanent L2ARC). I would suggest to split instead of mirror, so you get the best for each use case. – user121391 Jun 09 '16 at 13:25
  • The mirroring of ZIL is so you can get away with using ordinary cheap SSDs rather than expensive ones with large capacitors to guard against power-loss. IMO, mirroring of the ZIL is **not** optional, no matter what kind of SSDs you have - if your ZIL dies, you lose all the yet-to-be-written data in it and potentially corrupt your pool. As for L2ARC, I specifically said **NOT** to mirror them...mirroring the L2ARC cache is a waste of time, money, and good SSD space (and would do nothing to prevent losing the cache - where did you get that idea from?) – cas Jun 09 '16 at 13:36
  • A basic Q turned into a meta ZFS discussion hah, but some interesting advice, thanks. I usually use mirrored pairs, but this is a dumb backup HP MicroServer samba target where performance isn't an issue and money is tight. Works just fine. – Ruben Schade Jun 10 '16 at 02:06
  • :) BTW, my brain wasn't working right when I explained the reason for mirroring ZIL. It's not to guard against power-loss, that's complete nonsense and I should never have said it. It's to guard against failure of the ZIL drive, i.e. a raid-1 mirror for the ZIL. Two reasonably-priced SSDs are, in general, better than one extremely expensive one (unless the more expensive SSD has a much faster interface, like PCI-e vs SATA). And a UPS is essential...cheap protection against power-loss. – cas Jun 11 '16 at 12:25
  • @cas Mirrored ZIL protects against SLOG device failure *at the same time* as an unexpected shutdown. Under normal operations, the ZIL is write-only, and writes to persistent storage are from RAM (ARC). If the system shuts down unexpectedly, the intent log (ZIL, SLOG) is used to finish the writes that were interrupted. Only if the unexpected shutdown coincides with failure of a SLOG device do you need a redundant SLOG to recover the interrupted writes. For most non-server (and many server) workloads, a SLOG is overkill, as the ZIL really only comes into play with synchronous writes. – user Jun 11 '16 at 12:44
  • Does this work on ZFS root pools? I'm using Proxmox and they also use /dev/sda, etc... – Wouter Sep 17 '20 at 11:35
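
If you want to act on cas's SSD suggestion in the comments above, the rough shape of the commands would be something like the following sketch (the by-id paths are placeholders, and the SSDs would need to be partitioned first):

$ sudo zpool add tank log mirror \
    /dev/disk/by-id/ata-SSD_A-part1 /dev/disk/by-id/ata-SSD_B-part1
$ sudo zpool add tank cache \
    /dev/disk/by-id/ata-SSD_A-part2 /dev/disk/by-id/ata-SSD_B-part2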

3 Answers


I know this thread is sort of stale, but there is an answer. You need to update your cache file after you import. This example shows the default location for the cache file.

$> sudo zpool export POOL
$> sudo zpool import -d /dev/disk/by-id POOL
$> sudo zpool import -c /etc/zfs/zpool.cache
$> sudo zpool status POOL
NAME                                  STATE     READ WRITE CKSUM
POOL                                  ONLINE       0     0     0
  raidz1-0                            ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HUA722020ALA330_[..]  ONLINE       0     0     0
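
An alternative way to refresh the cache file, assuming your build supports the cachefile pool property (OpenZFS does), is to set it explicitly after the import instead of re-importing with -c:

$> sudo zpool set cachefile=/etc/zfs/zpool.cache POOL
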
Steve O
  • In your case you got disks named `ata-...`, but in my case they're called `wwn-...`. I understand there's a nuance, but I find `ata-` to be more practical because the serial number is in the name. Can you tell how to switch `wwn` into `ata`? – Wouter Sep 21 '20 at 16:52
  • I suppose the status names depend on how you imported the disks. The names in /dev/disk/by-id are assigned by the OS (Ubuntu in my case). How those names are derived is another subject. – Steve O Sep 22 '20 at 19:41
  • I read that you can move/delete the wwn* files and just run the command again. This time the ata* names will be discovered and used. Have not tried it yet. – Wouter Sep 23 '20 at 12:55
  • Hmmm... Sounds pretty sketchy to me. I wouldn't try that on a production environment. I would make a disposable VM for that little experiment – Steve O Sep 25 '20 at 17:59
  • I just tried @Wouter's suggestion and it worked fine. (I moved the wwn shortcuts instead of deleting.) Now you should be 100% confident it will work because "the guy on the internet said it was fine". – Dustin Wyatt Nov 04 '22 at 15:37
  • I can confirm that @Wouter's suggestion worked fine – HitLuca Nov 08 '22 at 15:18

Dealing with WWNs

$ cd /dev/disk/by-id/
$ ls -l
… ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5ZUJ4XE -> ../../sde
… ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5ZUJ4XE-part1 -> ../../sde1
… ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5ZUJ4XE-part9 -> ../../sde9
…
… wwn-0x50014ee2628c2228 -> ../../sde
… wwn-0x50014ee2628c2228-part1 -> ../../sde1
… wwn-0x50014ee2628c2228-part9 -> ../../sde9

Note that a drive or a drive partition can have more than one entry in /dev/disk/by-id. Apart from the ID based on the brand, model name and serial number, nowadays there may also be a wwn- ID. This is the unique World Wide Name (WWN), which is also printed on the drive case.

Both types of ID work fine with ZFS, but the WWN is a bit less telling. If the WWN IDs are not referenced by the production system (e.g. by a root partition or by a pool that has not been exported yet), they may simply be removed with sudo rm wwn-*. Trust me; I have done that. Nothing can go wrong as long as the pool is exported before doing this.

After all, WWN IDs are mere symbolic links to sd devices that are created at drive detection. They will automatically reappear when the system is rebooted. Internally, Linux always references sd devices.

$ sudo zpool export pool0
$ sudo rm wwn-*
$ sudo zpool import -d /dev/disk/by-id/ pool0
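
If you remove links you later want back, there is no need to reboot: udev can be asked to regenerate the block-device symlinks. A hedged one-liner:

$ sudo udevadm trigger --subsystem-match=block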


Serge Stroobandt

Once in a while, zpool import -d /dev/disk/by-id doesn't work.

I've noticed this in more than one environment. I have an import script that, beyond doing some magic logic and showing physically attached ZFS devices, basically does this:

zpool import -d /dev/disk/by-id POOL
zpool export POOL
zpool import POOL

The second time around, even without the -d switch, the pool imports by device ID, even though it didn't with the first, explicit command.

It's possible this was just due to a ZFS bug during a span of a few weeks or months (a year or two ago), and that this is no longer necessary. I suppose I should have filed a bug report, but it was trivial to work around.
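
A slightly more defensive version of that script logic would only do the export/import dance when a bare sdX name actually shows up in the status output (a sketch; POOL is a placeholder):

if zpool status POOL | grep -qE '^[[:space:]]+sd[a-z]+[[:space:]]'; then
    sudo zpool export POOL
    sudo zpool import -d /dev/disk/by-id POOL
fi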

Jim