awktopus

2017-08-14

The first entry in this shiny new blog is about network attached storage. I'm pretty fond of it, and a write-up seems as good a place as any to start. Finding ways to reliably store and quickly retrieve ever-growing datasets became an interesting challenge at work in the 1980s. That became a hobby and, I think, maybe an obsession. It allows me to indulge a tendency to hoard without the mess. Folks' eyes glaze over whenever I talk about it. If you're interested in this stuff you may already know exactly what I mean.

In brief, this is a FreeBSD system using ZFS on an array of five 6TB drives with a 120GB SSD cache and two partitioned 40GB SSD log/boot drives, built around an ASRock C2550D4I Mini ITX server motherboard. I first assembled it on March 19, 2015.

Part 1

the hardware
amazon  2015.03.05  WD Green 6TB WD60EZRX 3.5" hard drive               x 5  $1344.12
newegg  2015.03.02  Crucial CT102472BD160B 8GB DDR3 1600 ECC memory     x 4   $363.96
newegg  2015.03.03  ASRock C2550D4I Mini ITX Server Motherboard         x 1   $288.98
newegg  2015.03.19  Seasonic X650 Gold SS-650KM PSU                     x 1   $143.02
newegg  2015.02.28  Lian Li PC-Q35A Black Aluminum Mini-ITX Tower Case  x 1   $135.98
amazon  2015.03.11  Intel 320 Series 40GB 2.5" SATA SSD                 x 2   $104.90
newegg  2015.03.01  ICY DOCK MB155SP-B FatCage 5x3.5"                   x 1    $92.99
newegg  2015.03.05  Intel Pro 2500 SSDSC2BF120H501 2.5" 120GB SSD       x 1    $80.98
newegg  2015.03.01  ICY DOCK MB153SP-B 3 in 2 SATA Internal Backplane   x 1    $64.99
                                                                            _________
                                                                     TOTAL  $2619.92

WD Green 6TB WD60EZRX 3.5" hard drives

Over two years in and all is well with these drives. I have no complaints. They were about 10% cheaper than the WD Reds. The Reds include TLER (time limited error recovery), a feature useful in a hardware RAID environment but not required by this software RAID. The Reds spin down after 300 seconds of inactivity while the Greens do so after only 8 seconds, but spindown hasn't been an issue here. It could be in other use scenarios, say for a web server. The warranty period was 2 years on these versus 3 years on the Reds, and I'm right in the middle of that window. I'll keep my fingers crossed another 6 months before considering it money saved.

Crucial CT102472BD160B 8GB DDR3 1600 ECC memory

ZFS loves RAM, really loves it. I threw 32GB at it. These were among the modules recommended by ASRock for use with the motherboard. They're fine.

ASRock C2550D4I Mini ITX Server Motherboard

In theory I absolutely love this board. In practice it has not been flawless. The first board I received lasted 2 days. I found that an EVGA 500W 80PLUS PSU purchased for the build wouldn't reliably deliver enough power to the 8 drives, so I had newegg send the Seasonic PSU next-day air since I was itching to finish the build. I then installed FreeBSD on the already set up zpool, powered it down, moved it into some cabinetry, and it never booted again. I could access the IPMI but nothing else. Newegg RMA'd the board and I had another in a couple of days. The second board lasted until May 10, 2017, almost 26 months, when I found the NAS offline. Again there was no POST at power on, but I could access the management console. The board has a 3 year manufacturer's warranty. This time ASRock provided the RMA and the process wasn't terrible. Swapping the motherboard didn't cause any issues with the zpools and was a snap: pretty much just pull, replace and reboot. In both instances I spoke with a gentleman named William at ASRock who was very professional, knowledgeable and helpful.

I've been contemplating buying two more of these: one to build as an off-site backup, replacing the aging former NAS box currently doing that duty, and the other to keep on hand if or when either fails. I've read conflicting reports regarding the cause of the failures in these boards. I've also read that the reliability issue has been resolved, and I hope that's true, because this board is so very close to being great. I don't know of another board with this feature set in this form factor. There are some that come close, but at 3 times the price.

Seasonic X650 Gold SS-650KM PSU

The price for this was $109.99 plus shipping. It's good. It's quiet. I like it.

Lian Li PC-Q35A Black Aluminum Mini-ITX Tower Case

What arrived was a silver (natural aluminum) case rather than black, but I liked it just as well so I kept it without complaint. It's a nice case.

Intel 320 Series 40 GB 2.5" SATA SSD

Two of these are partitioned and mirrored to serve dual roles as boot / OS drives and ZIL (ZFS Intent Log), which records synchronous writes on faster media before they are committed to the main pool. They are working as intended.

ICY DOCK MB155SP-B FatCage 5x3.5"

The spinning media lives here. I thought it would be nice to remove and replace drives without opening the case but haven't had occasion to do that. It has a fan to cool the drives. It also allows for powering five drives from three 15 pin SATA power connectors.

Intel Pro 2500 SSDSC2BF120H501 2.5" 120GB SSD

This drive serves as a read cache between main memory and the disks and performs as expected.

ICY DOCK MB153SP-B 3 in 2 SATA Internal Backplane

All three SSDs live here. They are powered by two 15 pin SATA connectors. That puts all eight drives in the five 5.25" bays provided by the Lian Li case.



Part 2. FreeBSD and ZFS setup and install

Much of what follows is adapted for this home server from documentation provided about the Hyades Supercomputer dedicated to Computational Astrophysics research at the University of California, Santa Cruz (UCSC).

FreeBSD-10.1 was originally installed but I've since upgraded to FreeBSD-11. I'll try to catch any changes required in this recipe to start from FreeBSD-11. There should be few if any deviations.

To create a bootable flash drive from a FreeBSD system, download the appropriate image and then copy it to the flash drive like this.


# dd if=/dev/zero of=/dev/da0 bs=64k count=10
# dd if=FreeBSD-10.1-RELEASE-amd64-memstick.img of=/dev/da0 bs=64k
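
It doesn't hurt to verify the download before writing it. FreeBSD publishes a CHECKSUM.SHA256 file alongside each release image, and the sha256 utility in the base system produces the value to compare against it.

$ sha256 FreeBSD-10.1-RELEASE-amd64-memstick.img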

Boot from the flash drive.

Do the initial bsdinstall setup. There will be dialog boxes for setting the time, connecting to your network, giving the machine a hostname, and creating user accounts and passwords. When you get to the partitioning tool, choose the Shell option.

Obtain a list of the attached storage devices.

The output below is current. The Marvell Console is a RAID configuration utility. There is a tool available from ASRock to turn it off, and it isn't needed for just a bunch of disks. I had disabled it on the original board but haven't since replacing the motherboard in May. I've noticed no difference except that the console appears in the device list.


# camcontrol devlist
<WDC WD60EZRX-00MVLB1 80.00A80> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD60EZRX-00MVLB1 80.00A80> at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD60EZRX-00MVLB1 80.00A80> at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD60EZRX-00MVLB1 80.00A80> at scbus3 target 0 lun 0 (pass3,ada3)
<WDC WD60EZRX-00MVLB1 80.00A80> at scbus4 target 0 lun 0 (pass4,ada4)
<INTEL SSDSC2BF120A5 TG20> at scbus5 target 0 lun 0 (pass5,ada5)
<Marvell Console 1.01> at scbus9 target 0 lun 0 (pass6)
<INTEL SSDSA2CT040G3 4PC10362> at scbus14 target 0 lun 0 (pass7,ada6)
<INTEL SSDSA2CT040G3 4PC10362> at scbus15 target 0 lun 0 (pass8,ada7)

If you had drives on a different interface, for example USB, you'd see them listed as da0, da1, etc., instead of ada0, ada1, and so on with the SATA drives.

4k alignment

Set a sysctl variable forcing ZFS to use 4k disk blocks. Some drives have 4k physical sectors but lie by reporting 512-byte sectors to allow interoperability with older systems such as 32-bit Windows. This step may increase the speed of the array by 5% to 20% at the cost of some wasted space. If you have a huge number of small ( <4k ) files this could be a bad idea.


# sysctl vfs.zfs.min_auto_ashift=12
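
To see what a drive actually reports, diskinfo will print both the logical sector size and the stripesize; a 4k drive that emulates 512-byte sectors shows a sectorsize of 512 and a stripesize of 4096. A quick check against one of the 6TB drives:

# diskinfo -v ada0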

Prepare the drives.

Remove any extant partitioning.


# gpart destroy -F ada0
# gpart destroy -F ada1
# gpart destroy -F ada2
# gpart destroy -F ada3
# gpart destroy -F ada4
# gpart destroy -F ada5
# gpart destroy -F ada6
# gpart destroy -F ada7

Assign a GUID Partition Table (GPT) scheme to the drives. If you have some odd hardware or are a masochist you could instead choose BSD disklabel, MBR or one of several other schemes.


# gpart create -s gpt ada0
# gpart create -s gpt ada1
# gpart create -s gpt ada2
# gpart create -s gpt ada3
# gpart create -s gpt ada4
# gpart create -s gpt ada5
# gpart create -s gpt ada6
# gpart create -s gpt ada7

Create the partitions.
This step also defines the block alignment and the partition type appropriate to the chosen scheme, and lets us assign convenient labels to the partitions.


# gpart add -a 4k -t freebsd-zfs -l hdd0 ada0
# gpart add -a 4k -t freebsd-zfs -l hdd1 ada1
# gpart add -a 4k -t freebsd-zfs -l hdd2 ada2
# gpart add -a 4k -t freebsd-zfs -l hdd3 ada3
# gpart add -a 4k -t freebsd-zfs -l hdd4 ada4
# gpart add -t freebsd-zfs -l cache0 ada5 

The cache drive (natively 512-byte sectors) could not be 4k aligned at the time due to an error that a later update resolved. The work-around was to revert the modified sysctl variable and change a second one before adding the drive, then restore both afterward.


# sysctl vfs.zfs.min_auto_ashift=9
# sysctl vfs.zfs.max_auto_ashift=9
# gpart add -t freebsd-zfs -l cache0 ada5
# sysctl vfs.zfs.min_auto_ashift=12
# sysctl vfs.zfs.max_auto_ashift=12

I wanted ZIL redundancy, so I set up a pair of drives to be mirrored. The operating system will be mirrored as well, and the boot sectors and swap space are duplicated. If one of these drives fails the other will, in theory, be available to boot from, and the pool will operate normally until the failed drive can be replaced. I bought 4 of the Intel 320 Series drives at a discounted price; the two extras remain unused.


# gpart add -s 222 -a 4k -t freebsd-boot -l boot0 ada6
# gpart add -s 1g -a 4k -t freebsd-swap -l swap0 ada6
# gpart add -s 16g -a 4k -t freebsd-zfs -l ssd0 ada6
# gpart add -s 16g -a 4k -t freebsd-zfs -l log0 ada6
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada6
# gpart add -s 222 -a 4k -t freebsd-boot -l boot1 ada7
# gpart add -s 1g -a 4k -t freebsd-swap -l swap1 ada7
# gpart add -s 16g -a 4k -t freebsd-zfs -l ssd1 ada7
# gpart add -s 16g -a 4k -t freebsd-zfs -l log1 ada7
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada7
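
A quick sanity check never hurts at this point; gpart will show the resulting layout, and the -l flag displays the labels just assigned. For one of the SSDs:

# gpart show -l ada6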

The 222 block size for the freebsd-boot partition was, I think, copied from the guide I was following at the time. It's fine, but I've been trying to recall the logic behind that number and can't. On this drive it comes to about 113k. The bootcode for FreeBSD-10.1 was under 64k; I updated the bootcode during the upgrade to FreeBSD-11 and it is now 87k. There was, and may still be, a 512k limit imposed by the loader on the size of the boot partition. I think next time I'll just use all of that for future-proofing.
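
If I do go that route, the only change is the size argument; a sketch of what that would look like, using the full 512k just mentioned:

# gpart add -s 512k -a 4k -t freebsd-boot -l boot0 ada6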

Load the ZFS kernel module.

# kldload zfs

Create the mirrored boot pool named zroot.

# zpool create -o altroot=/mnt -O canmount=off -m none zroot mirror /dev/gpt/ssd0 /dev/gpt/ssd1

Create the FreeBSD file system hierarchy.

# zfs set checksum=fletcher4 zroot
# zfs set atime=off zroot
# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ zroot/ROOT/default
# zfs create -o mountpoint=/home -o setuid=off zroot/home
# zfs create -o mountpoint=/tmp -o compression=lz4 -o setuid=off zroot/tmp
# chmod 1777 /mnt/tmp
# zfs create -o mountpoint=/usr zroot/usr
# zfs create zroot/usr/local
# zfs create zroot/usr/obj
# zfs create -o compression=lz4 -o setuid=off   zroot/usr/ports
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/distfiles
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/packages
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/usr/src
# zfs create -o mountpoint=/var zroot/var
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/crash
# zfs create -o exec=off -o setuid=off zroot/var/db
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/db/pkg
# zfs create -o exec=off -o setuid=off zroot/var/empty
# zfs set readonly=on zroot/var/empty
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/log
# zfs create -o compression=gzip -o exec=off -o setuid=off zroot/var/mail
# zfs create -o exec=off -o setuid=off zroot/var/run
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/tmp
# chmod 1777 /mnt/var/tmp

Set the dataset from which to boot.

# zpool set bootfs=zroot/ROOT/default zroot

Add the swap devices to fstab.

# cat << EOF > /tmp/bsdinstall_etc/fstab
#Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/swap0  none        swap    sw       0     0
/dev/gpt/swap1  none        swap    sw       0     0
EOF

Type exit in the shell and proceed with the installation.

When the installation is complete, choose Exit from the main menu. The next dialog will offer the option to 'open a shell in the new system to make any final manual modifications'. Select Yes.

Configure ZFS.

# mount -t devfs devfs /dev
# echo 'zfs_enable="YES"' >> /etc/rc.conf
# echo 'zfs_load="YES"' >> /boot/loader.conf
# echo 'vfs.zfs.vdev.cache.size="32M"' >> /boot/loader.conf
# echo "vfs.zfs.min_auto_ashift=12" >> /etc/sysctl.conf
# echo "vfs.zfs.max_auto_ashift=12" >> /etc/sysctl.conf
# zfs set readonly=on zroot/var/empty

Exit the shell, remove the flash drive and reboot.
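
After the first boot it's worth confirming that the system came up on the mirrored pool and that both swap partitions were picked up from fstab:

$ zpool status zroot
$ swapinfo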

Define the mass storage array

Create a zpool over the five 6TB drives and give it a file system. Here I named it simply nas and chose a raidz1 configuration.


# zpool create -m none nas raidz1 \
/dev/gpt/hdd0 /dev/gpt/hdd1 /dev/gpt/hdd2 /dev/gpt/hdd3 /dev/gpt/hdd4 \
cache /dev/gpt/cache0 log mirror /dev/gpt/log0 /dev/gpt/log1
# zfs set checksum=fletcher4 nas
# zfs set atime=off nas
# zfs set mountpoint=/export/nas nas
# zfs set setuid=off nas
# chmod 1777 /export/nas

It works!

$ zpool status
  pool: nas
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        nas           ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            gpt/hdd0  ONLINE       0     0     0
            gpt/hdd1  ONLINE       0     0     0
            gpt/hdd2  ONLINE       0     0     0
            gpt/hdd3  ONLINE       0     0     0
            gpt/hdd4  ONLINE       0     0     0
        logs
          mirror-1    ONLINE       0     0     0
            gpt/log0  ONLINE       0     0     0
            gpt/log1  ONLINE       0     0     0
        cache
          gpt/cache0  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/ssd0  ONLINE       0     0     0
            gpt/ssd1  ONLINE       0     0     0

errors: No known data errors


$ zpool iostat nas 5

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nas          291G  27.0T      0    136      0  11.6M
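
For reference, those capacity figures are the raw pool size with parity included: five 6TB drives is 30TB, or roughly 27.3TiB in the binary units zpool reports, which matches the 291G allocated plus 27.0T free shown above. Usable space after raidz1 parity is about four fifths of that, somewhere around 21.8TiB before ZFS overhead.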

A very minor fix.

After installing the FreeBSD-10.3 upgrade on April 29, 2016 I noticed the cache had lost its gpt label. Deleting and recreating the partition fixed it. I attempted to format it 4k aligned again, even though it's not a 4k drive, and it worked this time. The option to start the partition at block 2048 is something I read about somewhere. As I understand it, the offset keeps the partition clear of the GPT metadata at the start of the disk while staying 4k aligned, and since GPT keeps a backup of its tables in the final sectors of the disk, the margin belongs at the beginning rather than the end of the space. Surely past self didn't copy the value but did the math. Regardless of my poor memory here, it does work.


# zpool remove nas gpt/cache0

or maybe it was


# zpool remove nas diskid/DISK-WHATEVERTHEDISKIDWAS

then


# gpart delete -i 1 ada5
# gpart add -b 2048 -a 4k -t freebsd-zfs -l cache ada5
# zpool add nas cache gpt/cache

Notes about uptime and performance.

Excepting the issue with the motherboard, which upon further reading I believe was caused by a flaw in the C2000 series processor rather than the motherboard design, this setup has been remarkably stable. It lives behind an APC BackUPS 1500, and I have a small Honda generator for the occasions when power outages last longer than a few minutes. There were 376 days of uninterrupted operation between the reboot for the FreeBSD-10.3 update and the motherboard / processor failure in May. Today, Thursday August 24, 2017, uptime is 90 days.

I'm happy with the performance. The heaviest workload this server has seen was the initial zfs send/receive operation which copied approximately 8TB of data to it. Right now it is serving up a 1080p video, receiving bittorrent traffic over NFS, and receiving files copied from a LAN-connected computer over SSH via the scp command. Here are the current iostats. The first output line is an average over time and the second line shows the operations and bandwidth during the previous one second interval.


$ zpool iostat nas 1 2
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nas         12.4T  14.8T     13     22  1.54M   327K
nas         12.4T  14.8T    105    759  13.1M  43.0M

There are a couple of Windows machines used for watching video in other rooms via Samba over wifi. They can be, and often are, going simultaneously. None of this is processor intensive, so I also use the box for some transcoding.

Here's the latest zpool status showing the results of a recent scrub. There haven't been any read, write, or checksum errors with this build.


$ zpool status      
  pool: nas         
 state: ONLINE      
  scan: scrub repaired 0 in 10h30m with 0 errors on Wed Aug 23 20:56:12 2017    
config:             

        NAME          STATE     READ WRITE CKSUM                                
        nas           ONLINE       0     0     0                                
          raidz1-0    ONLINE       0     0     0                                
            gpt/hdd0  ONLINE       0     0     0                                
            gpt/hdd1  ONLINE       0     0     0                                
            gpt/hdd2  ONLINE       0     0     0                                
            gpt/hdd3  ONLINE       0     0     0                                
            gpt/hdd4  ONLINE       0     0     0                                
        logs        
          mirror-1    ONLINE       0     0     0                                
            gpt/log0  ONLINE       0     0     0                                
            gpt/log1  ONLINE       0     0     0                                
        cache       
          gpt/cache   ONLINE       0     0     0                                

errors: No known data errors            

  pool: zroot       
 state: ONLINE      
  scan: scrub repaired 0 in 0h1m with 0 errors on Wed Aug 23 10:24:40 2017      
config:             

        NAME          STATE     READ WRITE CKSUM                                
        zroot         ONLINE       0     0     0                                
          mirror-0    ONLINE       0     0     0                                
            gpt/ssd0  ONLINE       0     0     0                                
            gpt/ssd1  ONLINE       0     0     0                                

errors: No known data errors            



Part 3. Networking, NFS, Snapshots, SMART monitoring, Samba and other configuration.

Much of this section takes advantage of instruction provided by the invaluable FreeBSD Handbook, with the exception of the NFSv4 configuration, which uses the sharenfs feature of ZFS and is adapted from the configuration of the Hyades Supercomputer dedicated to Computational Astrophysics research at the University of California, Santa Cruz (UCSC).

Verify the hostname, network settings and services enabled by bsdinstall.

An example is given here for the hostname and a commonly used private IP address. The interface name igb0 identifies the ethernet interface; igb is the name of the FreeBSD driver for the Intel(R) gigabit ethernet adapter used on this motherboard, suffixed with the interface number. Unbound is a local caching DNS resolver. SSH and NTP are enabled, as are crash dumps. The line enabling ZFS on startup, added in Part 2, is present as well.


# cat /etc/rc.conf
hostname="yourhostname.example.com"
defaultrouter="192.168.0.1" 
ifconfig_igb0="DHCP"
ifconfig_igb0_ipv6="inet6 accept_rtadv"
local_unbound_enable="YES"
sshd_enable="YES"
ntpd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

Configure and start the NFS server.

The rpcbind utility is a server that converts RPC program numbers into universal addresses and is required by NFS. The optional rpc.lockd provides file and record locking services in the NFS environment, and rpc.statd cooperates with rpc.statd daemons on other hosts to provide status monitoring. While rpc.lockd and rpc.statd are optional, some applications require file locking to operate correctly. This machine will serve NFS over both UDP and TCP transports using 32 daemons.


# echo 'rpcbind_enable="YES"' >> /etc/rc.conf
# echo 'rpc_lockd_enable="YES"' >> /etc/rc.conf
# echo 'rpc_statd_enable="YES"' >> /etc/rc.conf
# echo 'nfs_server_enable="YES"' >> /etc/rc.conf
# echo 'nfs_server_flags="-u -t -n 32"' >> /etc/rc.conf
# service nfsd start
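
To confirm everything registered with rpcbind, rpcinfo will list the RPC services currently known to it; once the server pieces are running, portmapper, mountd, nfs, nlockmgr and status should all appear.

# rpcinfo -p localhost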

Export the ZFS filesystem to the private subnet.


# zfs set sharenfs="-maproot=root -network=192.168.0.0/24" nas

The share is exported to the private subnet with no_root_squash (-maproot=root). You might prefer to squash root privileges on the share, in which case you can map root to another user or to nobody (-maproot=nobody).
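
That variant is the same command with a different maproot target.

# zfs set sharenfs="-maproot=nobody -network=192.168.0.0/24" nas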

The export list is written to /etc/zfs/exports rather than FreeBSD's standard location at /etc/exports, and the share is exported immediately.


# cat /etc/exports
cat: /etc/exports: No such file or directory


# cat /etc/zfs/exports
# !!! DO NOT EDIT THIS FILE MANUALLY !!!

/nas	-maproot=root -network=192.168.0.0/24


# showmount -e
Exports list on localhost:
/nas                               192.168.0.0
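
From a client on the same subnet, mounting the share is a one-liner. Here 192.168.0.2 stands in for the server's actual address, and /mnt/nas is assumed to already exist on the client.

# mount -t nfs 192.168.0.2:/nas /mnt/nas   # substitute the server's real address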

Set up snapshots.

# cd /usr/ports/sysutils/zfstools/
# make install clean

The final output of the installation will read as follows.

Installing zfstools-0.3.6_1...
To enable automatic snapshots, place lines such as these into /etc/crontab:

    PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
    15,30,45 * * * * root /usr/local/sbin/zfs-auto-snapshot frequent  4
    0        * * * * root /usr/local/sbin/zfs-auto-snapshot hourly   24
    7        0 * * * root /usr/local/sbin/zfs-auto-snapshot daily     7
    14       0 * * 7 root /usr/local/sbin/zfs-auto-snapshot weekly    4
    28       0 1 * * root /usr/local/sbin/zfs-auto-snapshot monthly  12

This will keep 4 15-minutely snapshots, 24 hourly snapshots, 7 daily snapshots,
4 weekly snapshots and 12 monthly snapshots. Any resulting zero-sized snapshots
will be automatically cleaned up.

Enable snapshotting on a dataset or top-level pool with:

    zfs set com.sun:auto-snapshot=true DATASET

Children datasets can be disabled for snapshot with:

    zfs set com.sun:auto-snapshot=false DATASET

Or for specific intervals:

    zfs set com.sun:auto-snapshot:frequent=false DATASET

See website and command usage output for further details.

===>  Cleaning for zfstools-0.3.6_1

A check of zfs properties and a look into /etc/crontab will show those instructions were followed.


# zfs get com.sun:auto-snapshot nas
NAME  PROPERTY               VALUE                  SOURCE
nas   com.sun:auto-snapshot  true                   local

# cat /etc/crontab
# /etc/crontab - root's crontab for FreeBSD
#
# $FreeBSD: releng/11.0/etc/crontab 194170 2015-03-15 13:06:12Z ayylmao $
#
SHELL=/bin/sh
    PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
    15,30,45 * * * * root /usr/local/sbin/zfs-auto-snapshot frequent  4
    0        * * * * root /usr/local/sbin/zfs-auto-snapshot hourly   24
    7        0 * * * root /usr/local/sbin/zfs-auto-snapshot daily     7
    14       0 * * 7 root /usr/local/sbin/zfs-auto-snapshot weekly    4
    28       0 1 * * root /usr/local/sbin/zfs-auto-snapshot monthly  12

I've rolled back to a snapshot exactly once. Earlier this year I deleted the contents of a directory in error. I rolled back to the previous frequent snapshot and it just worked. I wondered what would happen had there been a transcoding session in progress. I assume any jobs writing to the array which are in progress during a rollback would need to be reinitiated. There were none at that time and the mistakenly deleted files were easily recovered.

This attempt to find where the snapshots were stored provided little information.


# zfs get snapdir nas
NAME  PROPERTY  VALUE    SOURCE
nas   snapdir   hidden   default

I referred to the FreeBSD Handbook and found 'Snapshots are not shown by a normal zfs list operation. To list snapshots, -t snapshot is appended to zfs list'. Let's try it.

# zfs list -t snapshot
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
nas@20150324                                  128K      -   153K  -
nas@zfs-auto-snap_monthly-2016-09-01-00h28   9.64G      -  8.44T  -
nas@zfs-auto-snap_monthly-2016-10-01-00h28   6.18G      -  8.50T  -
nas@zfs-auto-snap_monthly-2016-11-01-00h28   2.69G      -  8.64T  -
nas@zfs-auto-snap_monthly-2016-12-01-00h28   9.70G      -  8.74T  -
nas@zfs-auto-snap_monthly-2017-01-01-00h28   15.0M      -  8.79T  -
nas@zfs-auto-snap_monthly-2017-02-01-00h28   15.3M      -  8.86T  -
nas@zfs-auto-snap_monthly-2017-03-01-00h28   1022M      -  8.94T  -
nas@zfs-auto-snap_monthly-2017-04-01-00h28   16.9G      -  9.01T  -
nas@zfs-auto-snap_monthly-2017-05-01-00h28   3.43G      -  9.04T  -
nas@zfs-auto-snap_monthly-2017-06-01-00h28   9.77G      -  9.07T  -
nas@zfs-auto-snap_monthly-2017-07-01-00h28   63.2G      -  9.25T  -
nas@zfs-auto-snap_weekly-2017-07-30-00h14    4.90G      -  9.19T  -
nas@zfs-auto-snap_monthly-2017-08-01-00h28   22.7G      -  9.27T  -
nas@zfs-auto-snap_weekly-2017-08-06-00h14    30.5G      -  9.42T  -
nas@zfs-auto-snap_weekly-2017-08-13-00h14    4.94G      -  9.38T  -
nas@zfs-auto-snap_daily-2017-08-20-00h07      428K      -  9.31T  -
nas@zfs-auto-snap_weekly-2017-08-20-00h14     377K      -  9.31T  -
nas@zfs-auto-snap_daily-2017-08-21-00h07      479K      -  9.32T  -
nas@zfs-auto-snap_daily-2017-08-22-00h07      639K      -  9.33T  -
nas@zfs-auto-snap_daily-2017-08-23-00h07      230K      -  9.31T  -
nas@zfs-auto-snap_daily-2017-08-24-00h07      153K      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-24-14h00    8.27M      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-24-15h00    10.9M      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-24-16h00    16.5M      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-24-17h00    1.65M      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-24-18h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-24-19h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-24-20h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-24-21h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-24-22h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-24-23h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-00h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_daily-2017-08-25-00h07     1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-01h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-02h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-03h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-04h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-05h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-06h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-07h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-08h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-09h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-10h00    1.65M      -  9.32T  -
nas@zfs-auto-snap_frequent-2017-08-25-10h15  1.65M      -  9.32T  -
nas@zfs-auto-snap_frequent-2017-08-25-10h30  1.52M      -  9.32T  -
nas@zfs-auto-snap_frequent-2017-08-25-10h45  1.52M      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-16h00     115K      -  9.32T  -
nas@zfs-auto-snap_hourly-2017-08-25-20h00     115K      -  9.32T  -
nas@zfs-auto-snap_daily-2017-08-26-00h07         0      -  9.31T  -
nas@zfs-auto-snap_hourly-2017-08-26-14h00        0      -  9.31T  -
nas@zfs-auto-snap_frequent-2017-08-26-14h15      0      -  9.31T  -
zroot@first                                      0      -    96K  -

Then it was just a matter of choosing the snapshot I wanted and issuing the following command. We'll pretend the dates are back then instead of now.


# zfs rollback nas@zfs-auto-snap_frequent-2017-08-26-14h15

Not bad, not bad at all.
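
For recovering just a file or two, a full rollback isn't even necessary. Every snapshot is browsable under the .zfs/snapshot directory at the root of the dataset (the snapdir property above only controls whether that directory shows up in listings), so individual files can simply be copied back out. Assuming the pool is mounted at /nas as in the NFS export:

# ls /nas/.zfs/snapshot
# cp /nas/.zfs/snapshot/zfs-auto-snap_daily-2017-08-25-00h07/somefile /nas/   # somefile is a placeholder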

S.M.A.R.T. monitoring setup

I installed smartmontools, which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology System (S.M.A.R.T.).


# cd /usr/ports/sysutils/smartmontools/
# make install clean
# echo 'smartd_enable="YES"' >> /etc/rc.conf
# echo 'daily_status_smart_devices="/dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4 /dev/ada5 /dev/ada6 \
 /dev/ada7"' >> /etc/periodic.conf

Google published some research on drive failures which isn't comforting with regard to SMART monitoring being predictive. The authors state "We find, for example, that after their first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in reallocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabilities. Despite those strong correlations, we find that failure prediction models based on S.M.A.R.T. parameters alone are likely to be severely limited in their prediction accuracy, given that a large fraction of our failed drives have shown no S.M.A.R.T. error signals whatsoever." So it's better than nothing, but no panacea. I look at the daily status reports, and if I see anything negative there I'll swap out a drive, if I'm lucky enough to get the warning. Even so, with raidz1 the array can lose one drive without data loss, and if the worst happens and a second drive fails before the first is replaced, I have recourse to the offsite backup. There is now more data on this array than the offsite backup can hold, but everything important is backed up, leaving only easily replaced files at risk should two or more drives fail at nearly the same time.
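
In between the daily reports it's easy to check a drive by hand. smartctl will print the overall health assessment or the full attribute table, and can kick off the drive's built-in self tests:

# smartctl -H /dev/ada0         # overall health assessment
# smartctl -a /dev/ada0         # full SMART attribute table
# smartctl -t short /dev/ada0   # start a short self test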

Samba.

I thought Samba was running on the server, but it isn't. Samba serves the Windows computers from one of the NFS clients instead.

A few other things

The server is normally headless with no monitor or keyboard attached. The tmux terminal multiplexer is installed, along with ffmpeg for transcoding video. I often ssh in to start transcoding tasks, which run in tmux sessions. This doesn't seem to interfere with NFS.
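
A typical session is nothing fancy, something along these lines:

$ tmux new -s transcode
$ ffmpeg -i input.mkv -c:v libx264 -crf 23 -c:a copy output.mkv   # input.mkv / output.mkv are placeholders

Detaching with ctrl-b d leaves the job running after the SSH session ends.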

That's all folks
Well, I spent longer writing this up than expected. It's easier reading than my notes, for sure. I mentioned that I'm pretty happy with this build overall. If the processor / motherboard issue really is resolved, this setup should be good for a few more years. Maybe the price will come down on those helium filled 12TB drives I've been reading about, or better yet, high capacity SSDs please.

You may wonder what in the world anyone would want with so much storage. The answer is mostly Linux distributions, but there's plenty of other stuff. If I want to break a video into stills for editing and reassembly, no problem. I've built a search engine before and may again. Mine wasn't Google sized, but it was fast and worked well. I recall bumping into some file system limitations before switching to PostgreSQL; those just don't exist with ZFS. Permutations are fun!

You may also wonder why I didn't use a ready-made software solution like FreeNAS or NAS4Free. I just never have, is all. I don't know whether they're better or worse than my setup, but from what I've read they seem like a fine way to go about this.

