Author Topic: RAID  (Read 3465 times)

urban420

  • Newbie
  • *
  • Posts: 11
RAID
« on: January 31, 2014, 11:58:49 AM »
Hello -

I just purchased your book, and while I have only read the first few chapters so far, the information has already been helpful. But I had a question on whether or not to RAID a Mac Mini Server.

I am installing a Mac Mini Server at our office to replace a Windows server. When I decided to go the route of an Apple server, the one thing that bothered me was whether I could run a RAID setup on the server. Coming from the world of Windows servers, I am used to RAID controllers and multiple hot-swap hard drives.

Over the past few months I have spoken with numerous people who all seem to be against running RAID on the Mini. Their solution is always to clone or Time Machine from one drive to the other. To me this leaves the possibility of lost data and does not really provide redundancy. I think a lot of people shy away from RAID because they don't fully understand it, and they view it as a backup when in fact it is about redundancy and the backup is completely separate.

So, on to my question. Initially I wanted to RAID the Mac Mini Server, but as I said above, after I spoke to several people and researched on the web, I was left with the impression that Apple discourages RAID mirroring. It seems that when you RAID the volumes you lose some features that may be important. In fact, the book briefly describes how with RAID you lose the ability to have a Recovery volume and to use FileVault.

So my question is, what exactly do you lose when you set up RAID? And what does the loss of these features really mean?  I see there is an Apple support article that says:

"Recovery offers on-disk recovery tools, allows you to restore from Time Machine backups, reinstall OS X via the Internet, or set a firmware password."

Is there any way to prepare for the loss of these features? For example the Apple support article on RAID says:

"you should consider using the Recovery Disk Assistant to create an external recovery disk before creating your RAID volume."

Is this step something you recommend?

Over the years my experience has always been to try to adapt to the way Apple designed something rather than trying to adapt the hardware/software to the way I have always done it in the past. Obviously this is not the case in every situation, and sometimes Apple designs something to be easy so that just about anyone can manage it.

I'm not sure how much others would benefit from some clarification on these topics, and maybe the short blurb in the book really is enough information. I just know that having researched it previously it seemed like there was a lot more to it.

Overall I can't say how much I appreciate your book. For someone like myself who has always dealt with Windows servers but has been using Mac notebooks and workstations for years, the book has thus far been a wealth of information. I am really looking forward to future books and can't wait for the next release.

Thanks -

J
« Last Edit: January 31, 2014, 12:01:47 PM by urban420 »

Reid Bundonis

  • Administrator
  • Full Member
  • *****
  • Posts: 107
Re: RAID
« Reply #1 on: January 31, 2014, 05:58:29 PM »
Thank you so much for the kind words and I am glad the book is helping you out.  That was my goal and every time I hear good feedback it drives me harder toward finishing the second book!

So, the question: to RAID or not to RAID.  This is a tough call and has been even since the days of the Xserve (three drives without ordering the hardware RAID card, um, hmmm).  In most cases, the right answer is what you feel comfortable with.  But, to expand on my approach, I always create RAIDs for servers since the only piece of redundancy in the mini is the drives.  Beyond that, I buy two minis.  Redundancy via duplication of hardware.

First, to RAID.

If I look at the traditional deployment model of a mini server, it is placed in a rack, closet, or on a desk and has no keyboard, mouse, or display connected.  Due to the headless nature, I interact with it over ARD and often just leave the system at its login window.  A sign of a good server is one that the admins don't admin.  Users use it and it just works.  So if something is going wrong, you are likely removing it from its home and then connecting keyboard, mouse, and monitor to triage the unit.  If this is the case, you likely have a service drive or tools available to service the machine.

Drive choice may also drive the decision to RAID.  In most of our deployments, I will still spec the dual 1 TB internal drives.  I will do this with the intention of mirroring the drives and then storing all data on an external volume (mini + Pegasus has become a very popular configuration).  In this case, being a server, the boot volume doesn't need to be super speedy and in many cases we never exceed a few percent of capacity.  But, I am protected against a drive failure.  And it does occur.  (Just before the holidays we had a rash of drive failures in 2011 mini Servers.  Without mirroring, we would have had a lot of angry customers.)

I am grounded in the reality that RAID is not backup.  So in addition to the RAID, we also ensure that a sound backup policy is in place.  While the dual drive config can protect against the loss of a drive, backup can keep your business operating.  I still do not rely solely on Time Machine, preferring to script backup of vital services like Open Directory.

Now, the option to not mirror.  mini Servers come with dual drives as the default config.  It is clearly an option to put the OS on one drive and data on the other.  But now you have NO redundancy and the loss of a drive costs you either your corporate data or your OS.  Neither option is pleasant.

Next, there is the cloning process.  rsync, Carbon Copy Cloner, and other tools can do a reasonably good job at doing this.  But, there is the database conundrum.  OS X Server will run a number of its services off of databases (LDAP, Kerberos, etc).  If the database is open, how can the clone work properly?  And the worst time to discover that your synchronization did not work is when your production system failed and you are relying on that sync.

And a note about what you gain and lose.  Here is my perspective.
Gain:
• Hardware redundancy where you need it most... your data (or OS but it commonly contains data like users and passwords)
• Performance gain for reading (marginal)

Loss:
• Capacity (a mirror is using two drives to get the capacity of one)
• Recovery partition.  This is not a concern: Command-R will boot you into recovery over the internet, so a server does not need a local recovery partition (even on clients it can be a security nightmare).
• FileVault cannot be used, as there is no recovery partition to store the keys.  Once again, encrypting the drives of a server is likely not a good workflow.  If the server ever reboots (service, power loss, etc.), you will need to be physically present for it to actually boot.

Once again, being a consultant, I am more interested in devices that can sustain an existence without my constant presence.  In some cases, we deploy systems and then don't see them again for months (or even years).  I want to make sure that the customer has a chance of running their server as reliably and consistently as possible.  For me, RAIDing the boot and storing data on another RAIDed array allows me to sleep at night.

Also, lately, I've been buying 64 GB SD cards and installing or cloning the OS to the card.  I will leave the card in the unit and should I need to boot to another drive, I simply select the SD card and up I come.

Let me know if that helps.  While I am in the camp of RAID, I can see the argument against.  If I was controlling my own site and was there to watch it every day, maybe I would risk running without the redundancy.  But the reliability and stability of OS X tends to make me fire and forget.  For that, I need hardware features that protect against the most common point of failure.



urban420

Re: RAID
« Reply #2 on: February 02, 2014, 12:20:51 AM »
Thanks for taking the time to reply with such a detailed response. What you said just solidifies what I was thinking. Working with Windows servers, it's never been whether to RAID or not, but rather what level of RAID and what controller.

It is funny, much of what you said in the beginning of your book is so true. When I started to work on this project I was just floored by the lack of being able to deploy what I viewed as a real Apple server. I never worked with any of the previous server hardware from Apple so I was introduced straight to the Mini Server. I was like "yea right" and started to look for a refurbed Mac Pro Tower or some other solution because I just did not believe it would be possible with a Mini Server. But the more I researched the more I realized the Mini Server can be a good solution for many applications.

And your suggestion to have a second Mini Server for redundancy is actually something I was planning. I think I saw that there is some sort of rack mount solution for two Mini Servers, so it is obvious others are utilizing this setup. For the price of the Mini you really can't beat the option of having a second one on hand if something goes wrong, so you can send the Mini in for service without being out of service.

Your info on the features you lose when you set up the drives in the Mac Mini Server is exactly what I was looking for. I've been a Mac user for years but I just could not figure out what the ramifications of not being able to have a recovery partition would be down the road. I think I was just trying to over-complicate things because like you pointed out you can easily boot over the web.

One question: Apple talks about installing a recovery partition on an external drive, but you have to install it from a system with an existing Recovery System. You mentioned using an SD card and installing/cloning the OS to it to boot from if you ever needed. Is this essentially the same concept, but a step further? And would you create this SD card before you set up the drives to mirror (before the loss of recovery partition) or would you do it after? Or does it not really matter?

I am totally with you as far as RAID on a server and to me redundancy is everything. Drives fail, sometimes we are lucky and other times we are not, so being able to reduce downtime is of utmost importance. Now if only monitoring the drives were easier!

It is great that you have put your real world experiences in your book so that others can learn. It seems like there are a lot of people who know how things should work or can work, but not as many who understand how they actually do work.

Thanks again for your help.

Reid Bundonis

Re: RAID
« Reply #3 on: February 02, 2014, 08:15:03 AM »
"One question: Apple talks about installing a recovery partition on an external drive, but you have to install it from a system with an existing Recovery System. You mentioned using an SD card and installing/cloning the OS to it to boot from if you ever needed. Is this essentially the same concept, but a step further? And would you create this SD card before you set up the drives to mirror (before the loss of recovery partition) or would you do it after? Or does it not really matter?"

It does not really matter.  You can skin this cat in about 20 ways.  For example, you can clone the recovery partition to another volume before creating the RAID.  You can snag it from another machine.  (diskutil list will get you started with seeing it).  I tend to like having a full bootable OS.  While the recovery partition gets more useful with each release, having a full OS with diagnostic tools and a full Finder I find to be invaluable.  Bottom line is that you should have a service disk somewhere in your inventory.  I have a Firewire/USB drive with 5 partitions.  I can boot to 10.6.8, 10.6 Server install, 10.7.x, 10.8.x, and 10.9.x all from this one disk.  The 10.7, 10.8, and 10.9 partitions have the OS installer on them so I can reinstall if needed.  It allows me to use one drive to boot to every possible device I may come across.
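As a starting point for spotting the recovery partition in the diskutil list output, here is a minimal sketch.  It assumes the partition is shown with the type Apple_Boot and the name Recovery HD, and that the device identifier is the last column of the line.  The function name recovery_partitions is only illustrative.

```shell
#!/bin/bash
# Print the device identifier of each Recovery HD partition found in
# `diskutil list` output read from stdin.
# Assumption: the partition type is Apple_Boot, the name is "Recovery HD",
# and the identifier (for example disk0s3) is the last column.
recovery_partitions() {
    awk '/Apple_Boot/ && /Recovery HD/ { print $NF }'
}

# On a Mac: diskutil list | recovery_partitions
```

The identifier it prints is what you would point a cloning tool at before building the RAID.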

"I am totally with you as far as RAID on a server and to me redundancy is everything. Drives fail, sometimes we are lucky and other times we are not, so being able to reduce downtime is of utmost importance. Now if only monitoring the drives were easier!"

Ah, memories of Server Monitor...  I've used a simple script to watch RAID volumes for a while now.  Without hardware monitoring, it can at least notify me of an issue.  Here is an example of a failed drive:

Name:                 boot
Unique ID:            7DDF9AEE-9C94-42AC-B358-FE45A7CD1731
Type:                 Mirror
Status:               Degraded
Size:                 999860895744 B
Device Node:          disk8
Apple RAID Version:   2
-------------------------------------------------------------------------------
#   Device Node       UUID                                   Status
-------------------------------------------------------------------------------
0   disk2s2           74871AA3-0AA0-48E6-AA1B-3D8C97BA3F17   Online
0   -none-            5CA5FEF9-9FB7-3846-88FB-2F1820C856D0   Missing/Damaged
==================================================================

You can use this command to get the results of the first status match (note that the word status is used twice in the report).  This response should be either Online or Degraded:

diskutil checkraid | grep -m 1 Status | awk '{print $2}'

This is the diskutil command with the checkraid verb, piped to grep, which stops after the first (-m 1) line containing the word Status.  That line is then piped to awk, which prints the item in the second column.
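The one-liner above reports only the first Status match, so on a server with more than one RAID set (for example a boot mirror plus a separate data array) only the first set is checked.  A minimal sketch that prints one line per set is below.  It assumes every set header carries Name: and Status: at the start of a line, as in the sample report above.  The function name raid_set_statuses is only illustrative.

```shell
#!/bin/bash
# Print "<set name>: <status>" for every RAID set in `diskutil checkraid`
# output read from stdin.
# Assumption: each set header has "Name:" and "Status:" at the start of a
# line. The member table's header row also contains the word Status, but it
# does not start with "Status:", so it is not matched.
raid_set_statuses() {
    awk '
        /^Name:/   { sub(/^Name:[ \t]+/, ""); name = $0 }
        /^Status:/ { print name ": " $2 }
    '
}

# On a Mac with AppleRAID sets: diskutil checkraid | raid_set_statuses
```

Any line of the result that does not end in Online points at the set that needs attention.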

With that command, you can create a script to check the result and then do something.  For example, a framework script would be:

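#!/bin/bash

# First Status value in the report: Online or Degraded
status=$(diskutil checkraid | grep -m 1 Status | awk '{print $2}')

# Quoting "$status" keeps the test valid even if the result is empty
if [ "$status" != "Online" ]; then

    ## do something here to alert you like send an email
    echo "RAID status: ${status:-unknown}"

fi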

This will catch anything other than an Online reply.  Once you have the script set up, use cron or launchd to schedule it.  I have this run once a night on servers I control.  Where the environment allows it, I send an email to my alerts system, informing us that there is a problem.
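To make the alert step concrete, here is one way the framework could be filled in.  Treat it as a sketch under assumptions rather than the exact script in use: it assumes the server can send outbound mail with the mail command, and the address admin@example.com, the script path in the cron comment, and the function names are all placeholders.

```shell
#!/bin/bash
# Hypothetical alert address; replace it with a real mailbox.
ALERT_TO="${ALERT_TO:-admin@example.com}"

# Print the first Status value from `diskutil checkraid` output on stdin.
first_raid_status() {
    grep -m 1 Status | awk '{print $2}'
}

# Send the alert. Assumes mail(1) can deliver from this server.
send_alert() {
    printf 'RAID status on %s: %s\n' "$(hostname)" "$1" |
        mail -s "RAID alert: $(hostname)" "$ALERT_TO"
}

# Alert on anything other than Online, including an empty result.
check_raid() {
    local status="$1"
    if [ "$status" != "Online" ]; then
        send_alert "${status:-unknown}"
        return 1
    fi
    return 0
}

# Run the check only where diskutil exists (OS X).
if command -v diskutil >/dev/null 2>&1; then
    check_raid "$(diskutil checkraid | first_raid_status)"
fi

# Example crontab entry for a nightly 2:00 AM run (hypothetical path):
# 0 2 * * * /usr/local/bin/check_raid.sh
```

Keeping the parsing, the decision, and the notification in separate functions makes it easy to swap mail for whatever your alerts system accepts.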

My pleasure.  Good luck with your deployment.  I am going to add this information to the book and will credit you in the Change Log for bringing up this point of clarification.

If you have a chance, drop a review into the iBooks Store.  Would love to see more comments :)

urban420

Re: RAID
« Reply #4 on: February 02, 2014, 01:27:44 PM »
Awesome stuff. Your Swiss army USB hard drive is a great idea.

The help is much appreciated and the info on the hard drive monitoring will come in handy.

Thanks again. 


urban420

Re: RAID
« Reply #5 on: February 11, 2014, 08:16:57 PM »
I just wanted to throw another question your way. I've finally gotten around to getting our new server prepared and was trying to set up RAID mirroring on the drives. I know there are a few different ways to go about it, but your directions were pretty short so I tried that route.

The strangest thing happened when I went to restore the DMG image. Once I hit restore the system acted like it was going to start and then stopped and displayed an error with the following info:

"Could not restore. Operation not supported."

Very strange because once I removed the RAID and put the drives back the way they were I was able to use the image to restore to one of the drives without any problems.

Just wondering if you have ever come across this.

Thanks -

J

Reid Bundonis

Re: RAID
« Reply #6 on: February 12, 2014, 08:27:09 AM »
I have been lucky enough so far not to run into that issue.  (the joy of fighting with Fusion drives that self destruct has been my latest high frequency activity).

If you have a working system, you might want to repeat the process.  I will admit that it is odd that the restore worked on a regular drive but not when the RAID was constructed.

I will also note that we recently had an uptick in failed drives in mini servers.  Between Dec and Jan our metrics for systems that we manage show 9 warranty replacements of drives in mini servers.  Checking on the rest of 2013 (excluding Dec), that was 2x as many as we did all year.

If you are still trying to get the unit into a RAIDed state, I would suggest starting over and trying one more time.  The method published in the book is our standard process and we do this at least 3x a week.

If you are still having issues, reply back.

anog

  • Newbie
  • *
  • Posts: 6
Re: RAID
« Reply #7 on: March 01, 2014, 09:23:40 PM »
"I have been lucky enough so far not to run into that issue.  (the joy of fighting with Fusion drives that self destruct has been my latest high frequency activity)."

I was going to ask that. Disk Utility begged me to make a Fusion drive so I did. :-) My thought was if it mainly was using the SSD then wear on the hard drive would be minimal. So far I am storing less than the size of the SSD. So far it's been working O.K. Are they that bad?

I have had very BAD luck with software RAID. I get corruption from the RAID screwing up much more than I have from disk failure, but that is just me. I do have a standalone RAID enclosure that has been fantastic. I use that to store my shares and back it up in Time Machine.

 

Reid Bundonis

Re: RAID
« Reply #8 on: March 02, 2014, 11:07:14 AM »
The fusion drives are an interesting use of what appears to be ZFS technology without explicitly calling it ZFS.  Apple's own man page talks about pools (see man diskutil and read the section on coreStorage).  Basically, Apple is taking two drives, an SSD and a standard spinning disk, and combining them into a single volume.  Then Apple is doing some background magic to detect which files you use the most and the OS will keep these on the SSD drive.  If it detects stuff that you never touch, it will shuffle it off to the "slower" storage.
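If you want to see how the pool on a given machine is put together, a rough sketch follows.  It counts the drives behind the volume by reading the output of diskutil coreStorage list.  It assumes each member drive is introduced by a line containing the words Physical Volume, and the function name count_physical_volumes is only illustrative.

```shell
#!/bin/bash
# Count the physical volumes in `diskutil coreStorage list` output read
# from stdin.
# Assumption: each member drive is introduced by a line that contains
# "Physical Volume".
count_physical_volumes() {
    awk '/Physical Volume/ { n++ } END { print n + 0 }'
}

# On a Mac: diskutil coreStorage list | count_physical_volumes
```

A Fusion drive built from one SSD and one spinning disk would be expected to report two.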

In most cases, this works rather well and gives users SSD speed with the capacity of traditional drives (at a cost that is reasonable).  But, I have seen products like QuickBooks (not current shipping version) throw crazy errors like, "I can't save this document because the file name is greater than 31 characters."  Wow, blast from the Classic OS days.  But what took the cake was that Adobe Illustrator CS6 will occasionally reveal the same error!  Usually trashing the preferences will resolve it.  But, it has only appeared on Fusion drive machines.

Next, when imaging systems for mass deployment, Fusion drive machines have given us some fits.  I would estimate at least 10 machines in the last month simply choked on their own format and basically would not boot.  One iMac we had to put into target disk mode and connect to a 10.6 machine to force a format of the individual drives.  Then when doing the Internet recovery, the installer repaired the fusion drive.

I've seen a lot of people report that the Apple software RAIDs have given them fits.  Maybe we are the lucky ones.  I've been using Apple's software RAID (and hardware RAID like megaraid and raidutil) on all sorts of hardware and OS versions.  I will say that the one time that it does not work is when you are using non-matching drives or third party drives that don't quite meet the Apple level of expectations.  The minute you mix and match, even if the specs on the box say the same thing... trouble is afoot.