R.G. Keen's Blog

Compact Flash Boot Disks

Posted in Uncategorized by rgkeen on April 30, 2010

I’ve always intended to shrink the boot and OS part of my server to as small a size as is practical. I bought an IDE to Compact Flash adapter to try this out. This was a Crest I/O branded Syba SY-IDE2CF-DU adapter (US$13, newegg.com).
Syba SY-IDE2CF-DU adapter with Transcend 2GB compact flash cards

It was intended to plug into the IDE port on a motherboard, and present two compact flash cards as IDE disks. I picked up two Transcend 2GB CF cards, intending to experiment with an EON server setup. This should boot handily within 2GB, and probably much less.

Feeling cocky from my main server just working, I unwrapped the adapter card, popped in the CF cards, and … found I didn’t have a machine with a spare IDE port to plug it into. All my recent motherboards have only one, and I have the boot disk on that one. Some creative descriptive language and about 45 minutes later, I had a non-essential machine reconfigured with one SATA boot disk and the dual compact flash card adapter inserted into the IDE port on the motherboard.

It not only would not boot, it didn’t recognize any of the disks. OK, read the manual.

Try one CF card at a time. The BIOS then saw only the card in the slave side of the adapter. Ahah! Bad CF card; swapping the CF cards produced – yep, the same thing. Both cards were recognized when in the slave CF slot, and both refused to be recognized by the OS for any further work.

Must be the BIOS. I duly squirreled around in the BIOS for an hour. No breaks. Hmmm. Jumpers on the adapter! One is a Master/Slave swap jumper, the other is a voltage selector, 3.3V or 5V. Both the adapter “manual” (1.5 sides of a single folded sheet) and the adapter itself note that it does not care whether it gets 3.3V or 5V; it works both ways. The CF cards likewise insist they don’t care.

OK, gotta be that Master/Slave jumper. Again I go through all combinations of jumper and CF cards, and find that now only the master slot works.

This goes on for hours until I’m sure that the card is bad, so I spend two more hours on google looking for what I missed. I don’t find it. Two hours after I sent the nasty note off to the manufacturer, I decided that it won’t hurt to move the power supply jumper from the default 5V position to the 3.3V position. What the heck.

Except that the jumper, right out of the package, sits in the 3.3V position. But that shouldn’t matter; both the adapter and the CF cards don’t care, it says so right here.

But they do care. Or maybe the motherboard cares. Move the jumper to 5V, and now it behaves perfectly. The adapter plugs into the motherboard IDE cable connector and gives me two 2GB disks, one master and one slave, both of which work perfectly.

I don’t know that there’s a moral to this, other than learning that

(a) IDE to Compact Flash adapters *can* work.

(b) They’re tricky and sneaky.

But the Syba SY-IDE2CF-DU and Transcend 2GB 133x compact flash cards do work – as long as you set the voltage jumper to 5V.

I want my hours back.

[Note: as of 5-23-2011 I have decided to ditch OpenSolaris entirely based on Oracle’s new directions for Solaris and OpenSolaris. My data is too important to depend on what I perceive from Oracle. In terms of this post, both FreeBSD and FreeNAS installed and ran from the same compact flash hardware I was using for OpenSolaris.]


Powering Your Server

Posted in Uncategorized by rgkeen on April 29, 2010

Power is an issue for all non-trivial servers. It’s also tied up with location and remote management.

An ideal server would use zero power and be totally silent. If that were possible, remote management would be a much smaller issue, because there would be less need to tuck the server away somewhere it won’t annoy people.

It’s a truism that all the electricity that goes into your box eventually becomes heat that has to get out of your box. That’s why machines have fans. The parts inside heat the air inside, and the fans move the hot air out and fresh, cool air in. The fans also make noise, annoy people, and being mechanical, fail, which lets your box overheat and maybe die.

So there is a premium to be placed on not generating the heat to start with, so that fans are smaller, quieter, and perhaps unnecessary. Good background reading on this issue can be found by googling “quiet PC”, “silent PC”, and “home theater PC”, which address the issues of not generating heat, dealing with the heat by fans, and dealing with the noise fans generate.

Electrically generated heat is also a money issue. In the engineering sense, energy is money. Electricity at my house costs US$0.12 per kilowatt-hour. A 100W PC running for ten hours consumes one kW-hr, which costs $0.12. Left on continuously for a year (365*24 = 8760 hours), it consumes 876 kW-hours, or about US$105.00.

If I could design a server which used less power, say 50W instead, then the cost per year for electricity would be US$52.50, and I would be economically justified in paying $52.50 more for the parts.  And the second year, I’d be US$52.50 ahead by doing this.
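The arithmetic above folds into a one-line calculator. The wattages and the US$0.12/kW-hr rate are the figures from the text; plug in your own utility’s rate:

```python
def annual_power_cost(watts, rate_per_kwh=0.12, hours=365 * 24):
    """Dollars to run a device continuously for `hours` (default: one year)."""
    kwh = watts * hours / 1000.0  # watt-hours -> kilowatt-hours
    return kwh * rate_per_kwh

# The 100 W server from the text: 876 kWh/year, about US$105
print(round(annual_power_cost(100), 2))
# The 50 W alternative: about half that
print(round(annual_power_cost(50), 2))
```

The difference between the two calls is the yearly savings you can justify spending on lower-power parts.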

Not only that, the 50W version might be cool-able with a smaller, quieter fan, and therefore less annoying to be near.

There is a whole subculture of enthusiasts for low-power, low noise computers. Often this niche can be found by searching for mini-ITX form factor PC motherboards. Some of these can be entirely cooled by natural convection, which is a way of saying “no fan needed”.

I’ll expand on this later, but some useful things to consider are:

Watch the “TDP” power rating of your processor. You can get CPUs with much lower power draw. These are lower performance, but high-speed CPUs are NOT needed for most home servers.

Watch the power rating of your disk drives. In general, desktop 3.5″ drives are 6W, and 2.5″ laptop drives are 1W – 2W; however, desktop drives have about 2:1 more storage than 2.5″ drives, and cost less per GB. The tradeoff gets complex: you need to know how much storage you need, what power it takes to provide that many bits, how much you’re willing to spend on electricity, and how much noise you will tolerate.
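One way to make that tradeoff concrete is total cost of ownership: purchase price plus electricity over the server’s life. The wattages below are the rough figures above; the drive prices are hypothetical placeholders, not quotes:

```python
def tco(price_usd, watts, years, rate_per_kwh=0.12):
    """Purchase price plus electricity for `years` of continuous use."""
    return price_usd + watts * 8760 * years / 1000.0 * rate_per_kwh

# Hypothetical prices; wattages are the rough figures from the text.
# One 2 TB 3.5" desktop drive at 6 W vs. two 1 TB 2.5" drives at 1.5 W each:
print(round(tco(100, 6.0, years=3), 2))        # single desktop drive
print(round(tco(2 * 120, 2 * 1.5, years=3), 2))  # pair of laptop drives
```

With these made-up prices the desktop drive wins on money despite the power draw, which is why the noise and heat questions, not just dollars, end up deciding it.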

Remote Power Management, Part Three: Button Pusher Circuits

Posted in Opensolaris Server, Power Considerations, Remote Management by rgkeen on April 28, 2010

I’ve designed a couple of circuits which implement the button pusher circuits for remotely pressing the power-on and/or reset buttons on an ATX style machine from a LAN controlled AC power outlet.

The circuits are trivially easy to make if you’re an electronics hardware type. I’m doing PCB layouts of both circuits for myself and can make those available for you to use if there’s any interest.

Let’s do the warnings up front. These circuits necessarily involve connections to the AC power line because that’s how they work – they convert the AC power line coming on into an isolated contact closure inside a controlled machine. There’s no way around that. The circuits are easy, but any time you have physical access to AC power line wiring, there is danger of electrocution, electrical damage to property, and starting fires. Living in an electrically-powered world like we do, the rule for everyone has to be that if you don’t know how to do it safely, don’t start. So read that safety warning in the center of the picture and follow it. It’s YOUR responsibility.

I apologize for the legalism, but in a lawsuit-mad society, I cannot be personally responsible for your actions. You have been warned. Act accordingly.

With that bit of nastiness out of the way, let’s get on with doing what we started out to do.

Circuit 1 is the no-transformer version. The amount of electrical power needed to fake a button activation is tiny. So the first version uses a capacitor (C1) to limit the amount of current the AC power line provides to the rest of the circuit, and it does so in a way that is very power-efficient. The rest of the circuit takes this little trickle of current and makes just enough DC voltage out of it to run the integrated circuit. The integrated circuit is a three-pin device that does only one thing: it senses whether its power supply is big enough, and when it is, it pulls its output pin high. High in this case is about 5Vdc, and that’s enough for resistor R3 to allow enough current through the LED of the optoisolator ISO1.

This causes the isolated phototransistor on the isolator’s output to be fully saturated and look like closed contacts to the ATX power supply. If you connect the isolator’s output across the “power switch” contacts on the front-panel header of the motherboard you’re using, it will look just like you’ve pressed the front panel switch, and this will activate the power supply on/off just like you pressing the button.
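The current budget for the whole circuit is set by C1’s reactance at the line frequency. The schematic’s actual component values aren’t given here, so this sketch assumes an illustrative 0.22 µF capacitor and a 120 V / 60 Hz line:

```python
import math

def dropper_current_ma(c_farads, v_rms=120.0, freq_hz=60.0):
    """RMS current through a series 'dropper' capacitor on the AC line.

    Approximation: the voltage across the downstream circuit is small
    compared to the line, so nearly all of the line voltage appears
    across the capacitor and I = V / Xc, with Xc = 1 / (2*pi*f*C).
    """
    reactance = 1.0 / (2 * math.pi * freq_hz * c_farads)  # ohms
    return v_rms / reactance * 1000.0  # milliamps

# An assumed 0.22 uF X2-rated capacitor on a 120 V / 60 Hz line:
print(f"{dropper_current_ma(0.22e-6):.1f} mA")  # roughly 10 mA
```

That ten-or-so milliamps is in the right range for the optoisolator LED plus the reset IC, which is why a single small capacitor can replace a transformer here.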

Two remote button pusher circuits

Button Pusher #2 does exactly the same thing on the computer side. What’s different is the way that the power supply for the output is generated. Instead of using resistors and capacitors to make the voltage needed by the reset IC, it uses a transformer to both isolate the AC power line and to change the power line voltage down to the few volts needed for the reset IC. This voltage is rectified, filtered, and then regulated by a tiny three terminal regulator. This circuit is bigger and heavier than Button Pusher #1, but not much. I was surprised to find that there are now tiny transformers which will do this job, and that they are not expensive.

I’m not going to clutter this posting up with parts lists and so on, because I suspect that very few people will ever actually build one of these. If you want to really build one of these, instead of just reading about it, leave a comment so we can get in touch one on one.

Remote Power Management, Part Two: A Button Pusher

Posted in Power Considerations, Remote Management by rgkeen on April 27, 2010

Having found a LAN-based controller that can turn on/off up to four AC power outlets, my quest for remote control resolved into needing a way to have an AC power line on/off be the equivalent of pressing the “reset” and “power on/off” buttons on an ATX computer.  Remember, the issue to be avoided is using only an AC power line for power off/on.  That’s necessary in some cases, but an extreme measure, and to be avoided if possible.

Here are a few ways to accomplish the task that came to mind:

  • Power a solenoid from a switched outlet, and have the solenoid plunger actually press the buttons. I discarded this as too clumsy and requiring too much mechanical fabrication. Likewise, other electro-mechanical approaches fall into the same category.
  • Power a relay with a 120Vac coil from the switched outlet, and connect the relay contacts in parallel with the front-panel switches. The power line outlet coming on causes the relay contacts to act like a front panel switch without being one. This one has some merit, and is conceptually easier to handle for non-electronically adept people.  It amounts to using a pre-manufactured solenoid plus extra switch contacts to do the first approach. I discarded it only when I thought about possible contact bounce and relay life issues.
  • Skip the fancy stuff and just use WOL. Nice enough, but my motherboard does not do this. A Whack-On-LAN card would work, probably. But I didn’t want to spend the time getting a possibly flakey card that is not otherwise used in the box to work reliably.
  • Power a small electronic circuit from the switched outlet, and have this circuit cause an electronic closure in parallel with the actual front-panel switches. Obviously this is inspired by the Whack-On-LAN device.  This is what I finally chose to do.

Having decided to design an AC-line to switch closure circuit, I sketched out my objectives. The final circuit has to short the contacts when the power line is on, release it when the power line is off. It has to do this cleanly, no funny wandering around between open and closed, and no bouncing open/closed/open/closed to confuse the power supply. Oh, yeah – it has to not electrocute me, burn down the house, or destroy the computer it’s controlling.

Update Notice:
The following is the kind of thinking that would go into designing a right-down-to-the-bare-metal device. I had a belated brainstorm that makes this kind of moot, as well as safer. If you’re going to do something like this read Part Three: A Safer Button Pusher first. Using a pre-existing wall wart relieves many safety concerns, as well as being quicker to actually get. Most Goodwill and used-computer stores have bins of wall warts at rock bottom prices.

This last is where all the complication is buried in this approach. Building a circuit which goes from the AC power line into a metal box that people can touch is a clearly dangerous situation. But there are rules for how to do this. Safety testing agencies have been specifying this for decades.  There are three Big Rules. They are:

  1. Provide physical spacing and testable 2500Vac to 5000Vac isolation between the AC line and metal that a person can touch.
  2. Make the circuit not start fires if any part fails in the worst possible way.
  3. Don’t generate any isolated voltage that is dangerous to people or property like the AC power line is.

We’re in luck on the first one. The electronics industry makes optical isolators which have an internal multi-kV isolation. They provide an output which is usable for the remote-shorted-switch operation, and are safety lab tested for the Big Rules. And they’re cheap. A suitable device is less than US$1.00 in almost all cases. I found one for US$0.30.  This is too good not to use.

The second task is to turn the isolator on and off. The input side of these isolators is universally an LED (Light Emitting Diode), which needs to be driven with about 5-10 mA of current to make the output go on/shorted, and no current to make it turn off. I pondered designing a transistor circuit to do this, then an IC circuit (National Semi LM10), and finally found an IC which does the whole thing cleanly. This is a three-pin IC which generates a clean, snap-action on/off voltage in response to its power supply. The Microchip Technology TC54 comes in a TO-92 package and costs US$0.50. Perfect.

What remains is to make some kind of circuit to power up the TC54 when the AC power outlet goes on, and de-power it when the socket goes off.  The complication is that the socket provides 120Vac, and the TC54 wants a voltage in the range of 1 to 10 Vdc. I found this article by Robert Kollman and Brian King in back issues of EDN. It does the necessary job perfectly, and uses only light, small electronic components.  I’ve adapted it for the button pusher.

It is also possible to use a step-down transformer to make an isolated DC supply to power the TC54. In fact, in many ways this is a simpler approach, as the transformer itself encapsulates all the safety isolation considerations.

I’ll post both methods in an upcoming addition to this entry.

Remote Power Management for Home Servers

Posted in Opensolaris Server, Power Considerations, Remote Management by rgkeen on April 26, 2010

Shortly after I got my first Opensolaris server up and running, I discovered that I never wanted to see it again – servers should be usable but invisible as well as silent!

Back to google. A long series of searches led me to some conclusions and solutions.

  1. LAN managed power cycling is a Very Good Thing if you are not able to walk over to your server to restart/reboot it. It will be needed. My neighbor makes his living doing system admin for a commercial data center with hundreds of systems in racks. He has to drive in to work in the middle of the night often enough even with industrial-strength remote management setups as it is.
  2. Hard power-off is potentially life-threatening to your data. Soft power off is much kinder. Hard power off should be reserved for situations where the system is no longer listening to your instructions and there is no option. Again, my neighbor who lives and breathes remote support confirms that he’ll do pretty much anything to avoid hard power-off on a critical server. His opinion is that hard power-off forces you to consider server CPR as a real possibility.
  3. LAN managed power cycling is moderately expensive. And I’m – well, I prefer the term “economically prudent” to cheap.

I can hear you wondering why I didn’t just set up Wake On LAN (WOL). That would actually be a great thing to do except for two issues.  First, not all motherboards support WOL. My first motherboard, the M3A78-CM which is otherwise great, does not. Worse yet, it does not clearly tell you that in the manual. I had to send a tech support request to the maker to find that out unambiguously.

Then again, WOL is for when things are all working OK. I can always tell my machine to shut down if it’s listening to me. But if it’s too busy contemplating its own navel and re-counting its toes to be bothered to listen, there ought to be some way to make it listen. I think I found it.

A short series of conclusions came up in the searches.

There is a sweet spot for LAN managed AC power. It’s the IP Power 9258 series. This is a small box with an ethernet port, four managed AC outlets, and an internal web server to flip the managed outlets’ states. Here’s a copy of the manual.

The current manufacturing state of the unit seems to be in question. A google search turns up many places which say out of stock and no longer available. However, there are also many places which have them, including a seller on ebay for $80. It’s in the mail to me as I write this, so we’ll know soon.

I found another technique called “Whack-on-LAN”, which uses a modified ethernet NIC which has a WOL connector to do what amounts to pressing the reset button over the LAN. The article describing this is here. In essence, you convert a cheap LAN card to a remote-controlled button pusher and hit the reset button. The only issue with this trick is that the LAN card lives inside the server and is dependent on the server power supply and BIOS/operating system keeping it happy so it can do the LAN remote control trick.

So there’s some interesting stuff. In my mind, it plays out like this:

  • I want to be able to tell a non-WOL system to fire up. A LAN-controlled button pusher would be great for that.
  • Any system that’s working properly will shut down on its own.
  • Any system that’s not working properly would have to be hard-reset.
  • If the hard-reset doesn’t work, it needs hard power-off cycled as a last resort.
  • There are advantages to an integrated power management system even for a board that supports WOL.

Here’s the solution I came up with. Use the 9258 to control AC power. I really only need one of the controlled AC outlets to run the system. That leaves me three other outlets which can be viewed as a form of LAN controlled logic outputs – that is, AC power = logical true, no AC power = logical false. What is needed to get all of what I need for deluxe remote power management for my server is to make those AC power logic levels into isolated contact closures, which then can interface directly into the normal circuitry of the industry-standard ATX form factor personal computer stackup.
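Driving those “logic outputs” from a script comes down to hitting the 9258’s internal web server with the right URL. I haven’t verified the exact command syntax against my unit; the path and parameter names below are assumptions based on commonly documented 9258 firmware, so check your own manual before trusting them:

```python
from urllib.parse import quote

def outlet_command_url(host, outlet, on, user="admin", password="12345678"):
    """Build the HTTP control URL for one outlet of an IP Power 9258.

    ASSUMPTION: the common 9258 firmware names its four outlets P60..P63
    and accepts Set.cmd?CMD=SetPower requests with HTTP basic credentials.
    Verify against your unit's manual; firmware revisions differ.
    """
    state = 1 if on else 0
    return (f"http://{quote(user)}:{quote(password)}@{host}"
            f"/Set.cmd?CMD=SetPower+P{59 + outlet}={state}")

# Turn outlet 1 on (the URL would be fetched with urllib.request.urlopen):
print(outlet_command_url("192.168.1.50", 1, True))
```

Each outlet then becomes one scriptable bit: outlet 1 for main power, and the spares for the button-pusher circuits described above.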

Did I mention that I spent many years doing analog and digital circuit design? The two obvious answers are either to use an AC-activated relay with a power-line-rated coil run from a controlled outlet, or an electronic circuit which senses AC power and does an isolated “contact closure” similar to the transistor switch used in “Whack-on-LAN”. There are many optoisolators available with 4kV isolation from the LED driver on the primary side to a transistor output on the secondary side which will safely do this job.

I’ve done a quick design of both of these approaches, and I’ll document those here in a future post.

ECC for a Home Server?

Posted in Data Integrity, Opensolaris Server by rgkeen on April 25, 2010

One major decision in building a home server is whether you will use ECC (Error Checking and Correcting) RAM or not.

To save myself some typing I suggest you look at Wikipedia’s entry on ECC memory for some background.

Your main memory is dynamic RAM chips, and they do get soft errors. The error rate may vary from tragic (one error per GB per month) to trivial (one error per GB per century). If you’re lucky and get one of the 1/century setups, great. With my luck, I normally get stuff from the other end of the distribution. So I demand both that my server have ECC and that it actually use it productively, which are two different things. I suggest a look at Robin Harris’ blog on memory errors for some more background. Errors do happen. They are sometimes sly and unnoticeable until they have killed something.
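Those two rates differ by a factor of 1200, and it’s worth seeing what each implies for a real box. The memory size and lifetime below are illustrative, not a measurement:

```python
def expected_errors(gb, years, errors_per_gb_per_month):
    """Expected soft-error count for `gb` of RAM over `years` of uptime."""
    return gb * years * 12 * errors_per_gb_per_month

# An illustrative 8 GB server running for 3 years:
tragic = expected_errors(8, 3, 1.0)          # one error per GB per month
trivial = expected_errors(8, 3, 1.0 / 1200)  # one error per GB per century
print(f"tragic: {tragic:.0f} errors, trivial: {trivial:.2f} errors")
```

At the tragic end you get hundreds of flipped bits; even at the trivial end the expected count over a server’s life is not zero, which is the argument for ECC.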

I split servers into three classes in my mind: trivial servers, moderate servers, and major servers.  Trivial servers are a handy bit bucket, a place to store some stuff. No significant attention is paid to data integrity or future expansion. Moderate servers differ from major servers in that they have features for data integrity and future expansion, but are very parsimonious about how much money is spent on the whole setup to get these advantages. Major servers are a data-center creature, with planning, support staff, and so on. Personal home servers are usually moderate servers.

In my mind, I think that trivial servers do not have ECC, moderate servers do, and major servers must.

This is a lurking Big Deal in server decisions because ECC is not widely available in over-the-counter motherboards. The X86 motherboard market has split into two major lines: desktops and servers. Desktops are what we normally think of as a motherboard when we plan a new build: pick a CPU, grab some memory, and bango, new system. Not all that long ago, the normal run-of-the-mill motherboard chipset could and might support ECC. The difference is, unsurprisingly, money. Desktop motherboards are a very, very cost-sensitive market, and a penny saved is a sale made. Server motherboard buyers are both informed enough and motivated enough to spend a bit more for data reliability.

Intel-provided chipsets have done the same split as the motherboard market. Their latest family of desktop motherboard chipsets for the i5 and i7 families definitely do not support ECC. If you want ECC and demand Intel CPUs and chipsets, you’re going to have to click that “server motherboards” link and pay more for them. According to my frantic googling, the premium runs from about US$200 if you are very, very thorough to as much as $2K.

AMD’s latest desktop processors and chipsets do support ECC, and this leads me to one of my choices – use AMD CPUs and find a motherboard that supports ECC. The added complication is that even here, penny-pinching has struck, and many motherboards which have AMD CPUs and chipsets which nominally support ECC have removed either the BIOS support for ECC or the actual PCB traces that let it work, or both.

This lets them tell you that the motherboard is compatible with ECC memory without actually giving you ECC. I refer to this condition as ECC-memory tolerant, not enabled.

The key seems to be looking in the motherboard manual for a way to enable/disable/set up ECC in the BIOS memory setup section.  Note that I consider that if the motherboard vendors will not let you download the user manual before you buy it, you should not buy that motherboard. Full stop.

So you need to decide: are you building a trivial server or a moderate server?

A trivial server is a perfectly valid thing to build, as long as you have looked at its limitations and are OK with them. It is perfectly reasonable to want, say, a very-low-power server for minimal cost with limited data storage as long as you know and accept that you will be getting soft memory errors and soft disk errors over the long haul.

I want a home server that gives me more data integrity. I am willing to spend some more money to get that, but would like to keep the premium down to as low as possible. In my personal evaluation system, that means I need ECC memory. That drives the decisions: AMD low-end CPUs and chipsets support ECC more inexpensively than the semi-equivalent Intel ones. And within the AMD motherboard space, some motherboards support explicit BIOS options for ECC enablement.

As a simplifying matter, all ASUS AM2, AM2+, and AM3 motherboards seem to have ECC support in BIOS, as well as statements about support for ECC and non-ECC DRAM. Only some Gigabyte motherboards with the same chipsets may support ECC; they make statements like “ECC is supported only for CPUs which support ECC”, which implies support but does not make it clear. There are a few other boards, notably one or two from Biostar with AMD chipsets, that do, I think. Out of this space, if you’re going with Opensolaris, you need to check the Opensolaris forums and Hardware Compatibility List (HCL) and pick one which clearly has support.

I picked an ASUS M3A78-CM (appx US$80 in Jan 2010) on this basis. My server motherboard came up and installed first time, no errors, and ran Opensolaris. This motherboard has been discontinued – probably why I got a great deal on it! – but the newer M4A785 boards are a very reasonable alternative.

I think ECC is mandatory for one of my primary objectives – data integrity.
[Note: as of 5-23-2011 I have decided to ditch OpenSolaris entirely based on Oracle’s new directions for Solaris and OpenSolaris. My data is too important to depend on what I perceive from Oracle. I feel like I personally would be better off trusting my data to a system less susceptible to manipulation.]

Storage Needs in a Home Server

Posted in Opensolaris Server by rgkeen on April 25, 2010

You need to decide ahead of time how much storage you will reasonably need. In a non-trivial server, the amount of  storage to be included is a major consideration, as well as the major up-front expense. In a trivial server, the storage is secondary to the rest of the system.

You will need to include all the storage space you reasonably think you’ll need, plus some expansion room, plus additional storage for the redundancy that will make your storage reliable.

Redundant storage is a price of data reliability. Redundant storage in the form of mirrored disks, checksum-containing disks, or spare disks, or all of these, is what gives your server the ability to recover from failures.

Unless you have a very small server objective, it is well to consider the amount your server storage can be expanded without a major and possibly expensive rebuild.

I realize in rereading this that it sounds too simplistic for many people to take seriously. However, remember that people are data packrats. The largest amount of storage that you can imagine using will be overwhelmed in a few years.

I recall an illustration from the pre-personal-computer days, when centrally-managed data centers were all the computing that was available. A major university was dealing with storage issues on its central computers, and this was in a time when one megabyte of main memory cost US$1M and the US dollar was worth about five times what it is today (2010). The system admins installed an access counter that incremented for each file in the system every time it was accessed. The average number of accesses per file was about 1.1, and that included the access which created the file. It also included the system files, compilers, and utilities that every user accessed all the time. The user body was, in effect, using the storage pretty much as write-only memory, storing stuff they never went back to.

Trust me – you need to think about expansion of your storage system.

Bit Rot

Posted in Data Integrity, Opensolaris Server by rgkeen on April 24, 2010

Digital logic is, by design, remarkably unforgiving of errors. One important foundation of our data technology as it exists now is unambiguity. A bit is either a one or a zero, either true or false, and anything in the middle is nothing at all. Doesn’t exist, at least to the logic. Feed an intermediate value to a logic gate and you could get anything at all as an output.

We depend on this. When we set a bit, we expect it to stay the way we set it.  This is remarkably difficult to do in the real world. And the real world has a way of degrading things from what they ought to be into complete uselessness.

These two facts mean that, as an illustration, if you flip the wrong bit in the boot sequence, your machine will not boot. Or a program will hang or crash. Or, far worse, simply give you slightly and undetectably wrong answers. So immutability is a virtue in the storage of information. About the best humans have done so far are two of our early experiments: clay tablets and stone carvings. We have some reliably-readable (if we only know how!) records from thousands of years ago in these media. Today, it is unusual for an “archival” medium to be readable in ten years, no matter what the medium. The machines to read the data may last a shorter time than the medium it’s written on. When was the last time you read a floppy disk? No, not the hard-case 3.5″ ones, one of the truly floppy 5.25″ or 8″ ones? If you had one, could you read it?

All stored data degrades. Full stop. The key to keeping your data uncorrupted is redundancy. Those clay tablets? Redundant by the physically durable arrangement of zillions of molecules and the pattern recognition of a human eye-brain reader. But the bandwidth and storage capacity are low.

Data on hard disks degrades like all magnetic recording. Eventually every single magnetic storage medium will be unreadable.  Much of the circuitry on a modern disk drive is there to deal with this. They use error detecting and correcting codes on the data, copy-on-write schemes on the disk, and spare sectors to map out early-failing sections of the disk. Your main RAM memory? Alpha particles and cosmic rays can flip random bits – and do so, with some regularity. Magnetic tape? See “hard disks” plus the tape/binder/magnetic coating is exposed to the elements. Many tapes from the early parts of the computer era in the 50s, 60s, and 70s are now unreadable even with the correct devices because they’ve magnetically relaxed and the magnetic coating is coming loose from the plastic tape.

Hah! Writable CDs and DVDs! Nope. They have a known, predictable rate of bit rot. I personally like DVDisaster,  an open source utility that adds a layer of error detection and correction to CD and DVD data so you have a chance to re-create the corrected data if there aren’t too many errors on your media.

Bits always rot. You have to have at least one extra copy of the data or enough redundant information to detect and correct errors, or the first bit rotting may completely lose your data.
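The detection half of that redundancy can be as simple as keeping a checksum next to the data. A minimal sketch; the data is a stand-in, and repair still needs a second copy:

```python
import hashlib

def store(data: bytes):
    """Keep a checksum alongside the data so later corruption is detectable."""
    return {"data": bytearray(data), "sha256": hashlib.sha256(data).hexdigest()}

def verify(record) -> bool:
    """True if the data still matches the checksum recorded at write time."""
    return hashlib.sha256(bytes(record["data"])).hexdigest() == record["sha256"]

rec = store(b"family photos, 2010")
print(verify(rec))       # True: data matches its checksum

rec["data"][0] ^= 0x01   # one flipped bit; silent without the checksum
print(verify(rec))       # False: the rot is detected
```

Detection alone only tells you the data is dead; the extra copy is what lets you bring it back.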

This makes the following important issues to me in my home data server:

  • error detection and correction in the disks; this pushed me into zfs
  • active data examination and correction – “disk scrubbing” – which is another native facet of zfs
  • ECC RAM in the server; this is a feature of all serious server machines, but is quite rare in desktop machines. There is a high benefit-to-cost ratio, for my purposes, in finding and using hardware that allows ECC RAM. Put another way, this consideration limits my choice of processors, chipsets, motherboards, and RAM designs.
  • disk fault tolerance; all disks will eventually fail. The trick is to withstand a failure by noticing it happened, and correcting for it before your data is completely lost. This is something that RAID promises, but the mirroring and raidz forms of zfs deliver.
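The scrubbing-plus-mirroring idea from the list above can be shown in miniature. This is a toy model of the concept, not zfs itself: blocks live on two mirrors, a scrub walks every block, and a checksum mismatch is repaired from the healthy copy:

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def scrub(mirror_a, mirror_b, checksums):
    """Walk every block, detect rot via checksum, repair from the good mirror."""
    repaired = 0
    for i, want in enumerate(checksums):
        if checksum(bytes(mirror_a[i])) != want:
            mirror_a[i] = bytearray(mirror_b[i])  # restore from healthy copy
            repaired += 1
        if checksum(bytes(mirror_b[i])) != want:
            mirror_b[i] = bytearray(mirror_a[i])
            repaired += 1
    return repaired

blocks = [bytearray(b"block-%d" % i) for i in range(4)]
sums = [checksum(bytes(blk)) for blk in blocks]
side_a = [bytearray(blk) for blk in blocks]
side_b = [bytearray(blk) for blk in blocks]

side_a[2][0] ^= 0xFF                 # simulate bit rot on one mirror
print(scrub(side_a, side_b, sums))   # one block repaired
print(all(checksum(bytes(blk)) == s for blk, s in zip(side_a, sums)))  # True
```

The point of scrubbing is the walk itself: rot gets found and fixed while the other copy is still good, instead of being discovered years later when both copies have decayed.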

I have some pictures of my grandmother when she was in her 20’s. This was back in the 1880s. These are preserved in the form of silver oxide on a celluloid back plate. Today we – and you probably – take pictures digitally.  Will your great grandchildren be able to see your pictures? Will your children even be able to see them in a decade?

Bit rot is something worth worrying about.

A Question of Objectives

Posted in Opensolaris Server by rgkeen on April 24, 2010

I have read somewhere that if necessity is the mother of invention, laziness is the father.  So it is with my server build. I’ve been involved with computers since before “personal” got attached to them, so I know full well the value of backups.

It’s not that big a deal when you have one machine to back up. But I use four desktop machines on a KVM switch in my office, plus two laptops – and that doesn’t count my spouse’s machine. Laziness started nagging at me as I moved the USB drive from machine to machine to do backups. I have a house network; all the bits could easily go through that, and possibly automatically. I was familiar with the NAS concept from my last job, so

NAS is where it started

When I googled NAS, I found that it had moved downstream to the PC world. GREAT! All I had to do was buy one and move on. If only it were so easy.

I did the necessary hours online and found that packaged NAS boxes were relatively expensive for the amount of storage, and the low-end NAS packages were, well, low end. Low function. What they had going for them was that they were easy. But the curse of any deep involvement with data is knowing how fragile data is. When I looked, I found that the NAS appliances had only a casual link to fault tolerance, and that led me to RAID arrays, which overlapped with NAS, as the various NAS setups used RAID arrays underneath.

Being RAIDed

And so I went off to learn about RAID arrays, another thing I was conceptually familiar with but had never been deeply involved in. I do have a friend who makes a very well-paid living visiting clients and helping them try to recover some of their data from failed RAID systems, so I got cautious about this. RAID turns out to be a great concept that is less simple to do well than to think up in the first place. Put another way, if you’re not very careful, RAID is a good way to lose most or all of your data in one fell swoop. You have to learn enough to push the correct buttons.
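
One concrete way to see both the promise and the catch (a toy RAID-5-style sketch in Python; real arrays work on whole disks and controllers, not byte strings): XOR parity rebuilds a disk the controller *knows* has failed, but when a disk silently returns bad data, parity alone can’t tell which disk to blame.

```python
from functools import reduce

def xor(blocks):
    """XOR byte strings together, as RAID-5 parity does per stripe."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# A RAID-5-style stripe: three data disks plus one parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(data)

# Case 1: disk 1 fails outright, and the controller knows it.
survivors = [data[0], data[2], parity]
assert xor(survivors) == b"BBBB"   # cleanly rebuilt from parity

# Case 2: disk 1 silently returns bad data. Parity no longer matches,
# but plain RAID cannot tell WHICH of the four disks is lying - a gap
# that per-block checksums (as in zfs) exist to close.
data[1] = b"B?BB"
assert xor(data) != parity
```

Guessing wrong in case 2 – rebuilding the good disks from the bad one – is one of the ways a RAID array loses everything at once.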

And that gets me to ZFS, and that gets me to Opensolaris

In reading about RAID I ran into raidz in zfs, the key point being that zfs is designed to do some things that other file systems are not. Two of those are dead center on my issue of data reliability in the face of disk failures and flaky hardware: disk failure tolerance and soft-error scrubbing. Those capabilities are native to zfs; other file systems are busy bolting them on at the moment.

My assumption was that Opensolaris, the open-ish source version of Solaris, would be the most current and frequently updated home of zfs. But there are other options. FreeBSD has incorporated zfs, and has for a while. You can also run a variant of zfs on Linux, but because of the differences in “open source” licensing between Linux and Opensolaris, it cannot be integrated into the kernel and must run in user space, which in my view makes it more error-prone and slower.

So – I decided to build and did build an Opensolaris server to get to zfs. Along the way, I found that Opensolaris includes in the basic setup everything I needed to satisfy my needs for a server.

A question of objectives

With all that rambling out of the way, I’m finally down to the meat of the post: objectives.

I wanted, in order:

  1. data integrity; no amount of sloth is justified if you can easily lose your data
  2. network attached storage to soothe the laziness motivation
  3. closet/garage/attic style installation so I could simply think of this as my own personal cloud
  4. as inexpensive as I could manage with reasonable attention to the preceding three items
  5. the other goodies of a file server: expandability, remote management

There are some other things that I passed along the way that you may have as an objective for your server. It is quite important that before you start writing checks, you think about your objectives for a server. As the line often attributed to Lewis Carroll goes, “If you don’t know where you are going, any road will get you there.” But some roads are more expensive than others. Some objectives you may have include:

  • high performance; you may need some level of speed over your network for some or all file transfers. If you’re serving video, this can be especially true. Or if you regularly move big data chunks around. Or you may just like speed. The need for speed can be quite expensive in both money and time.
  • enormous data storage; zfs/opensolaris will do this easily enough. But be realistic about what you’re willing to pay for storing bits.

There are others. The important point is to list your needs, by priority, so you can make decisions on what needs to go in the server. I found that going back to my objectives was a huge help in holding off creeping featurism, the bane of all systems design.

[Note: as of 5-23-2011 I have decided to ditch OpenSolaris entirely based on Oracle’s new directions for Solaris and OpenSolaris. My data is too important to depend on what I perceive from Oracle. In terms of this post, the ZFS support on BSD is not as advanced as that in Solaris, perhaps, but I think I can depend on it more. The decision to go to ZFS was correct. But given what has happened with Solaris and OpenSolaris, I feel like I personally would be better off trusting my data to a system less susceptible to manipulation.]

OK, OK, I’ll do a blog!

Posted in Uncategorized by rgkeen on April 24, 2010

In my quest to build a home NAS, I wound up building an Opensolaris based server that will do a lot more than NAS. Along the way, some of the people I pestered for information and help came back and pestered me to blog about my experiences.

So here you are. Enjoy.
[Note: as of 5-23-2011 I have decided to ditch OpenSolaris entirely based on Oracle’s new directions for Solaris and OpenSolaris. My data is too important to depend on what I perceive from Oracle.]