On Thu, 31 Aug 2006, John D Groenveld wrote:
> In message <***@mail.gmail.com>, "Tony
> Reeves" writes:
> >This is an age old problem with memory caching RAID systems, if the
> >memory corrupts data or the memory power fails before writeback then
you are screwed. There is a compromise between speed and data
> >integrity, frequently not recognised because the RAID systems are
> >inherently quite reliable. Most times when memory backed RAID reports
> How much faster is RAID5 on the Areca with its big fast volatile cache
> (that may or may not get flushed to disk) and RAIDZ on a dumb JBOD
> connected to a SATA/SAS HBA?
I have no experience with the Areca - but it looks like a great product at
a price/performance level that is shaking up the marketplace. Here's a
review to get some performance numbers:
Re ZFS performance - it's a mixed bag right now (Update 2), depending on
how you're using it. Let me explain. There are some things that go
incredibly fast - you see zfs issuing over 1,000 I/O operations per
second (IOPS) and you can't believe the operation has already completed
before you even get a chance to evaluate how it's performing. And there
are times when zfs won't
issue more than 200 to 300 IOPS and you're left scratching your head and
wondering why. And its behavior defies any attempt to predict how it'll
handle different usage scenarios.
That being said, a lot of silly bugs that escaped the initial release of
ZFS are already fixed and will be released in Update 3. And there were
some oversights in the code where the developers went duuhhhh[1] ... and
fixed them pretty quickly. And there are a bunch of changes that will
improve performance quite a bit.
But none of what I write here should discourage anyone from grabbing an
8-channel SATA card and a bunch of SATA drives and actually using zfs.
The admin model and usability will blow you away. You can create a
>1 TB pool in under 10 seconds and copy CD-ROM-sized images (to a
5-disk raidz pool) in under 3 seconds. You have to change your thinking and start
creating filesystems where you would normally create directory entries.
And then you have the 3 most important features of zfs: snapshots,
snapshots and snapshots. :)
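For example, building a raidz pool, carving out filesystems, and taking
a snapshot is just a handful of commands (the pool, filesystem, and disk
names below are invented - substitute your own):

  # zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
  # zfs create tank/home
  # zfs create tank/home/fred
  # zfs snapshot tank/home/fred@monday

No format, no newfs, no /etc/vfstab editing - each filesystem is mounted
and ready to use the moment the command returns.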
Also, looking forward, zfs is being integrated with Zones to create
features/facilities that will fundamentally change the way most
progressive users[2] deploy Solaris systems. The concept is to create a
snapshot of a zone, and then simply clone that snapshot when you want to
create your next zone. In practical terms, the time it'll take to
create a "fat" zone should drop from 12 to 20 minutes (depending on
your hardware) to (a WAG) roughly 60 seconds.
You can already try this with Solaris Express or by building
OpenSolaris[3] - but it's not supported in the commercial release of
Solaris. In fact, building zones on top of ZFS is currently
unsupported, because patching will cause *major* breakage[2.5].
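For the curious, the underlying mechanics are roughly this sketch
(dataset and zone names invented, and - as noted - unsupported on the
commercial release today): snapshot the dataset holding the "gold"
zone's zonepath, then clone it:

  # zfs snapshot zones/goldzone@gold
  # zfs clone zones/goldzone@gold zones/web01

The clone initially shares all of its blocks with the snapshot, which
is why it appears in seconds and consumes almost no additional space
until the new zone starts to diverge.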
Re: dedicated RAID5 hardware versus ZFS. Obviously, buying a $100 SATA
controller versus a $700 H/W RAID5 controller will leave you with more $s
to buy inexpensive SATA disk drives. The fundamental difference, IMHO, is
that the H/W RAID controller is a one-shot purchase with a fixed
useful life - it will eventually begin to feel too slow as the attached
storage continues to grow. OTOH, ZFS is at rev 1.0 and will continue
to improve - in performance, reliability, and features/facilities -
over time and will take advantage of faster CPUs, more CPU cores and
larger system main memory capacity. It will also be integrated with
Solaris in clever/ingenious ways in the future.
Concluding remarks. Personally, I hate it when people only report the
"good" news and skip the downside. So... Q: Where is ZFS currently weakest?
- IMHO the "variability" in performance that you experience in Update 2 is
troublesome - versus the predictable performance we've come to expect of
a UFS-based filesystem[4]. Try zfs with *your* application data in *your*
system environment first.
- ZFS does not "understand" how disk drives fail and what might be done to
work around the more common failure modes. I get the sense that team zfs
wants to examine the whole issue of how/what/why disk drives fail, and
then develop and implement a comprehensive strategy to deal with
failures. IOW, a clean-sheet-of-paper approach to disk drive
failures[3.5]. It does not help that failure data on SATA drives,
particularly the newer, monster drives on the market, is not widely
available ... yet.
- the usability of the new ACL scheme leaves a lot to be desired.
- zfs is not "known" by the popular commercial backup tools - altough
support is coming from some vendors. In contrast to this remark, there
was a recent discussion on the zfs-dicuss list on opensolaris.org that
details how you can make a killer incremental backup facility using
snapshots and rsync that will rival the facilities available only in
high $ commercial backup products.
- zfs puts incredible pressure on Solaris virtual memory and appears
excessively greedy with memory usage. It won't give up any memory until
the system reaches the low-memory watermark. To be completely fair to
zfs: zfs and dtrace have put incredible pressure on the current Solaris
virtual memory implementation and the complete fix for zfs may not arrive
until more work has been put into the virtual memory management code.
- zfs needs the large virtual memory address space offered by a 64-bit
architecture. IOW: it does not perform as well on a 32-bit system.
- there is a required mindset shift when working with zfs. People fail to
understand that a 12-way raidz system is not a good idea, or that
configuring zpools from partial disks is also not a good idea. And that
zfs is a rev 1.0 release that does have deficiencies and that lacks the
stability/performance/polish of a rev 5.0, rev 6.0, or rev N
filesystem. Or that putting all your disk storage "eggs" into a zfs
rev 1.0 filesystem "basket" is probably not a good idea. I'm not sure
what, if anything, Sun can do to help educate the (potential) user
community. It's a difficult problem to solve, and people tend to get
really, really pissed when something goes wrong with a filesystem.
- building on the last point: ZFS works best with large numbers of
inexpensive disk drives. But people have become so accustomed to getting
by with the minimum # of disk drives and carefully managing that disk
space, that they fail to deploy zfs based systems that make sense. When
thinking of zfs based systems, please think of *large* numbers of disk
drives. The fact that the Sun x4500 includes 48 disk drives is your
first clue! Think in terms of 3-way or 4-way mirrors. Think about a 4
or 5-way raidz config[6] with at least one spare drive[5]. Think about
a system enclosure with between 4 and 10+ disk drive bays (see the
example just below).
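To make that last point concrete, here's the kind of layout I mean
(device names invented; the spare syntax assumes the Update 3 hot-spare
support mentioned in footnote [5]):

  # zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 spare c2t5d0
  # zpool status tank

That's a 5-way raidz vdev (one drive's worth of parity) plus an idle
drive that ZFS can pull in when a member fails.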
Recommendation: If you're serious about using ZFS, put together a test box
with between 5 and 10 SATA drives and gain experience with it *before* you
put it into production. If you're concerned with performance and can wait
a little longer, don't put it into production before Update 3 ships.
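And, on the snapshots-plus-rsync idea mentioned above, a minimal sketch
(pool, filesystem, and host names invented): take a snapshot, then let
rsync copy from the frozen image via the hidden .zfs directory:

  # zfs snapshot tank/home@20060831
  # rsync -a /tank/home/.zfs/snapshot/20060831/ backuphost:/backup/home/

Because rsync reads the snapshot rather than the live filesystem, you
get a consistent backup even while users keep writing - and the
snapshots you retain on the pool double as free local "incrementals".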
[1] Of course, this has *never* happened to _me_ with _my_ code! :)
Yeah, right.
[2] IOW - those who are *not* still running Solaris 8! :)
[3] This is already documented in the ZFS Administration Guide (Solaris
Express System Administrators Collection). Why is docs.sun.com so bloody
slow?
[2.5] A really experienced Solaris admin can still figure out how to work
around these issues. A less experienced admin will probably lose several
zones and then discover that his/her system will not upgrade from Update 2
to Update 3 without major breakage.
[4] What do you expect from something that has had tens (possibly
hundreds) of man-years of development/tuning work?
[3.5] Just like zfs is a clean-sheet-of-paper approach to Unix filesystems.
[5] In Update 3, you'll be able to define spares, and associate them with
one or more pools. But ZFS will still fundamentally "see" a disk drive as
good or bad - without any "rescue mode" logic.
[6] Or a 4-way or 5-way raidz2 system - which uses 2 drives for parity.
> Will the Areca work without its ((dead?) battery backed) cache?
Don't know. But don't try this at home! :)
Al Hopper Logical Approach Inc, Plano, TX. ***@logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006