Future Storage System: Part 5: Backend Storage

by dave on October 22, 2008


In Parts 1-4 of the Future Storage System articles, we focused on the SAN-facing technologies that would enable scalable processing growth, purpose-built technologies for deduplication and encryption, and the fabric that would tie nodes together.  However, in each of those articles, I never got into WHERE that information would eventually be stored.  Today, I’m hoping to remedy that.  As usual, I’ll be referencing the diagram below.

FSS Backend Disk Layout Options

There are a couple of basic things to observe about this layout.  First, the topology is decidedly generic compared to the archetypal backend bus architecture that most storage systems use.  Again, I’m not necessarily trying to be vague here, but I’m assuming that I’ll want to dive deeper into that technology in a “Part 6b.” 🙂  Secondly, you’ll notice the presence of “fibre channel” with a question mark.  Fibre Channel, as we know it, really has reached the peak of its usage as a drive technology.  As a fabric, interconnect technologies such as FCoE are still using Fibre Channel as an encapsulation protocol, and I don’t see that changing any time soon.  Without getting too deep into that conversation, let’s take a look at what the FSS will support.

Backend Connectivity: The Backbone of Information Management

The first item you should notice on the FSS diagram above is the connectivity between the I/O Complex (aka “nodes”) and the backend disk storage.  As stated previously, the single line drawn between the two portions simply represents an abstracted connection, not the link count or topology.   The second item to notice is the relative absence of Fibre Channel as a disk topology outside of 3.5″ disk.  The reason is very simple: no manufacturer has an immediate design available for next-generation Fibre Channel disk (i.e. 8Gb/s Fibre Channel drives).  This is not to say there won’t be 8Gb/s disk; on the contrary, drives of that nature may very well be coming.  I just don’t see the applicability of Fibre Channel as a connectivity medium for disk lasting long term.

Let’s take a look at several of the FSS backend storage dimensions in detail.

Physical:

By “physical” I mean the actual physical layout and topology of the expansion chassis and disk.  Each of the disk enclosures can be broken out by type of disk and expansion room.  For example, due to physical size, 3.5″ drive enclosures are limited to 3U in height.  Obviously, you could use smaller 2U enclosures (reference the AX4-5 with its 12-drive enclosures), and the scale is greater than the 3U chassis with 15-16 drives (36 drives in 6U vs. 30).  I think this comes down to engineering preference more than anything else.

Moving from the 3.5″ drive form factor to the 2.5″ form factor, you gain even greater scale.  In a typical 2U disk enclosure, you can fit up to 24 drives (most of the designs I’ve seen out there use this basic layout).  Consider the scale: 72 drives in the 2.5″ form factor in 6U vs. 30-36 in the 3.5″ form factor, as the sketch below illustrates.  The same holds true for Solid State Disks (SSDs, EFDs, etc.) in these enclosures.  Massive scale in minimum footprint.
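To make the density comparison concrete, here’s a minimal back-of-the-envelope sketch in Python.  It uses the drive counts and enclosure heights quoted above (actual products vary by vendor) and works out drives per rack unit and drives per 6U for each option:

```python
# Rough drive-density comparison for the enclosure options discussed above.
# Drive counts and enclosure heights are the figures quoted in the post;
# actual products vary by vendor.

enclosures = {
    "3.5in drives, 3U x 15": (3, 15),   # typical 3U enclosure
    "3.5in drives, 2U x 12": (2, 12),   # AX4-5-style enclosure
    "2.5in drives, 2U x 24": (2, 24),   # common 2.5" enclosure design
}

BUDGET_U = 6  # compare everything inside the same 6U slice of rack

for name, (height_u, drives) in enclosures.items():
    drives_in_budget = (BUDGET_U // height_u) * drives
    print(f"{name:24s} -> {drives / height_u:4.1f} drives/U, "
          f"{drives_in_budget} drives in {BUDGET_U}U")
```

Running it reproduces the comparison in the text: 30-36 drives per 6U for the 3.5″ options versus 72 for the 2.5″ layout.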

Expansion and Performance:

Expansion is one of those gray areas for storage vendors.  In a good design, you’d minimize the bandwidth loss inherent in FC-AL designs and avoid having to make a complete loop circuit from enclosure 1 to enclosure X.  It just makes sense. Practically, this works itself out in a couple of ways: use high-bandwidth point-to-point switching within the enclosures themselves (FC-SW), or figure out a way to tie enclosure X to a specific node (keeping the loops simple).  Both ideas have merit, and in trying to pick one type of expansion over another, you’ve got to toss the performance metric into the ring.  Practically, an internal switched design in the enclosure is the way to go from a performance standpoint.  However, at the “ends,” you’re going to be limited by your interconnect design.  Being able to take the massive RAID group bandwidth and channel it right back to the processing node requires an interconnect (like FCoE or, better yet, InfiniBand) with low latency and high bandwidth.  Companies like Isilon have already implemented that sort of technology in their arrays and, while their business model hasn’t been successful, the technology end is decidedly interesting.
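As a rough illustration of the loop-vs-switched tradeoff, here’s a minimal sketch.  The link speeds in it (a 4Gb/s shared FC-AL loop, 8Gb/s per switched enclosure port, and a 20Gb/s node interconnect) are illustrative assumptions of mine, not figures from any particular product:

```python
# Back-of-the-envelope model of aggregate backend bandwidth.
# All link speeds below are illustrative assumptions, not vendor specs.

def loop_bandwidth_gbps(num_enclosures: int, loop_gbps: float = 4.0) -> float:
    # FC-AL: every enclosure shares the same arbitrated loop, so aggregate
    # backend bandwidth is capped at one loop's worth no matter how many
    # enclosures are daisy-chained.
    return loop_gbps

def switched_bandwidth_gbps(num_enclosures: int,
                            port_gbps: float = 8.0,
                            uplink_gbps: float = 20.0) -> float:
    # FC-SW style: each enclosure gets its own point-to-point port, so
    # bandwidth scales with enclosure count until the node interconnect
    # (the "ends") becomes the bottleneck.
    return min(num_enclosures * port_gbps, uplink_gbps)

for n in (1, 2, 4, 8):
    print(f"{n} enclosure(s): loop {loop_bandwidth_gbps(n):5.1f} Gb/s, "
          f"switched {switched_bandwidth_gbps(n):5.1f} Gb/s")
```

Even in this toy model, the switched design scales with enclosure count until the uplink saturates, which is exactly why a low-latency, high-bandwidth interconnect matters at the ends.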

Closing Thoughts:

I’ve kept this general for a reason.  Each of the disk technologies underlying the FSS requires its own follow-up article, and I’ll take the time to discuss those in the future.  Especially pertinent is the emergence of SSDs (Solid State Disks) as powerful alternatives to mechanical disk drives.  In any case, if you have any questions about what you’ve read here, please let me know.

Cheers,

Dave Graham

  • Rob Peglar

    Dave,

    Interesting article, just ran across it. In the article, you describe your 'future storage system'. What you have just described is the Intelligent Storage Element (ISE) – which is not future, it's present. Massive scale in minimum footprint, switched topology, both 2.5″ and 3.5″ drives. BTW, it's 40 2.5″ drives in 3U, not 30-36.

    Keep up the good work.

  • Rob,
ISE, while definitely a “future-oriented storage technology,” isn’t ultimately where I’m going with this. Sure, some of the concepts apply, but, ideologically, I’m moving away from monolithic storage systems and trying to leverage the capabilities of commodity computing (XIV is the best example of that) to service storage. Let me explain a bit further:

a.) The choice of AMD, while not necessarily obvious on the surface, is really tied to their implementation of HyperTransport for both system I/O as well as processor I/O. Using cHT as well as ncHT throughout a system means that my economy of scale is ONLY limited by the number of nodes I wish to attach. Additionally, it avoids some of the messaging limitations (for example) of InfiniBand.
b.) Torrenza further extends the platform by allowing hardware-dependent features (such as dedicated encryption or deduplication engines) to be added as needed directly onto the system I/O bus without having to extensively remap I/O through bridge chips, etc. VERY cool technology. This also ties into the concept of using GPGPUs for these types of processes, which, believe it or not, are a heck of a lot more powerful than the FPGAs and ASICs that our brethren @ 3Par and BlueArc use.

As far as drive counts are concerned, that’s really not an issue. I’ll pack them in where they make sense (40 x 2.5″ is very good scale, but sealed drive packs aren’t necessarily as effective (imho) as you portend them to be). Of note, I’m not a big fan of our 3U, 15-drive DAEs either…. 😉

    cheers,

    Dave
