Hybridizing DR for the Cloud: Concerns

by Dave Graham on March 2, 2009



Over at Information Playground, Steve Todd has started down the path of no return: private clouds.  (Incidentally, I find it quite ironic that private clouds are no more private than public clouds: they’re essentially run on the same infrastructure and face the exact same challenges for security, data mobility, and permanence that the aforementioned public clouds do…but I digress.)  In his posting from last week, he details some of the challenges of replicating to the cloud (whether public or private is a mere stroke-of-the-pen difference).  The good news is: he’s not alone in thinking this way.  The bad news: well, we’ll get to that.  Let’s begin…

Replication at the speed of the cloud…

At a very basic level, replication exists to provide one or more replicas of a “gold” data set, separated from the source: either within the same system (a la RecoverPoint CDP, clones, etc.) or across disparate systems (a la EMC Atmos, RecoverPoint CRR, MirrorView, SRDF, Replicator, etc.).  With a replica of the original data removed from the local source, disaster recovery becomes a tenable option because that data can simply be “replayed” back to the host application and will appear identical.
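To make the “replay” idea concrete, here’s a minimal sketch (a toy, not any vendor’s implementation) of a write journal that could be shipped off-site and replayed to rebuild an identical copy:

```python
class JournaledVolume:
    """Toy volume that records every write so a replica can replay them."""
    def __init__(self):
        self.blocks = {}   # block address -> data
        self.journal = []  # ordered log of (address, data) writes

    def write(self, address, data):
        self.blocks[address] = data
        self.journal.append((address, data))

def replay(journal):
    """Rebuild a replica purely from the shipped write journal."""
    replica = {}
    for address, data in journal:
        replica[address] = data
    return replica

vol = JournaledVolume()
vol.write(0, b"gold")
vol.write(1, b"data set")
assert replay(vol.journal) == vol.blocks  # replica matches the source
```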

Looking at this (admittedly) basic definition of replication, there are a few assumptions (especially for remote replication) that need to be made:

  • Public/private communication lines that provide bandwidth and connectivity from the primary to the secondary site
  • Some mechanism to accept and process the sent data at the target location (e.g., a VSA, a RecoverPoint appliance, an Avamar node)
  • A binding SLA between the customer and the circuit provider covering downtime, data loss, and line noise, with mechanisms for restoring the circuit after a failure, service credits, etc.

As noted, this scenario plays out almost daily.  As part of any good DR/BC plan, there are mechanisms in place to test failover/failback, run line tests to determine circuit viability and data/packet loss, and so on.  What works for terrestrial DR/BC plans may not translate all that well to the cloud, however.
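For flavor, here’s a rough sketch of what a crude line test might look like, measuring TCP connect latency and failure rate against a replication target (the host and port below are hypothetical placeholders):

```python
import socket
import time

def probe_circuit(host, port, attempts=20, timeout=2.0):
    """Measure TCP connect latency and failure rate to a replication target.
    A crude stand-in for a real carrier-grade line test."""
    latencies, failures = [], 0
    for _ in range(attempts):
        start = time.time()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append(time.time() - start)
        except OSError:
            failures += 1
    avg_ms = sum(latencies) / len(latencies) * 1000 if latencies else None
    return {"avg_latency_ms": avg_ms, "loss_pct": 100.0 * failures / attempts}

# e.g. probe_circuit("dr-site.example.com", 5040)  # hypothetical target
```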

Communications

The cloud in general rides on public infrastructure, which in turn is built on a multiplicity of backbones provided by multiple carriers.  If you want to see how this looks in reality, check out this visual traceroute from Point A to Point B.

[Figure: visual traceroute]

As you can see, things work well up to a certain point.  If I were sending data across this line, I’d have a problem: the data stream here timed out somewhere BEFORE its ultimate destination.  If the cloud were the target, your data would never have made it.

Target Data Processing

As Steve mentioned, the ability to simply upload, initiate, configure, and start a VSA (virtual storage appliance) in the cloud is made significantly easier by cloud providers whose processing and storage matrices are tied closely together.  Being able to drop a VSA, packaged as an OVF, directly into a cloud “instance” and configure it to start accepting data and writing it to cloud storage (preferably on a cFS-supporting backend storage system) hits directly at the heart of what the cloud should provide: powerfully simple compute and storage.  Where this particular scenario does break down is in data portability for the cloud storage backend.
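Purely for illustration, the deployment flow might look something like the sketch below; CloudClient and its methods are invented stand-ins, not any real provider’s SDK:

```python
class CloudClient:
    """Hypothetical provider API, for illustration only."""
    def upload_image(self, path):
        print(f"uploading {path} to the provider's image store")
        return "image-123"

    def launch_instance(self, image_id, **config):
        print(f"launching {image_id} with {config}")
        return "instance-456"

def deploy_vsa(client, ovf_path, backend_volume):
    """Upload the OVF-packaged VSA, start it, point it at backend storage."""
    image_id = client.upload_image(ovf_path)
    return client.launch_instance(
        image_id,
        accept_replication=True,         # begin accepting replicated writes
        storage_backend=backend_volume,  # ideally a cFS-backed volume
    )

deploy_vsa(CloudClient(), "vsa.ovf", "cloud-volume-01")
```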

Conceptually, the VSA should be writing the known data to a file system of its own creation (obviously isolated and containerized) such that it understands the data layout and the retrieval processes required to move data back to the source in the event of a failure.  At a deeper level, however, what is happening to that filesystem as it rests within the global cloud file system?  What methods of protection (and even prediction) are available there?  What happens if your primary cloud storage connection is lost and your cloud provider loses the data as well?  This type of scenario, while extreme, has its merits and makes the case for a client-side VSA that can split writes to multiple private clouds or cloud providers for storage and processing.
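A minimal sketch of that client-side write splitting follows; the targets here are in-memory stand-ins for real cloud endpoints:

```python
class WriteSplitter:
    """Mirror every write to multiple cloud targets so that losing one
    provider (or its connection) does not lose the data."""
    def __init__(self, targets):
        self.targets = targets  # objects exposing put(key, data)

    def write(self, key, data):
        results = {}
        for target in self.targets:
            try:
                target.put(key, data)
                results[target.name] = "ok"
            except Exception as exc:
                results[target.name] = f"failed: {exc}"
        # Surviving copies remain readable even if one provider fails.
        return results

class DictTarget:
    """In-memory stand-in for a cloud storage endpoint."""
    def __init__(self, name):
        self.name, self.store = name, {}
    def put(self, key, data):
        self.store[key] = data

splitter = WriteSplitter([DictTarget("private-cloud"), DictTarget("provider-b")])
print(splitter.write("block-0001", b"application data"))
```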

Service Level Agreements

The dreaded SLA proof-point continues to chase after the cloud computing types who seem to downplay its very real role in ongoing DR/BC planning and strategy.  To use the Amazon S3 SLA as a launching point: there is simply no recourse for data recovery or accessibility in the case of a service outage.  So, if your data is lost “in the void” at S3, you have no ability to recover it and no “insurance” to provide any remuneration for the loss.  Additionally, if you’ve purchased third-party insurance for your data (assuming such a thing exists), your liability limits would probably be undone, as there is no current model for cloud data protection.
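Absent any SLA recourse, the least a client can do is verify at upload time that the provider stored exactly what was sent.  Here’s a minimal sketch assuming boto3 and single-part uploads (where S3’s ETag is the MD5 of the body); the bucket and key names are hypothetical:

```python
import hashlib

import boto3  # assumes AWS credentials are configured in the environment

def put_and_verify(bucket, key, data):
    """Upload an object and confirm S3 stored what we sent, by comparing
    our MD5 with the returned ETag (valid for single-part uploads only)."""
    local_md5 = hashlib.md5(data).hexdigest()
    s3 = boto3.client("s3")
    response = s3.put_object(Bucket=bucket, Key=key, Body=data)
    remote_etag = response["ETag"].strip('"')
    if remote_etag != local_md5:
        raise RuntimeError(f"checksum mismatch for {key}: keep the local copy!")
    return local_md5

# e.g. put_and_verify("my-dr-bucket", "journal/0001.log", b"...")  # hypothetical names
```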

The difference between carrier SLAs and cloud SLAs is striking.  If you looked at an SLA from Verizon Data Services, for example, you would see clearly delineated assurances or guarantees based on the type of circuit, etc., as well as exclusions for natural disasters and the like.  That document would take up at least 15-20 pages on your desk and would have legal standing if ever taken to court.  In marked departure, a cloud SLA is made up of several “throats to choke”: the cloud provider and the circuit provider(s).  Only a private cloud, with discrete control mechanisms, can promise the type of SLA needed to make a viable DR/BC plan.

Conclusion

I’ve definitely been rambling on for a bit here, but it’s important to understand the risk(s) you’ll encounter if you’re attempting to place your business in the arms of the cloud.  Obviously, a private cloud topology, as part of a hybridized model that includes a non-cloud DR locale, can provide both the recovery and scale mechanisms needed to move your business forward; but without addressing the key components of this particular infrastructure, disasters can become, well, even greater disasters.  If you want to do the cloud right, you need to make sure your head isn’t completely consumed by it.
