As many of you may know, I work for EMC’s Cloud Infrastructure Group as part of the Atmos solution team. In this role, I’ve been blessed with a closer look at where cloud storage is going as well as some of the drivers that will get it there. In this post, I’d like to talk a bit about policy and how it will shape the future of storage. I’m going to keep this as abstracted from product as possible, but where appropriate, I’ll try to show you how products are implementing this technology TODAY.
What is Policy?
By definition, policy is “[an] action or procedure conforming to or considered with reference to prudence or expediency” (per dictionary.com). In the context of storage systems and management, policy is the set of actions (scripted or otherwise) applied to data to support its retrieval, performance, or manipulation by systems. In other words, policy is an engine that manages data from start to finish. To see why this is important, we need to look at what the typical management stack looks like today.
Data is created by users accessing programs that are tied to physical and virtual resources. This generated data is then processed and stored by the programs and their underlying storage I/O layers (LVMs, hypervisor I/O stacks, etc.) onto some sort of storage device (SAN, NAS, DAS, etc.). In essence, once data is created it is considered to be “at rest” until it is next accessed (if ever). Within this data generation and storage continuum, the process is fundamentally simple: generated data is put directly to storage. However, if the data continues to sit in the same place endlessly, it’s typically inefficient to retrieve and access. Managing this data has typically been a manual process in which data, LUNs, and their topologies had to be moved around using array or host-based tools to provide a better “fit” for data at rest or better performance for data being accessed. This is where policy steps in.
Policy uses hooks into data (also known as metadata) in order to enact controls. Please see this post for a more detailed explanation of metadata.
Why use Policies?
If the previous example shows anything, it’s that the management of data is fundamentally…well, boring and manual. Policy provides a method of controlling the stack of data ingest AND data management while allowing the business to continue to generate, retrieve, and manipulate data. For example, a simple policy enacted against data could look like this:
if data is < 14 days old, store on EFD drives, LUN 11; otherwise (14 days or older), store on SATA drives, LUN 33
Obviously, that’s a high-level abstraction of what the actual process for data control would look like, but it drives the point home. What used to be a manual LUN migration to “performance” or “store” data is now set by a logical control structure that can be automagically enacted on the storage system itself. A working example of this type of policy can be seen in the tiering provided by Compellent and EMC’s FAST systems for storage management. Pretty cool, huh?
An alternative method of control that isn’t necessarily tied to the storage array is the recent introduction of VMware’s Storage DRS (Distributed Resource Scheduler), which is enacted against the storage I/O stack of VMware’s vSphere hypervisor.
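For the curious, here is a rough sketch of how the 14-day rule above might look as code. This is only an illustration of the idea, not how FAST, Compellent, or any real array implements tiering. The `Tier` class, the `place` function, and the constant names are all hypothetical, and I'm assuming that data at exactly 14 days old goes to the SATA tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tier:
    """Hypothetical description of a placement target (media type + LUN)."""
    media: str
    lun: int


# Values taken from the example rule; the names are made up for this sketch.
FAST_TIER = Tier(media="EFD", lun=11)
CAPACITY_TIER = Tier(media="SATA", lun=33)
AGE_THRESHOLD = timedelta(days=14)


def place(created: datetime, now: datetime) -> Tier:
    """Pick a tier from the data's creation-time metadata."""
    age = now - created
    # Younger than 14 days: keep on flash. Otherwise (assumption: this
    # includes data that is exactly 14 days old), move to SATA.
    return FAST_TIER if age < AGE_THRESHOLD else CAPACITY_TIER


if __name__ == "__main__":
    today = datetime(2009, 9, 21)
    print(place(datetime(2009, 9, 15), today))  # 6 days old
    print(place(datetime(2009, 8, 1), today))   # well past 14 days
```

The point of the sketch is that the decision is driven entirely by metadata (here, a creation timestamp), so a storage system could evaluate it on a schedule instead of waiting for an admin to migrate LUNs by hand.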
The Future of Policy
Obviously, my examples are very simplistic, but hopefully they make policy technology somewhat more accessible. As far as the future is concerned, policy is where storage technologies (and even host process management) are headed. Simple policy creation and enforcement will become a necessary part of storage pool creation and integration as well as the ongoing maintenance and support of storage arrays.
As always, feedback is welcome!
edit: 9/21/09: removed a mis-aligned reference to Atmos storage policy.