I've been getting some thoughts out about using GPGPUs (NVIDIA’s Tesla and Fermi products, ATI’s FireStream) to handle some of the menial cloud-based operations (metadata calculation, metadata location servicing, offload, etc.) that are part of any type of cloud file system.
So, what are your thoughts? The idea is to further collapse x86 commodity platforms by tapping the extreme amount of processing capability present within these GPGPUs and coupling optimized OpenCL or CUDA routines into the standard CFS processing stack.
Going along with the Cloud theme that I’m fortunate enough to be a part of, I’ve decided to use the “micro-burst” moniker to section off quick n’ dirty posts on a variety of cloud subjects that I don’t have time to dive into fully. With that in mind, let’s get on with the show.
Today’s topic is based on the link here from The Register. What I find fascinating is that Google has been able to manage their growth using a single master node topology for their filesystem. To the article’s point, a single master node is a single point of failure, especially from a chunklet processing and scheduling standpoint. Bandwidth would also be constrained, seeing as how all metadata would have to pass through and be processed by a single entity. I’m unaware of the underlying hardware and scalability of their processing complex for this (though I’ve read through the articles that have attempted to explain it), but these processing issues could reasonably be remedied by more powerful system hardware and/or software refinement.
It’s exciting to see that Google has thus far been able to move their GFS platform forward and embrace a horizontal scale-out mechanism for the revision 2 product. Good luck to them as they continue to move their company forward!
Why a Master Node Plurality Makes Sense
When designing any sort of scale-out filesystem (or what I’d consider a horizontally scalable file system), it makes sense to include the ability to scale the master node (or scheduler node) complexes. The obvious reason behind this is filesystem growth, to be sure, but as metadata processing becomes increasingly complex (i.e., more FS abilities driven by custom metadata), the need to ingest data at the same or higher rate as originally specified becomes critical. By having a more robust front-end driven by more powerful master nodes with synchronous metadata indexes (or siloed masters with individual metadata databases), you can maintain latency (time to disk or time to commit) SLAs without completely crushing your cloud’s ability to service I/O operations in general. (see image below for conceptual diagram)
Multiple Masternode File system
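To make the "siloed masters" idea a little more concrete, here is a minimal Python sketch of one way a front-end could decide which master owns the metadata for a given path. This is purely illustrative and not how GFS or any particular product does it: the function names (pick_master, spread) and the master names are hypothetical, and plain hash-modulo routing is just an assumption chosen for brevity.

```python
import hashlib


def pick_master(path: str, masters: list[str]) -> str:
    """Map a file path to one of several siloed master nodes (hypothetical scheme)."""
    if not masters:
        raise ValueError("at least one master node is required")
    # Hash the path so the same path always lands on the same master.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(masters)
    return masters[index]


def spread(paths, masters):
    """Count how many paths each master would own under this scheme."""
    counts = {m: 0 for m in masters}
    for p in paths:
        counts[pick_master(p, masters)] += 1
    return counts


if __name__ == "__main__":
    # Made-up master names, for illustration only.
    masters = ["master-a", "master-b", "master-c"]
    print(pick_master("/vol/photos/cat.jpg", masters))
    print(spread([f"/vol/file{i}" for i in range(1000)], masters))
```

One caveat with this naive approach: adding or removing a master changes the modulus, so most paths would be reassigned to a different master. Something like consistent hashing would reduce that churn, at the cost of a bit more bookkeeping.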
Hopefully my musings on this subject make sense. Let me know if you have any questions!
Over at Information Playground, Steve Todd has started down the path of no return: private clouds. (Incidentally, I find it quite ironic that private clouds are no more private than public clouds in that they’re essentially run on the same infrastructure and face the exact same challenges for security, data mobility, and permanence that the aforementioned public clouds do…but, I digress.) In his posting from last week, he details some of the challenges in looking at replication to the cloud (whether public or private is a mere stroke of the pen difference). The good news is: he’s not alone in thinking this way. The bad news: well, we’ll get to that. Let’s begin…
Earlier this morning, Scott Lowe posed the following question: What if hypervisors shared a file system? The concept here is that most hypervisors (notably VMware and [soon] Hyper-V) have a clustered file system that is used to extend the capabilities of a group of hypervisors into such things as dynamic resource sharing, failover/failback, HA, etc. [...]
In part one of the COSS series, we discussed the nature of content within the cloud. After determining the nature of the content being stored, it is important to understand how this unstructured and structured content will be stored. The mechanism for storage has significant impact on a provider’s Service Level Agreement (SLA) to the [...]