What is Cloud Storage - Reference and Papers
The ParaScale library is an ongoing effort: we will continue to expand its content to give the market an educational reference site on cloud storage. Check back often, or sign up for our mailing list and we will notify you when new materials become available. In the interim, feel free to review the information and links below.
Defining Cloud Storage - Three Key Characteristics
"I’d like to start with a definition of cloud storage. This is a tricky concept as there are many vendors (ParaScale included) trying to define cloud storage to their advantage. The end result is a great deal of confusion and frustration to the end-user. We define cloud storage by three characteristics:
- A storage service delivered over a network (internet or intranet)
- Easy to scale
- Easy to manage
All three criteria must be satisfied to fit the definition yet there is a range of solutions that satisfy these requirements."
Read the rest of the article on IT ProPortal
What is Cloud Storage?
Cloud storage is an emerging technology that uses software to tie commodity hardware together so that it appears as a single storage device. A cloud can scale to hundreds of nodes, so policy-based management, self-healing, dynamic expansion, capacity management and a simple interconnect are requirements; without them the cloud becomes unruly.
It started with cloud computing and SaaS
Cloud storage is a natural extension of SaaS (Software-as-a-Service; applications like Salesforce delivered over the web as a service) and cloud computing (CPU cycles available for rent over the web). Cloud computing was made popular by Google, Amazon and VMware, and its architecture is defined as:
“The architecture behind cloud computing is a massive network of "cloud servers" interconnected as if in a grid running in parallel, sometimes using the technique of virtualization to maximize the utilization of the computing power available per server.” - Wikipedia
The concept is extremely powerful because commodity hardware is stitched together to provide a massive pool of compute power. Prior architectures centralized processing into massive mainframe systems that can cost millions of dollars to acquire and maintain. Today, cloud computing is typically used for in-house processing (Google) or sold as a service (Amazon EC2) in a subscription model, as a small, medium or large server (“instance” in Amazon terminology).
The emergence of cloud storage
The cloud concept migrated to storage in the form of a service offering from Amazon (S3). Behind the scenes, Amazon manages a cloud of storage: multiple commodity hardware devices tied together by software to create a pool of storage. Emerging web companies have embraced this offering, creating industry buzz around the terms and concepts of cloud storage.
It’s an architecture, not a service
Cloud storage is not just about online storage rentals. Whether you own or rent is secondary. Primarily, cloud storage is about scaling capacity and performance by adding standard hardware, and having shared access via a standard network. With Moore’s Law driving ever-decreasing commodity disk and CPU prices, cloud storage stands to be a highly disruptive technology inside the datacenter.
Don’t throw out that primary NAS or SAN
Cloud storage is not a panacea for storage and is unlikely to replace the SAN in the near future. Instead, it is a new class of storage offerings that should be used where appropriate. Digital content distribution and serving, file archiving, video surveillance, streaming media and large file systems are all good examples of where cloud storage solutions fit well. These all have several aspects in common: huge data sets and file systems, parallel file serving requirements, longevity of file access and the need for low-cost deployments.
If you are maintaining a transactional database that drives your enterprise revenue stream, cloud storage is not a solution for you, and you should be looking at SAN solutions from vendors like EMC, NetApp, IBM, HP and HDS, among others. If your requirement is for tier-1 NAS storage (for email, databases, engineering development) with complex data protection and compliance requirements, you should be considering NAS leaders like NetApp and EMC.
This sounds like clustered storage
Cluster file systems (BlueArc, Panasas, NetApp Spinnaker/ONTAP8, iBrix, HP/PolyServe, etc.) are tightly coupled and were built to solve the challenges of high performance computing. Every node in the cluster must share its information with every other node, resulting in massive amounts of intra-cluster communication. The result is great performance to a single file that is striped across a few nodes. However, there is a downside: once a cluster scales past 10-20 nodes, the communication requirements overload the systems and performance suffers. So most large deployments of this kind of technology involve multiple clusters, each with 10-20 nodes. Additionally, engineering this kind of massive performance involves a lot of tweaks to the OS and to the applications, making these systems very difficult to install and manage. Managing these deployments could be fun for the PhDs in the national labs, but harried IT administrators should pay attention to the operational aspects.
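To see why all-to-all communication becomes a bottleneck, it helps to count node pairs. The sketch below assumes a simple full-mesh model in which every node exchanges state with every other node. The function name full_mesh_links is illustrative and not taken from any product, and real cluster file systems may reduce this traffic in ways the model ignores.

```python
def full_mesh_links(nodes: int) -> int:
    """Count distinct node pairs when every node talks to every other node.

    Assumption: a plain full-mesh model, n * (n - 1) / 2 pairs.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    # Print how the pair count grows as the cluster grows.
    for n in (5, 10, 20, 40, 100):
        print(f"{n:>3} nodes -> {full_mesh_links(n):>5} node pairs")
```

Under this model, growing from 10 to 20 nodes doubles the hardware but raises the number of communicating pairs from 45 to 190, which is consistent with the 10-20 node ceiling described above.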
Cloud storage is loosely coupled: its nodes do not need the constant node-to-node communication that lets supercomputers write in parallel to a single file striped across multiple nodes. Being loosely coupled allows huge scalability and parallel performance for multiple files (or multiple copies of a single file) across multiple nodes. Examples include ParaScale, proprietary solutions like the Google file system and Amazon S3, and open source technologies like Hadoop.
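One way to picture loose coupling is a placement rule that any client can compute on its own, so storage nodes never have to coordinate about where a file lives. The sketch below uses rendezvous (highest-random-weight) hashing to choose nodes for whole-file copies. This is an illustrative technique chosen for the example, not a description of how ParaScale, the Google file system, Amazon S3 or Hadoop place data; the names place_file and _score and the node labels are hypothetical.

```python
import hashlib
from typing import List, Sequence


def _score(node: str, path: str) -> int:
    """Deterministic pseudo-random weight for a (node, file) pair."""
    digest = hashlib.sha256(f"{node}|{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_file(path: str, nodes: Sequence[str], copies: int = 2) -> List[str]:
    """Pick `copies` distinct nodes to hold whole copies of one file.

    Each node is scored independently from its own name and the file path,
    so the result needs no node-to-node communication and does not depend
    on the order in which nodes are listed.
    """
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    ranked = sorted(nodes, key=lambda node: _score(node, path), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    for name in ("/videos/cam1.mp4", "/archive/2009/report.pdf"):
        print(name, "->", place_file(name, cluster))
```

Because each file is stored whole and its location depends only on the file path and the node names, adding a node changes the placement only of files for which the new node scores highest, and reads of different files can be served by different nodes in parallel.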
Other Resources to Consider
If you are looking for the technology differences between cloud storage and other storage offerings, visit the Cloud Storage Technology section of the library.
For white papers and audio / video guides to ParaScale, visit the ParaScale Cloud Storage section of the library.
If you are ready to download a trial of ParaScale, visit the Evaluating ParaScale section of the library.
For details on ParaScale features, visit the products pages.
For information on ParaScale solutions, visit the solutions pages.