Big Data: A Killer Application For The Cloud

Published: April 01 2011, 09:56 AM
by Don Ferguson

I recently participated in a “fireside chat” on Big Data at Structure Big Data 2011. Gillian Munson, managing director at Allen & Co., moderated the chat. Gillian’s questions and insights sparked several new ideas, and I want to share three major observations.

Some examples of Big Data from Wikipedia are “… meteorology, genomics, biological research, Internet search, finance and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, software logs, cameras, microphones, RFID readers, wireless sensor networks ...” There is even an annual conference devoted to research in the field: the International Conference on Very Large Data Bases.


Figure 1: MapReduce

MapReduce and Hadoop are fundamental technologies for processing big data. Other examples are HBase, Cassandra and Hypertable. These technologies demonstrate why Big Data is a killer application for cloud computing.  Concepts to keep in mind are:

1. Elastic Computing: The processing of Big Data is highly parallelizable and can therefore exploit large numbers of commodity machines. Also, the resource requirements for this processing vary greatly between computation runs, and processing is not continuous. So there are big surges and quiet intervals that lend themselves to the use of “elastic” resources.
2. Standards: De facto standardization on Hadoop/MapReduce/etc. means that the processing of Big Data lends itself to platform-as-a-service (PaaS) offerings. PaaS solutions such as Amazon Elastic MapReduce make it easier to exploit cloud computing to process Big Data. In addition, PaaS is much simpler than manually configuring virtual machines and software tools in an infrastructure-as-a-service (IaaS) environment.
3. Public Clouds: The cost and complexity of the kind of massive private cloud necessary for processing Big Data are prohibitive for most enterprises. This makes public clouds much more attractive for this purpose. Also, enterprise customers should not have major security concerns when it comes to processing Big Data in a public cloud, because the massive scale of the data provides “security through obscurity.” There are also techniques for obfuscating or subsetting the data that still produce meaningful results.
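
To make the "highly parallelizable" point in item 1 concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. It is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, map_reduce) are my own illustrative choices. The thing to notice is that each call to map_phase and each call to reduce_phase is independent of the others, which is what lets a real framework spread them across many commodity machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in one process (illustration only)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, made up for this example.
    records = ["disk full on host1", "disk ok on host2", "disk full on host3"]
    print(map_reduce(records))
```

In a real deployment the records would be split across machines, and the number of machines could grow or shrink from one run to the next, which is the "elastic" part.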

Increased Emphasis on IT Management Software  

Figure 2: Cloud-Spanning Application

To work with Big Data, enterprises have historically had to employ a diverse set of tools, including development tools, application platforms, database servers, IT management software, and security software. MapReduce and Hadoop replace most of these tools with simple, open source applications. The only things they don’t replace are security and IT management. Also, as the processing of Big Data is driven to PaaS and the programmable web (e.g., www.programmableweb.com and Google’s PubSubHubbub protocol), enterprises will focus more on:

1) finding cloud services,
2) configuring service policies and providing data, and
3) managing resources across the cloud.

This further diminishes the role of application development tools and platforms.  This greater emphasis on management and security increases the value to customers of CA Technologies.

What Big Data Will CA Technologies Process?

CA Technologies is itself an enterprise that, like other large enterprises, has its own Big Data to process. The CA Technologies IT department and business systems will exploit Big Data like those of any other enterprise. The more interesting question is, “What types of Big Data will CA Technologies products process?”

CA Labs has done research that offers insight into possible product functions that will entail the processing of Big Data. Examples include:

1. Analyzing email, instant messaging logs, wikis, web sites, etc. to extract information about which people know what about IT. This enables the creation of a social network of people who can help each other solve various types of IT problems. Combined with CA Open Space and our service management products, these social networks will both make IT more productive and improve services to end users.

2. IT systems generate a huge number of alerts and log messages. Improved analysis of these logs and alerts can enable better discovery of root-cause problems, availability patterns, performance trends, etc. This analysis will become extremely valuable as we move to IT management via SaaS, since (1) the data will already be in the SaaS environment, and (2) analysis can be performed across large numbers of customers in a multi-tenant environment.

3. Security logs provide a rich source of information for identifying potential insider threats and better assigning security roles to staff.
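
As a rough illustration of the log analysis described in item 2, the sketch below counts how often each serious alert occurs and which tenants it affects. Everything specific here is an assumption made for the example: the pipe-delimited line format (tenant|host|severity|message), the severity names, and the function names. It does not describe any CA Technologies product.

```python
from collections import Counter


def parse_line(line):
    """Split one hypothetical 'tenant|host|severity|message' log line."""
    tenant, host, severity, message = line.strip().split("|", 3)
    return {"tenant": tenant, "host": host,
            "severity": severity, "message": message}


def alert_counts(lines, severities=("ERROR", "CRITICAL")):
    """Count occurrences of each message at the given severities."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record["severity"] in severities:
            counts[record["message"]] += 1
    return counts


def tenants_affected(lines, message):
    """Return the set of tenants that logged the given message."""
    records = (parse_line(line) for line in lines)
    return {r["tenant"] for r in records if r["message"] == message}


if __name__ == "__main__":
    # Made-up sample data from two hypothetical tenants.
    sample = [
        "acme|web1|ERROR|disk full",
        "acme|web2|INFO|started",
        "globex|db1|ERROR|disk full",
        "globex|db2|CRITICAL|replica lag",
    ]
    print(alert_counts(sample).most_common(1))
    print(tenants_affected(sample, "disk full"))
```

An alert that shows up across many tenants points to a shared cause, and that kind of signal is only visible when the data from all customers sits in one multi-tenant environment.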

All of these points underscore the fact that Big Data is an important driver for the rapidly evolving cloud market—and that CA Technologies is very well-positioned to deliver value to customers who make the move to processing Big Data in the cloud.


 

By: Don Ferguson
Dr. Donald F. Ferguson is executive vice president and chief technology officer at CA, responsible for delivering common technology services to CA’s business units, ensuring architectural compliance and integration of the company's solutions and products. Tasked with promoting technical excellence...