June 11, 2012 - 10:35pm
Last year I wrote a blog post showing how the lack of parallelism in computing was a limiter to innovation. My post was inspired by a technology article I read in the Economist magazine – http://www.panasas.com/blog/parallelism-goes-mainstream. The Economist continues to be a trendsetter, taking on new topics that highlight a revolution in the IT industry. The May 19th issue takes on another hot topic with a special report on big data in finance and banking – http://www.economist.com/node/21554743. The article highlights how big data (the use of unstructured data to create business value) is radically changing the face of banking. An innocuous graph in the corner shows that the amount of data being created in the world (35 zettabytes by 2020) is more than double the amount of storage space that will be available by then. The growth rates are staggering – over 45% compounded annual growth in storage capacity, with no slowdown in sight. The article notes that data growth in the financial sector is accelerating as banks and their customers move from bricks and mortar to computers and mobile devices, and from cash to cards and electronic transfers. But along with these trends comes the opportunity for fraud: it is much easier to rob a bank electronically than the Bonnie and Clyde way. And this is where big data comes into the picture.
Big data gives banks the ability to wade through millions of disparate datasets – emails, web hits, online transactions, location tracking, and much more – looking for patterns of fraud among millions of legitimate transactions. And it doesn’t stop there: as the article notes, banks are “panning for gold,” processing large amounts of data to target consumers with products and services. With cards replacing cash as a payment method, banks have the luxury of being the entity most clued in to a customer’s buying habits, because they can see every electronic transaction a customer makes – which also gives them a pretty good idea of how much money that customer has left to spend. Banks will accumulate petabytes of data at astounding rates to locate these nuggets.
So what has all this got to do with storage? Big data is going to require big storage to service the growth in the financial sector. To provide some context, 10 terabytes (TB) of storage today will grow to 64TB in five years and 410TB in ten years, based on the Economist growth predictions. Most of that new data will be unstructured and varied in type, which poses a challenge on two fronts for legacy storage systems designed for highly structured environments. First, traditional scale-up NAS systems were never built for unstructured data at this scale. Second, they create silos of data: customer credit card transactions reside on a different system than ATM withdrawals, online access is a different system again, as are mortgages and investments. To extract value from these disparate datasets, they have to be combined in complex models that can query all the different data types and scenarios.
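The capacity projections above are straightforward compound-growth arithmetic. A quick sketch, assuming the roughly 45% compound annual growth rate (CAGR) cited from the Economist (the function name and starting figure are illustrative, not from the article):

```python
def projected_capacity(start_tb: float, cagr: float, years: int) -> float:
    """Capacity after `years` of compound growth at annual rate `cagr`."""
    return start_tb * (1 + cagr) ** years

start = 10.0  # terabytes today
for years in (5, 10):
    tb = projected_capacity(start, 0.45, years)
    print(f"{years:2d} years: {tb:.0f} TB")
```

Running this reproduces the figures in the text: roughly 64TB at five years and about 410TB at ten.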
Big data demands that many different data types be combined to provide valuable intelligence, whether for fraud protection or for increasing sales. Given the unpredictability of data growth (who knows what data will be valuable next year or the year after?), systems must be capable of growing in line with the rate of data acquisition. A modular, scalable architecture will be mandatory to manage this environment. In addition, performance has to be maintained as the system grows and the datasets continue to get bigger. Storage systems will need to scale to thousands of storage nodes, not the tens of nodes that are typically the limit of legacy systems. Compare that to the Panasas ActiveStor family of high-performance, scale-out storage appliances, which has been deployed in environments supporting over 1,000 nodes and petabytes of storage in a single file system – and the Panasas architecture has no upper limit. All this capacity comes with no performance hit, because the system was built to scale to meet the demands of high-performance computing running complex simulations. This is thanks to the Panasas distributed file system, which can handle thousands of I/O requests in parallel without the bottlenecks of traditional filer heads. Panasas ActiveStor will grow as a bank’s needs grow, ensuring that IT folks can focus on extracting value from their data rather than worrying about what to do when the system runs out of space, or stalls because it can’t keep up with the volume of data being analyzed.